MNIST

Source: examples/mnist/main.py

Overview

A small convolutional network trained on the MNIST handwritten-digit dataset. The example shows how to use nn.Conv2d and how to build a custom nn.Module that combines two convolutional layers with a final linear classifier.

Running

python -m examples.mnist.main          # downloads data on first run
python -m examples.mnist.main --help   # see all options

Selected options:

--num-steps — training steps (default 500)
--batch-size — mini-batch size (default 32)
--hidden-dim — number of conv channels (default 32)
--step-size — AdamW learning rate (default 1e-3)
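
As a rough sketch of how these flags might be declared, assuming the script uses the standard-library argparse (the source does not show its parser, and build_parser is a hypothetical name), with the flag names and defaults taken from the list above:

```python
import argparse

def build_parser():
    # Hypothetical parser; only the four options listed above are covered.
    parser = argparse.ArgumentParser(description="Train a small CNN on MNIST")
    parser.add_argument("--num-steps", type=int, default=500, help="training steps")
    parser.add_argument("--batch-size", type=int, default=32, help="mini-batch size")
    parser.add_argument("--hidden-dim", type=int, default=32, help="number of conv channels")
    parser.add_argument("--step-size", type=float, default=1e-3, help="AdamW learning rate")
    return parser

# argparse turns "--num-steps" into the attribute name "num_steps".
args = build_parser().parse_args(["--num-steps", "1000"])
print(args.num_steps, args.batch_size, args.step_size)  # prints: 1000 32 0.001
```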

Code walkthrough

Dataset

MNIST images are downloaded automatically on first run and cached under examples/mnist/data/:

train_dataset = MNIST(split="train")   # 60 000 images, 28×28 greyscale
test_dataset  = MNIST(split="test")    # 10 000 images

Model

Two convolutional layers followed by a linear output head:

class MNISTClassifier(nn.Module):
    def __init__(self, input_shape, num_classes, hidden_dim):
        super().__init__()
        self.conv1 = nn.Conv2d(1, hidden_dim, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(hidden_dim, hidden_dim, kernel_size=3, stride=1, padding=1)
        H, W = input_shape[-2:]  # spatial size, e.g. 28, 28; both convs preserve it
        self.linear_out = nn.Linear(hidden_dim * H * W, num_classes)

    def forward(self, x):
        x = npg.relu(self.conv1(x))    # (N, hidden, 28, 28)
        x = npg.relu(self.conv2(x))    # (N, hidden, 28, 28)
        x = x.reshape(x.shape[0], -1)  # (N, hidden*28*28)
        return self.linear_out(x)       # (N, 10)
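
To see why both convolutions keep the 28×28 spatial size, the standard convolution output-size formula can be checked by hand. The helper below is a sketch of that arithmetic (conv_out is a hypothetical name, not part of the library), assuming the default hidden_dim of 32:

```python
def conv_out(size, kernel_size, stride, padding):
    # Standard output-size formula for a convolution without dilation.
    return (size + 2 * padding - kernel_size) // stride + 1

H = W = 28
hidden_dim = 32

# Apply the same 3x3, stride-1, padding-1 settings used by conv1 and conv2.
for _ in range(2):
    H = conv_out(H, kernel_size=3, stride=1, padding=1)
    W = conv_out(W, kernel_size=3, stride=1, padding=1)

flat_features = hidden_dim * H * W  # input width of the linear head
print(H, W, flat_features)  # prints: 28 28 25088
```

Because the feature map is never downsampled, the linear head sees 25 088 inputs per image, so it dominates the parameter count of this model.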

Training loop

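optimizer = npg.optim.AdamW(net.parameters(), lr=1e-3)  # same value as the --step-size default
for step in range(num_steps):
    # A fresh iterator is built every step, so batches vary only if the dataloader shuffles.
    x, y = next(iter(dataloader))
    optimizer.zero_grad()                    # clear gradients from the previous step
    loss = nn.cross_entropy_loss(net(x), y)  # forward pass and loss on this batch
    loss.backward()                          # backpropagate through the network
    optimizer.step()                         # apply the AdamW update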
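
For intuition about the loss, here is a pure-Python sketch of softmax cross-entropy averaged over a batch, assuming integer class labels and mean reduction. The source does not confirm that nn.cross_entropy_loss follows exactly these conventions, so the function below is illustrative only:

```python
import math

def cross_entropy(logits, labels):
    # logits: list of per-example score lists; labels: list of integer class ids.
    total = 0.0
    for scores, label in zip(logits, labels):
        m = max(scores)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_sum_exp - scores[label]  # negative log-probability of the true class
    return total / len(labels)

# Uniform logits over 10 classes give a loss of log(10), about 2.303.
print(cross_entropy([[0.0] * 10], [3]))
```

A freshly initialised 10-class classifier typically starts near this value, which makes it a handy sanity check for the first few training steps.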