MNIST
=====

Source: ``examples/mnist/main.py``

Overview
--------

A small convolutional network trained on the MNIST handwritten-digit
dataset. The example shows how to use ``nn.Conv2d`` and how to build a
custom ``Module`` that mixes convolutions with a final linear classifier.

Running
-------

::

    python -m examples.mnist.main          # downloads data on first run
    python -m examples.mnist.main --help   # see all options

Selected options:

- ``--num-steps`` — training steps (default 500)
- ``--batch-size`` — mini-batch size (default 32)
- ``--hidden-dim`` — number of conv channels (default 32)
- ``--step-size`` — AdamW learning rate (default 1e-3)
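
As a rough illustration of how these options fit together, here is a minimal ``argparse`` sketch. The flag names and defaults come from the list above; the ``build_parser`` helper and the help strings are hypothetical, and the real ``main.py`` may define its options differently.

```python
import argparse


def build_parser():
    # Hypothetical helper: mirrors the documented flags and defaults only.
    parser = argparse.ArgumentParser(description="MNIST example options (sketch)")
    parser.add_argument("--num-steps", type=int, default=500, help="training steps")
    parser.add_argument("--batch-size", type=int, default=32, help="mini-batch size")
    parser.add_argument("--hidden-dim", type=int, default=32, help="number of conv channels")
    parser.add_argument("--step-size", type=float, default=1e-3, help="AdamW learning rate")
    return parser


# Override one option and keep the documented defaults for the rest.
args = build_parser().parse_args(["--num-steps", "1000"])
print(args.num_steps, args.batch_size, args.hidden_dim, args.step_size)
```

Note that ``argparse`` converts dashes to underscores, so ``--num-steps`` is read back as ``args.num_steps``.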

Code walkthrough
----------------

**Dataset**

MNIST images are downloaded automatically on first run and cached under
``examples/mnist/data/``::

    train_dataset = MNIST(split="train")   # 60 000 images, 28×28 greyscale
    test_dataset  = MNIST(split="test")    # 10 000 images

**Model**

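Two 3×3 convolutional layers, each followed by a ReLU, and a linear output
head over the flattened feature map. With stride 1 and padding 1, both
convolutions preserve the 28×28 spatial size::

    class MNISTClassifier(nn.Module):
        def __init__(self, input_shape, num_classes, hidden_dim):
            super().__init__()
            # Assumes input_shape ends in (H, W), e.g. (1, 28, 28).
            H, W = input_shape[-2:]
            self.conv1 = nn.Conv2d(1, hidden_dim, kernel_size=3, stride=1, padding=1)
            self.conv2 = nn.Conv2d(hidden_dim, hidden_dim, kernel_size=3, stride=1, padding=1)
            # Spatial size is unchanged by the convolutions, so the flattened
            # feature vector has hidden_dim * H * W entries.
            self.linear_out = nn.Linear(hidden_dim * H * W, num_classes)

        def forward(self, x):
            x = npg.relu(self.conv1(x))    # (N, hidden, 28, 28)
            x = npg.relu(self.conv2(x))    # (N, hidden, 28, 28)
            x = x.reshape(x.shape[0], -1)  # (N, hidden*28*28)
            return self.linear_out(x)      # (N, 10), one logit per digit class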
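
The shape comments in ``forward`` can be checked with the standard convolution size formula. The sketch below assumes ``nn.Conv2d`` follows the usual undilated arithmetic, ``out = (in + 2 * padding - kernel_size) // stride + 1``; the helper name ``conv2d_output_size`` is hypothetical and not part of the library.

```python
def conv2d_output_size(size, kernel_size, stride=1, padding=0):
    """Output length along one spatial axis for a standard, undilated convolution."""
    return (size + 2 * padding - kernel_size) // stride + 1


H = W = 28       # MNIST image size
hidden_dim = 32  # default value of --hidden-dim

# Apply the formula once per convolutional layer (kernel 3, stride 1, padding 1).
for _ in range(2):
    H = conv2d_output_size(H, kernel_size=3, stride=1, padding=1)
    W = conv2d_output_size(W, kernel_size=3, stride=1, padding=1)

# Number of input features seen by the linear output head.
flat_features = hidden_dim * H * W
print(H, W, flat_features)
```

With the default ``--hidden-dim`` of 32, the linear head therefore receives 32 × 28 × 28 = 25 088 features per image.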

**Training loop**

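Each step draws a mini-batch, computes the cross-entropy loss on the
network's logits, backpropagates, and applies one AdamW update::

    optimizer = npg.optim.AdamW(net.parameters(), lr=1e-3)  # lr matches the --step-size default
    for step in range(num_steps):
        # A fresh iterator is built on every step, so successive batches
        # differ only if the dataloader shuffles.
        x, y = next(iter(dataloader))
        optimizer.zero_grad()                    # clear gradients from the previous step
        loss = nn.cross_entropy_loss(net(x), y)  # logits (N, 10) vs. labels y
        loss.backward()                          # compute gradients
        optimizer.step()                         # update parameters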
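
For readers unfamiliar with the loss, here is a minimal pure-Python sketch of what a softmax cross-entropy computes. It assumes that ``nn.cross_entropy_loss`` takes raw logits and integer class labels and averages over the batch, which the source does not confirm. The functions ``cross_entropy`` and ``mean_cross_entropy`` are illustrative and are not the library's implementation.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of class `target` under softmax(logits)."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def mean_cross_entropy(batch_logits, targets):
    """Average per-example loss over a batch (assumed reduction)."""
    losses = [cross_entropy(row, t) for row, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


# Uniform logits over 10 digit classes give a loss of log(10).
uniform_loss = cross_entropy([0.0] * 10, target=3)
print(uniform_loss)
```

An untrained network that assigns roughly equal probability to all ten digits should therefore start near a loss of ``log(10)``, about 2.3, which is a useful sanity check on the first few steps.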