Custom Modules

All trainable components in numpygrad inherit from nn.Module. Subclassing it is the standard way to define new layers, loss functions, or complete architectures.

Minimal example

Override forward with the computation your module performs:

import numpygrad as npg
import numpygrad.nn as nn

class Affine(nn.Module):
    def __init__(self, in_features: int, out_features: int) -> None:
        super().__init__()
        self.weight = nn.Parameter(npg.random.randn((in_features, out_features)))
        self.bias   = nn.Parameter(npg.zeros((out_features,)))

    def forward(self, x: npg.array) -> npg.array:
        return x @ self.weight + self.bias

layer = Affine(4, 8)
out = layer(npg.random.randn((2, 4)))  # shape (2, 8)

Any attribute assigned as a Parameter is automatically included in module.parameters(), so an optimizer that is given module.parameters() will update it on every step.
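
How attribute-based discovery can work is easy to sketch. The following is a minimal illustration in plain NumPy, not numpygrad's actual implementation; TinyParameter, TinyModule, and TinyAffine are hypothetical names invented for this example. The base class scans the instance's attributes and yields only those wrapped in the parameter type:

```python
import numpy as np


class TinyParameter:
    """Hypothetical stand-in for nn.Parameter: wraps an array to mark it as trainable."""

    def __init__(self, data: np.ndarray) -> None:
        self.data = data


class TinyModule:
    """Hypothetical stand-in for nn.Module with attribute-based parameter discovery."""

    def parameters(self):
        # Scan instance attributes in assignment order and yield the trainable ones.
        for value in vars(self).values():
            if isinstance(value, TinyParameter):
                yield value


class TinyAffine(TinyModule):
    def __init__(self, in_features: int, out_features: int) -> None:
        rng = np.random.default_rng(0)
        self.weight = TinyParameter(rng.standard_normal((in_features, out_features)))
        self.bias = TinyParameter(np.zeros(out_features))
        self.name = "affine"  # plain attribute: skipped by parameters()


layer = TinyAffine(4, 8)
shapes = [p.data.shape for p in layer.parameters()]
print(shapes)  # [(4, 8), (8,)]
```

The plain string attribute is skipped because only the wrapper type marks a value as trainable.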

Composing modules

Assign child modules as attributes and they are tracked recursively:

class TwoLayer(nn.Module):
    def __init__(self, dim: int) -> None:
        super().__init__()
        self.fc1 = Affine(dim, dim)
        self.fc2 = Affine(dim, dim)

    def forward(self, x: npg.array) -> npg.array:
        return self.fc2(npg.relu(self.fc1(x)))

net = TwoLayer(16)
print(len(list(net.parameters())))  # 4 — weight + bias for each layer

parameters() walks the full module tree recursively, so you can nest modules arbitrarily deep.
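
A recursive walk of this kind can be sketched as a depth-first traversal. This is an illustration under stated assumptions rather than numpygrad's code: TinyModule, Leaf, Pair, and Outer are hypothetical, and for brevity every NumPy array attribute is treated as a parameter. The dotted names it produces show how nesting depth maps to a flat list:

```python
import numpy as np


class TinyModule:
    """Hypothetical base class; here every ndarray attribute counts as a parameter."""

    def named_parameters(self, prefix: str = ""):
        for name, value in vars(self).items():
            if isinstance(value, np.ndarray):
                yield prefix + name, value
            elif isinstance(value, TinyModule):
                # Child module: recurse, extending the dotted name prefix.
                yield from value.named_parameters(prefix + name + ".")


class Leaf(TinyModule):
    def __init__(self, dim: int) -> None:
        self.weight = np.zeros((dim, dim))
        self.bias = np.zeros(dim)


class Pair(TinyModule):
    def __init__(self, dim: int) -> None:
        self.fc1 = Leaf(dim)
        self.fc2 = Leaf(dim)


class Outer(TinyModule):
    def __init__(self, dim: int) -> None:
        self.block = Pair(dim)  # parameters here sit two levels down
        self.head = Leaf(dim)


names = [name for name, _ in Outer(16).named_parameters()]
print(names)
# ['block.fc1.weight', 'block.fc1.bias', 'block.fc2.weight', 'block.fc2.bias',
#  'head.weight', 'head.bias']
```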

Using Sequential

For a simple chain of modules, nn.Sequential avoids boilerplate:

model = nn.Sequential(
    nn.Linear(4, 32),
    nn.ReLU(),
    nn.Linear(32, 2),
)

x = npg.random.randn((8, 4))  # batch of 8 samples with 4 features each
out = model(x)                # applies each module in order; shape (8, 2)
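
Conceptually, a sequential container just feeds each stage's output into the next. The sketch below shows that idea in plain NumPy; TinySequential is a hypothetical name, the lambdas are stand-ins for the real layers (biases omitted), and none of this is numpygrad's implementation:

```python
import numpy as np


class TinySequential:
    """Hypothetical container that feeds each stage's output to the next stage."""

    def __init__(self, *stages) -> None:
        self.stages = stages

    def __call__(self, x):
        for stage in self.stages:
            x = stage(x)
        return x


rng = np.random.default_rng(0)
w1 = rng.standard_normal((4, 32))
w2 = rng.standard_normal((32, 2))

model = TinySequential(
    lambda x: x @ w1,              # stands in for nn.Linear(4, 32), bias omitted
    lambda x: np.maximum(x, 0.0),  # stands in for nn.ReLU()
    lambda x: x @ w2,              # stands in for nn.Linear(32, 2), bias omitted
)

x = rng.standard_normal((8, 4))
out = model(x)
print(out.shape)  # (8, 2)
```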

Buffers

If you need a non-trainable array stored on the module (e.g. a running mean), assign it as a plain array rather than wrapping it in nn.Parameter. It will not appear in parameters(), so the optimizer never touches it, but it is still accessible as an attribute:

class BatchNorm1d(nn.Module):
    def __init__(self, num_features: int) -> None:
        super().__init__()
        self.scale  = nn.Parameter(npg.ones((num_features,)))
        self.shift  = nn.Parameter(npg.zeros((num_features,)))
        self.running_mean = npg.zeros((num_features,))  # not a Parameter

    def forward(self, x: npg.array) -> npg.array:
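        mean = x.mean(axis=0)
        # Exponential moving average of the batch mean, stored as a buffer
        self.running_mean = 0.9 * self.running_mean + 0.1 * mean
        # Epsilon sits inside the square root so that the division and the
        # square root's gradient both stay finite when a feature has zero variance
        x_norm = (x - mean) / (x.var(axis=0) + 1e-5) ** 0.5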
        return self.scale * x_norm + self.shift
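
To see what the running-mean update in forward() does over many calls, here is a small NumPy-only simulation. The data distribution, batch size, and step count are assumptions chosen for illustration. It applies the same 0.9 / 0.1 rule to batches drawn around a known mean:

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean = 3.0
running_mean = np.zeros(4)  # buffer starts at zero, as in BatchNorm1d above

for _ in range(100):
    batch = rng.normal(loc=true_mean, scale=1.0, size=(256, 4))
    batch_mean = batch.mean(axis=0)
    # Same rule as in forward(): keep 90% of the old estimate, blend in 10% of the new one.
    running_mean = 0.9 * running_mean + 0.1 * batch_mean

print(running_mean)  # each entry ends up close to 3.0
```

The influence of the zero initial value decays by a factor of 0.9 per step, so after 100 steps the buffer reflects the data almost entirely.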

Inspecting parameters

state_dict() returns a flat dict mapping dotted parameter names to their underlying NumPy arrays, which is useful for checkpointing. For the TwoLayer instance net defined earlier:

sd = net.state_dict()
# {'fc1.weight': array(...), 'fc1.bias': array(...), ...}

import numpy as np
np.savez("checkpoint.npz", **sd)
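
Reading a checkpoint back uses np.load, which returns the arrays under the same dotted keys. The sketch below uses a hand-built dict as a stand-in for the output of state_dict(); its keys and shapes are illustrative, and a temporary directory keeps the example self-contained. How the arrays are copied back into a module depends on numpygrad's API and is not shown here:

```python
import os
import tempfile

import numpy as np

# Stand-in for the dict returned by state_dict(); keys and shapes are illustrative.
sd = {
    "fc1.weight": np.arange(6.0).reshape(2, 3),
    "fc1.bias": np.zeros(3),
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "checkpoint.npz")
    np.savez(path, **sd)

    # np.load returns a lazy NpzFile; read the arrays out before the file is closed.
    with np.load(path) as archive:
        restored = {name: archive[name] for name in archive.files}

print(sorted(restored))  # ['fc1.bias', 'fc1.weight']
```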