Custom Modules
All trainable components in numpygrad inherit from nn.Module. Subclassing
it is the standard way to define new layers, loss functions, or complete
architectures.
Minimal example
Override forward with the computation your module performs:
import numpygrad as npg
import numpygrad.nn as nn


class Affine(nn.Module):
    def __init__(self, in_features: int, out_features: int) -> None:
        super().__init__()
        self.weight = nn.Parameter(npg.random.randn((in_features, out_features)))
        self.bias = nn.Parameter(npg.zeros((out_features,)))

    def forward(self, x: npg.array) -> npg.array:
        return x @ self.weight + self.bias


layer = Affine(4, 8)
out = layer(npg.random.randn((2, 4)))  # shape (2, 8)
Any attribute assigned as a Parameter is automatically included in
module.parameters(), so an optimizer built from that iterator will
update it on every step.
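As a quick shape check on the forward computation above, the same expression can be evaluated in plain NumPy. This sketch assumes numpygrad's `@` and `+` follow NumPy's matrix-multiplication and broadcasting rules, which the shape comment in the example suggests but the text does not state outright.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.standard_normal((2, 4))       # batch of 2 inputs, 4 features each
weight = rng.standard_normal((4, 8))  # (in_features, out_features)
bias = np.zeros((8,))                 # one offset per output feature

# (2, 4) @ (4, 8) -> (2, 8); the (8,) bias broadcasts across the batch axis.
out = x @ weight + bias
print(out.shape)  # (2, 8)
```

Storing the weight as (in_features, out_features) is what lets forward use `x @ self.weight` without a transpose.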
Composing modules
Assign child modules as attributes and they are tracked recursively:
class TwoLayer(nn.Module):
    def __init__(self, dim: int) -> None:
        super().__init__()
        self.fc1 = Affine(dim, dim)
        self.fc2 = Affine(dim, dim)

    def forward(self, x: npg.array) -> npg.array:
        return self.fc2(npg.relu(self.fc1(x)))


net = TwoLayer(16)
print(len(list(net.parameters())))  # 4: weight + bias for each layer
parameters() walks the full module tree recursively, so you can nest
modules arbitrarily deep.
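One way such a recursive walk could be implemented is sketched below in plain Python. `ToyParameter`, `ToyModule`, and `named_parameters` are hypothetical names used only for illustration; numpygrad's actual implementation may differ.

```python
from typing import Iterator, Tuple


class ToyParameter:
    """Hypothetical stand-in for a trainable value."""

    def __init__(self, value: float) -> None:
        self.value = value


class ToyModule:
    """Hypothetical base class that discovers parameters by scanning attributes."""

    def named_parameters(self, prefix: str = "") -> Iterator[Tuple[str, ToyParameter]]:
        for name, attr in vars(self).items():
            if isinstance(attr, ToyParameter):
                yield prefix + name, attr
            elif isinstance(attr, ToyModule):
                # Recurse into child modules, extending the dotted name.
                yield from attr.named_parameters(prefix + name + ".")

    def parameters(self) -> Iterator[ToyParameter]:
        for _, param in self.named_parameters():
            yield param


class Leaf(ToyModule):
    def __init__(self) -> None:
        self.weight = ToyParameter(1.0)
        self.bias = ToyParameter(0.0)
        self.note = "plain attributes are skipped"


class Tree(ToyModule):
    def __init__(self) -> None:
        self.fc1 = Leaf()
        self.fc2 = Leaf()


names = [name for name, _ in Tree().named_parameters()]
print(names)  # ['fc1.weight', 'fc1.bias', 'fc2.weight', 'fc2.bias']
```

Because the walk only inspects instance attributes, a module stored inside a plain Python list would be missed by this toy version.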
Using Sequential
For a simple chain of modules, nn.Sequential avoids boilerplate:
model = nn.Sequential(
    nn.Linear(4, 32),
    nn.ReLU(),
    nn.Linear(32, 2),
)

x = npg.random.randn((8, 4))  # batch of 8 inputs with 4 features each
out = model(x)  # applies each module in order
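Conceptually, a sequential container just threads the input through its children one after another. A minimal plain-Python sketch, with `ToySequential` as a hypothetical name rather than numpygrad's implementation:

```python
from typing import Any, Callable


class ToySequential:
    """Hypothetical container that applies each stage to the previous output."""

    def __init__(self, *stages: Callable[[Any], Any]) -> None:
        self.stages = stages

    def __call__(self, x: Any) -> Any:
        for stage in self.stages:
            x = stage(x)
        return x


def double(v: float) -> float:
    return 2.0 * v


def relu(v: float) -> float:
    return max(v, 0.0)


def add_one(v: float) -> float:
    return v + 1.0


chain = ToySequential(double, relu, add_one)
print(chain(3.0))   # 7.0
print(chain(-3.0))  # 1.0
```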
Buffers
If you need a non-trainable array stored on the module (e.g. a running mean),
assign it as a plain Array rather than wrapping it in a Parameter. It will not
appear in parameters(), but it is still accessible as an attribute:
class BatchNorm1d(nn.Module):
    def __init__(self, num_features: int) -> None:
        super().__init__()
        self.scale = nn.Parameter(npg.ones((num_features,)))
        self.shift = nn.Parameter(npg.zeros((num_features,)))
        self.running_mean = npg.zeros((num_features,))  # not a Parameter

    def forward(self, x: npg.array) -> npg.array:
        mean = x.mean(axis=0)
        # Exponential moving average of the batch mean, kept as a buffer.
        self.running_mean = 0.9 * self.running_mean + 0.1 * mean
        # Epsilon goes inside the square root, as in standard batch normalization.
        x_norm = (x - mean) / (x.var(axis=0) + 1e-5) ** 0.5
        return self.scale * x_norm + self.shift
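The running-mean update in forward is an exponential moving average with momentum 0.9. The plain-Python sketch below (the helper `update_running_mean` is a hypothetical name, and scalars stand in for arrays) shows how the buffer approaches the batch mean when every batch has the same mean:

```python
def update_running_mean(running: float, batch_mean: float, momentum: float = 0.9) -> float:
    """One step of the moving average: momentum * old + (1 - momentum) * new."""
    return momentum * running + (1.0 - momentum) * batch_mean


running = 0.0       # the buffer starts at zero, as in the module above
batch_mean = 5.0    # assumption: every batch has this mean

history = []
for _ in range(50):
    running = update_running_mean(running, batch_mean)
    history.append(running)

# After k steps from zero, running == batch_mean * (1 - 0.9 ** k).
print(round(history[0], 4))  # 0.5
print(round(history[-1], 4))
```

Starting from zero biases the early values toward zero, so the buffer is only a good estimate after a few dozen batches.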
Inspecting parameters
state_dict() returns a flat dict mapping parameter names to their
underlying NumPy arrays, which is useful for checkpointing:
import numpy as np

sd = net.state_dict()  # net is the TwoLayer instance defined earlier
# {'fc1.weight': array(...), 'fc1.bias': array(...), ...}
np.savez("checkpoint.npz", **sd)
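To read a checkpoint back, the archive can be opened with np.load. The sketch below uses a hand-built dict in place of a real state_dict() result and writes to an in-memory buffer rather than checkpoint.npz. Loading the arrays back into a module is not shown, since the text does not describe that API.

```python
import io

import numpy as np

# Hand-built stand-in for a state dict (names follow the dotted convention above).
sd = {
    "fc1.weight": np.arange(6, dtype=np.float64).reshape(2, 3),
    "fc1.bias": np.zeros(3),
}

buffer = io.BytesIO()          # stands in for a file on disk
np.savez(buffer, **sd)         # dotted names are stored as archive keys
buffer.seek(0)

with np.load(buffer) as archive:
    restored = {name: archive[name] for name in archive.files}

print(sorted(restored))  # ['fc1.bias', 'fc1.weight']
```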