Operators
All operators are available both as module-level functions (`npg.relu(x)`)
and, where applicable, as `Array` methods (`x.relu()`). Unless noted
otherwise, every operator listed here is differentiable: it records itself
into the computation graph when any input has `requires_grad=True`.
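
To make the recording rule concrete, here is a toy sketch of reverse-mode recording in plain Python. The `Value` class, its fields, and the `mul` helper are hypothetical and are not numpygrad's implementation; they only illustrate how an operator might attach a backward function when at least one input has `requires_grad=True`, and skip recording otherwise.

```python
class Value:
    """Hypothetical scalar node; illustrates graph recording only."""

    def __init__(self, data, requires_grad=False, parents=(), backward_fn=None):
        self.data = float(data)
        self.requires_grad = requires_grad
        self.grad = 0.0
        self.parents = parents          # inputs that produced this node
        self.backward_fn = backward_fn  # maps upstream grad -> grads for parents

    def backward(self):
        # Order the graph topologically, then propagate gradients in reverse.
        order, seen = [], set()

        def visit(node):
            if id(node) in seen:
                return
            seen.add(id(node))
            for parent in node.parents:
                visit(parent)
            order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            if node.backward_fn is None:
                continue
            for parent, g in zip(node.parents, node.backward_fn(node.grad)):
                if parent.requires_grad:
                    parent.grad += g


def mul(a, b):
    if not (a.requires_grad or b.requires_grad):
        return Value(a.data * b.data)  # no input needs a gradient: nothing recorded
    # d(a*b)/da = b and d(a*b)/db = a
    return Value(a.data * b.data, True, (a, b),
                 lambda g: (g * b.data, g * a.data))


x = Value(3.0, requires_grad=True)
c = Value(4.0)           # constant: no gradient requested
y = mul(mul(x, c), x)    # y = 4 * x^2
y.backward()
print(y.data, x.grad)    # 36.0 24.0
```

The constant `c` never accumulates a gradient, while `x` receives contributions from both places it was used.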
Element-wise
| Function | Description |
|---|---|
| | Element-wise addition (also |
| | Element-wise multiplication (also |
| | Element-wise power (also |
| | Element-wise \(e^x\) |
| | Natural logarithm (undefined for non-positive values) |
| | Absolute value |
| | |
| | Clamp values to |
| | Element-wise max of two arrays |
| | Element-wise min of two arrays |
Reductions
All reduction functions accept `axis` (default `None`, which reduces over all
axes) and `keepdims` (default `False`).
| Function | Description |
|---|---|
| | Sum of elements |
| | Mean of elements |
| | Product of elements |
| | Maximum value |
| | Minimum value |
| | Index of maximum value (no gradient) |
| | Variance. |
| | Standard deviation ( |
| | Cumulative sum along |
| | Cumulative product along |
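
The effect of `axis` and `keepdims` is easiest to see on a small array. The sketch below uses plain NumPy and assumes that numpygrad follows NumPy's conventions for these two arguments, which this page does not state explicitly.

```python
import numpy as np

x = np.arange(6, dtype=float).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

total = x.sum()                       # axis=None: reduce every axis to a scalar
cols = x.sum(axis=0)                  # shape (3,)
rows = x.sum(axis=1, keepdims=True)   # shape (2, 1): reduced axis kept as size 1

# keepdims=True lets the result broadcast against the input,
# for example to normalise each row so that it sums to 1.
shifted = x + 1.0
normalised = shifted / shifted.sum(axis=1, keepdims=True)

print(total, cols.shape, rows.shape)  # 15.0 (3,) (2, 1)
```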
Activations
| Function | Description |
|---|---|
| | Softmax along |
| | Log-softmax (numerically stable) |
| | \(\sigma(x) = 1 / (1 + e^{-x})\) |
| | Hyperbolic tangent |
| | \(\log(1 + e^x)\) (smooth approximation of ReLU) |
| | Gaussian Error Linear Unit (tanh approximation): \(0.5 x (1 + \tanh(\sqrt{2/\pi}(x + 0.044715 x^3)))\) |
| | |
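
The formulas in this table can be written out as a NumPy reference sketch. The function names below are illustrative and the code is not taken from numpygrad; in particular, the max-subtraction trick shown for log-softmax is one common way to obtain numerical stability and is only assumed to resemble what the library does.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softplus(x):
    # log(1 + e^x), computed as logaddexp(0, x) to avoid overflow for large x
    return np.logaddexp(0.0, x)


def gelu_tanh(x):
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def log_softmax(x, axis=-1):
    # Subtracting the max leaves the result unchanged but keeps exp() finite.
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def softmax(x, axis=-1):
    return np.exp(log_softmax(x, axis=axis))


# Inputs this large would overflow a naive exp(x) / sum(exp(x)).
x = np.array([[1000.0, 1001.0, 1002.0]])
print(softmax(x))
```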
Linear algebra
| Function | Description |
|---|---|
| | Matrix multiplication. Handles 1D (dot product), 2D, and batched 3D inputs. |
| | Dot product of two 1D or 2D arrays |
| | Frobenius / L2 norm |
| | Extract diagonal elements |
| | Sum of diagonal elements ( |
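
The three matrix-multiplication cases can be illustrated with plain NumPy. This sketch assumes numpygrad mirrors `np.matmul` shape rules for 1D, 2D, and batched 3D inputs; that is an assumption, not something this page confirms.

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])
A = np.arange(6, dtype=float).reshape(2, 3)
B = np.ones((3, 4))
batch = np.ones((5, 2, 3))

print(np.matmul(v, v))            # 1D x 1D: scalar dot product, 14.0
print(np.matmul(A, B).shape)      # 2D x 2D: (2, 4)
print(np.matmul(batch, B).shape)  # batched 3D x 2D: (5, 2, 4), B is broadcast

M = np.array([[3.0, 0.0], [4.0, 5.0]])
print(np.linalg.norm(M))          # Frobenius norm: sqrt(9 + 16 + 25)
print(np.trace(M), np.diag(M))    # 8.0 [3. 5.]
```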
Shape transforms
| Function | Description |
|---|---|
| | Change shape without changing data. Returns a view when possible. |
| | Permute dimensions. |
| | Flatten to 1D (equivalent to |
| | Insert a size-1 dimension at |
| | Remove size-1 dimensions (all if |
| | Repeat elements along an axis |
| | Upper-triangular part of a matrix (zeros below diagonal |
| | Split array into chunks. |
| | Stack a list of arrays along a new axis |
| | Concatenate arrays along an existing axis |
Convolution
npg.conv2d(input, weight, bias=None, stride=1, padding=0)
2D convolution with full backward support.
- `input`: shape `(N, C_in, H, W)`
- `weight`: shape `(C_out, C_in, KH, KW)`
- `bias`: shape `(C_out,)` or `None`
- `stride` and `padding` accept an int or a `(H, W)` tuple

Output shape: `(N, C_out, H_out, W_out)`
Example:
```python
import numpygrad as npg

x = npg.random.randn((2, 3, 32, 32))  # batch of 2 RGB images
w = npg.random.randn((16, 3, 3, 3), requires_grad=True)
out = npg.conv2d(x, w, stride=1, padding=1)  # (2, 16, 32, 32)
```
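
This page does not say how `H_out` and `W_out` are derived. The sketch below applies the standard formula for a convolution without dilation, `out = (in + 2 * padding - kernel) // stride + 1`, and assumes `npg.conv2d` follows it. The helper `conv2d_output_shape` is hypothetical and not part of the library.

```python
def conv2d_output_shape(input_shape, weight_shape, stride=1, padding=0):
    """Output shape of a 2D convolution without dilation (standard formula)."""
    n, c_in, h, w = input_shape
    c_out, c_in_w, kh, kw = weight_shape
    if c_in != c_in_w:
        raise ValueError("input and weight disagree on C_in")
    # Accept an int or an (H, W) tuple for stride and padding.
    sh, sw = (stride, stride) if isinstance(stride, int) else stride
    ph, pw = (padding, padding) if isinstance(padding, int) else padding
    h_out = (h + 2 * ph - kh) // sh + 1
    w_out = (w + 2 * pw - kw) // sw + 1
    return (n, c_out, h_out, w_out)


print(conv2d_output_shape((2, 3, 32, 32), (16, 3, 3, 3), stride=1, padding=1))
# (2, 16, 32, 32)
print(conv2d_output_shape((2, 3, 32, 32), (16, 3, 3, 3), stride=2, padding=(1, 1)))
# (2, 16, 16, 16)
```

With a 3x3 kernel, `padding=1` and `stride=1` preserve the spatial size, which matches the shape shown in the example.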
Special
npg.setitem(a, key, value)
Differentiable in-place assignment. Equivalent to a[key] = value but
records the operation in the computation graph, allowing gradients to flow
through the assignment:
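```python
import numpygrad as npg

a = npg.zeros((4,), requires_grad=True)
b = npg.setitem(a, 2, npg.array([5.0]))  # b equals a except that b[2] is 5.0
b.sum().backward()
print(a.grad)  # [1., 1., 0., 1.]: a[2] was overwritten, so it does not affect b
```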
npg.masked_fill(a, mask, value)
Fill positions where boolean mask is True with scalar value.
Broadcasts mask over a when ranks differ (e.g. a 2D causal mask
applied to a 4D attention score tensor):
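```python
T = 8  # sequence length (illustrative); scores has shape (batch, heads, T, T)
scores = npg.random.randn((2, 4, T, T))
mask = npg.triu(npg.ones((T, T)), k=1).view(1, 1, T, T)  # entries above the diagonal
scores = scores.masked_fill(mask, float("-inf"))
```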
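
The broadcasting behaviour described above can be sketched in plain NumPy with `np.where`. This `masked_fill` is an illustrative stand-in, not numpygrad's implementation, and it covers only the forward pass.

```python
import numpy as np


def masked_fill(a, mask, value):
    # np.where broadcasts mask against a and picks value where mask is True.
    return np.where(mask, value, a)


T = 4
scores = np.zeros((2, 3, T, T))                      # (batch, heads, T, T)
causal = np.triu(np.ones((T, T), dtype=bool), k=1)   # True above the diagonal
masked = masked_fill(scores, causal.reshape(1, 1, T, T), -np.inf)

print(masked[0, 0])
```

For a differentiable version, the natural backward rule passes the upstream gradient through where the mask is `False` and contributes zero where it is `True`, since those outputs no longer depend on the input.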
Embedding lookup (npg.ops.embedding)
Row-wise lookup into a weight matrix; used internally by nn.Embedding:
```python
vocab_size, embed_dim = 100, 16  # illustrative sizes
weight = npg.random.randn((vocab_size, embed_dim), requires_grad=True)
indices = npg.array([0, 3, 1])
out = npg.ops.embedding(weight, indices)  # (3, embed_dim)
```
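
A row-wise lookup and its gradient can be sketched with NumPy. The function names are hypothetical and the code is not taken from numpygrad. The backward pass shown here is the usual scatter-add, in which a row that is looked up more than once accumulates one gradient contribution per lookup.

```python
import numpy as np


def embedding_forward(weight, indices):
    # Integer-array indexing selects one row of weight per index.
    return weight[indices]


def embedding_backward(grad_out, indices, weight_shape):
    # np.add.at accumulates correctly even when indices contain repeats.
    grad_weight = np.zeros(weight_shape, dtype=grad_out.dtype)
    np.add.at(grad_weight, indices, grad_out)
    return grad_weight


vocab_size, embed_dim = 5, 2
weight = np.arange(vocab_size * embed_dim, dtype=float).reshape(vocab_size, embed_dim)
indices = np.array([0, 3, 3])

out = embedding_forward(weight, indices)   # shape (3, embed_dim)
grad_w = embedding_backward(np.ones_like(out), indices, weight.shape)
print(out.shape)   # (3, 2)
print(grad_w)      # row 0 receives 1s, row 3 receives 2s, other rows stay 0
```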