Operators

All operators are available as module-level functions (npg.relu(x)) and, where applicable, as Array methods (x.relu()). Unless noted otherwise (npg.argmax, for example, has no gradient), every operator listed here is differentiable: it records itself in the computation graph when any input has requires_grad=True.

Element-wise

Function	Description
`npg.add(a, b)`	Element-wise addition (also `a + b`)
`npg.mul(a, b)`	Element-wise multiplication (also `a * b`)
`npg.pow(a, exponent)`	Element-wise power (also `a ** exponent`)
`npg.exp(a)`	Element-wise \(e^x\)
`npg.log(a)`	Natural logarithm (undefined for non-positive values)
`npg.abs(a)`	Absolute value
`npg.relu(a)`	`max(0, x)` element-wise
`npg.clip(a, min, max)`	Clamp values to `[min, max]`
`npg.maximum(a, b)`	Element-wise max of two arrays
`npg.minimum(a, b)`	Element-wise min of two arrays

Reductions

Reductions default to axis=None (reduce over all axes) and keepdims=False. As their signatures show, npg.cumsum and npg.cumprod take axis only.

Function	Description
`npg.sum(a, axis, keepdims)`	Sum of elements
`npg.mean(a, axis, keepdims)`	Mean of elements
`npg.prod(a, axis, keepdims)`	Product of elements
`npg.max(a, axis, keepdims)`	Maximum value
`npg.min(a, axis, keepdims)`	Minimum value
`npg.argmax(a, axis, keepdims)`	Index of maximum value (no gradient)
`npg.var(a, axis, ddof, keepdims)`	Variance. `ddof=0` (population) or `ddof=1` (sample)
`npg.std(a, axis, ddof, keepdims)`	Standard deviation (`sqrt(var(...))`)
`npg.cumsum(a, axis)`	Cumulative sum along `axis`
`npg.cumprod(a, axis)`	Cumulative product along `axis`
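
The ddof argument and the cumulative reductions are easiest to see on a small worked example. The following is a plain-Python sketch of the arithmetic only; it does not call numpygrad, and the helper name variance is hypothetical.

```python
import math
from itertools import accumulate
from operator import mul

def variance(values, ddof=0):
    """Variance with divisor N - ddof (ddof=0: population, ddof=1: sample)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - ddof)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

pop_var = variance(data, ddof=0)      # squared deviations divided by N
sample_var = variance(data, ddof=1)   # squared deviations divided by N - 1
pop_std = math.sqrt(pop_var)          # std is the square root of var

running_sum = list(accumulate(data))        # cumulative sum along the only axis
running_prod = list(accumulate(data, mul))  # cumulative product along the only axis
```

Here the mean is 5 and the squared deviations sum to 32, so the population variance is 32 / 8 and the sample variance is 32 / 7.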

Activations

Function	Description
`npg.softmax(a, axis=-1)`	Softmax along `axis`
`npg.log_softmax(a, axis=-1)`	Log-softmax (numerically stable)
`npg.sigmoid(a)`	\(\sigma(x) = 1 / (1 + e^{-x})\)
`npg.tanh(a)`	Hyperbolic tangent
`npg.softplus(a)`	\(\log(1 + e^x)\) (smooth approximation of ReLU)
`npg.gelu(a)`	Gaussian Error Linear Unit (tanh approximation): \(0.5 x (1 + \tanh(\sqrt{2/\pi}(x + 0.044715 x^3)))\)
`npg.relu(a)`	`max(0, x)` (also listed under element-wise)
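
The formulas in this table can be checked with a few lines of plain Python. This is a reference sketch for scalars and 1D lists, not numpygrad's implementation; the max-shift used for log-softmax is the common log-sum-exp trick and is assumed here to be what "numerically stable" refers to.

```python
import math

def log_softmax(xs):
    """Log-softmax of a 1D list, shifting by the maximum before exponentiating."""
    m = max(xs)
    log_sum = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_sum for x in xs]

def softmax(xs):
    """Softmax computed as exp(log_softmax(x))."""
    return [math.exp(v) for v in log_softmax(xs)]

def gelu_tanh(x):
    """GELU via the tanh approximation quoted in the table."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))

def softplus(x):
    """log(1 + e^x), rearranged so that a large |x| does not overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

# exp(1000.0) alone would raise OverflowError; the shifted form stays finite.
logits = [1000.0, 1001.0, 1002.0]
probs = softmax(logits)
```

Because softmax is invariant to adding a constant to every input, probs equals the softmax of [0.0, 1.0, 2.0].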

Linear algebra

Function	Description
`npg.matmul(a, b)` / `npg.mm(a, b)`	Matrix multiplication. Handles 1D (dot product), 2D, and batched 3D inputs.
`npg.dot(a, b)`	Dot product of two 1D or 2D arrays
`npg.norm(a, axis, keepdims)`	L2 norm of the elements (the Frobenius norm when taken over a whole matrix)
`npg.diagonal(a, offset, axis1, axis2)`	Extract diagonal elements
`npg.trace(a, offset)`	Sum of diagonal elements (`diagonal(a, offset).sum()`)
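
The relationship between diagonal, trace, and the norm can be sketched on nested lists. The code below is illustrative plain Python rather than numpygrad code, and it assumes the NumPy convention that a positive offset selects a diagonal above the main one.

```python
import math

def diagonal(matrix, offset=0):
    """Elements matrix[i][i + offset] that fall inside the matrix."""
    cols = len(matrix[0])
    return [row[i + offset] for i, row in enumerate(matrix) if 0 <= i + offset < cols]

def trace(matrix, offset=0):
    """Sum of the selected diagonal, i.e. diagonal(a, offset).sum()."""
    return sum(diagonal(matrix, offset))

def frobenius_norm(matrix):
    """Square root of the sum of squared entries (L2 norm of the flattened matrix)."""
    return math.sqrt(sum(v * v for row in matrix for v in row))

m = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]

main_diag = diagonal(m)               # main diagonal
upper_diag = diagonal(m, offset=1)    # first diagonal above the main one
lower_diag = diagonal(m, offset=-1)   # first diagonal below the main one
```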

Shape transforms

Function	Description
`npg.reshape(a, new_shape)`	Change shape without changing data. Returns a view when possible.
`npg.transpose(a, axes)`	Permute dimensions. `axes` is a tuple; `None` reverses all.
`npg.flatten(a)`	Flatten to 1D (equivalent to `reshape(a, (-1,))`)
`npg.unsqueeze(a, axis)`	Insert a size-1 dimension at `axis`
`npg.squeeze(a, axis=None)`	Remove size-1 dimensions (all if `axis=None`)
`npg.repeat(a, repeats, axis=None)`	Repeat elements along an axis
`npg.triu(a, k=0)`	Upper-triangular part of a matrix (zeros below diagonal `k`)
`npg.split(a, split_size_or_sections, dim=0)`	Split array into chunks. `split_size_or_sections` is an int (equal chunks; last may be smaller) or a list of sizes. Returns a tuple of `Array` objects.
`npg.stack(arrays, axis=0)`	Stack a list of arrays along a new axis
`npg.cat(arrays, axis=0)`	Concatenate arrays along an existing axis
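
The two forms of split_size_or_sections can be illustrated on a plain list. This sketch assumes the int form is a chunk size (as in torch.split, whose parameter names npg.split shares) and that explicit sizes must sum to the dimension length; split_sizes and split_list are hypothetical helpers, not part of numpygrad.

```python
def split_sizes(length, split_size_or_sections):
    """Chunk lengths produced when splitting a dimension of the given length."""
    if isinstance(split_size_or_sections, int):
        size = split_size_or_sections
        full, rest = divmod(length, size)
        # Equal chunks of `size`; a shorter final chunk holds any remainder.
        return [size] * full + ([rest] if rest else [])
    sizes = list(split_size_or_sections)
    # Assumption: explicit sizes must cover the dimension exactly.
    if sum(sizes) != length:
        raise ValueError("section sizes must sum to the dimension length")
    return sizes

def split_list(values, split_size_or_sections):
    """Split a 1D list into a tuple of chunks, mirroring the documented return type."""
    chunks, start = [], 0
    for size in split_sizes(len(values), split_size_or_sections):
        chunks.append(values[start:start + size])
        start += size
    return tuple(chunks)

data = list(range(10))
by_size = split_list(data, 4)            # chunk size 4: lengths 4, 4, 2
by_sections = split_list(data, [3, 7])   # explicit lengths 3 and 7
```

Concatenating the returned chunks in order reproduces the original sequence, which is the sense in which cat along the same axis undoes split.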

Convolution

npg.conv2d(input, weight, bias=None, stride=1, padding=0)

2D convolution with full backward support.

input: shape (N, C_in, H, W)
weight: shape (C_out, C_in, KH, KW)
bias: shape (C_out,) or None
stride and padding accept an int or a (H, W) tuple
Output shape: (N, C_out, H_out, W_out)

Example:

import numpygrad as npg

x = npg.random.randn((2, 3, 32, 32))                # batch of 2 RGB images
w = npg.random.randn((16, 3, 3, 3), requires_grad=True)
out = npg.conv2d(x, w, stride=1, padding=1)  # (2, 16, 32, 32)
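
The output shape in the comment above follows from the usual convolution arithmetic. The helper below is a hypothetical sketch, not a numpygrad function; it assumes no dilation and floor rounding, which is the standard formula for the parameters listed in the signature.

```python
def conv2d_output_shape(input_shape, weight_shape, stride=1, padding=0):
    """Shape (N, C_out, H_out, W_out) of a 2D convolution output."""
    n, c_in, h, w = input_shape
    c_out, c_in_w, kh, kw = weight_shape
    if c_in != c_in_w:
        raise ValueError("input and weight disagree on C_in")
    # An int applies to both spatial dimensions; a tuple is (H, W).
    sh, sw = (stride, stride) if isinstance(stride, int) else stride
    ph, pw = (padding, padding) if isinstance(padding, int) else padding
    h_out = (h + 2 * ph - kh) // sh + 1
    w_out = (w + 2 * pw - kw) // sw + 1
    return (n, c_out, h_out, w_out)

same_size = conv2d_output_shape((2, 3, 32, 32), (16, 3, 3, 3), stride=1, padding=1)
strided = conv2d_output_shape((2, 3, 32, 32), (16, 3, 3, 3), stride=2)
mixed = conv2d_output_shape((2, 3, 32, 32), (16, 3, 3, 3), stride=(2, 1), padding=(1, 0))
```

With a 3x3 kernel, stride 1 and padding 1 preserve the spatial size, which is why the example output stays at 32 x 32.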

Special

npg.setitem(a, key, value)

Differentiable in-place assignment. Equivalent to a[key] = value but records the operation in the computation graph, allowing gradients to flow through the assignment:

a = npg.zeros((4,), requires_grad=True)
b = npg.setitem(a, 2, npg.array([5.0]))
b.sum().backward()
print(a.grad)   # [1., 1., 0., 1.]  (a[2] was overwritten, so it receives no gradient)

npg.masked_fill(a, mask, value)

Fills the positions where the boolean mask is True with the scalar value. The mask is broadcast over a when their ranks differ (e.g. a 2D causal mask applied to a 4D attention score tensor):

mask = npg.triu(npg.ones((T, T)), k=1).view(1, 1, T, T)
scores = scores.masked_fill(mask, float("-inf"))
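
To see why the fill value is negative infinity, the same pattern can be traced in plain Python on a single T x T score matrix. This is an illustrative sketch with made-up scores, not numpygrad code: after the fill, a row-wise softmax assigns exactly zero weight to every masked (future) position.

```python
import math

T = 3
# Causal mask: True strictly above the main diagonal (key position j > query position i),
# the same pattern as the upper triangle with k=1.
mask = [[j > i for j in range(T)] for i in range(T)]

# Hypothetical attention scores for one head.
scores = [[0.5, 1.0, 2.0],
          [1.5, 0.1, 0.3],
          [0.2, 0.4, 0.6]]

def masked_fill(rows, mask, value):
    """Replace entries where the mask is True with the scalar value."""
    return [[value if m else s for s, m in zip(row, mrow)]
            for row, mrow in zip(rows, mask)]

def softmax(xs):
    """Row softmax with max-shift; exp(-inf) evaluates to 0.0."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

masked = masked_fill(scores, mask, float("-inf"))
weights = [softmax(row) for row in masked]
```

The first query can only attend to itself, so its weight row collapses to a one-hot vector.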

Embedding lookup (`npg.ops.embedding`)

Row-wise lookup into a weight matrix; used internally by nn.Embedding:

weight = npg.random.randn((vocab_size, embed_dim), requires_grad=True)
indices = npg.array([0, 3, 1])
out = npg.ops.embedding(weight, indices)   # (3, embed_dim)
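
The lookup itself is a row gather, and its gradient with respect to the weight matrix is a scatter-add. The sketch below shows both on nested lists; it describes the mathematics of the operation, and whether numpygrad implements the backward pass exactly this way is an assumption. The function names are hypothetical.

```python
def embedding(weight, indices):
    """Gather rows of `weight` (a list of rows) at the given integer indices."""
    return [list(weight[i]) for i in indices]

def embedding_backward(grad_out, indices, vocab_size):
    """Scatter-add upstream gradients into a weight-shaped buffer of zeros."""
    embed_dim = len(grad_out[0])
    grad_weight = [[0.0] * embed_dim for _ in range(vocab_size)]
    for grad_row, i in zip(grad_out, indices):
        for d in range(embed_dim):
            grad_weight[i][d] += grad_row[d]   # repeated indices accumulate
    return grad_weight

weight = [[0.0, 0.1],
          [1.0, 1.1],
          [2.0, 2.1],
          [3.0, 3.1]]                         # vocab_size=4, embed_dim=2

out = embedding(weight, [0, 3, 1])            # 3 rows of length embed_dim

# Upstream gradient of ones for a lookup that hits row 3 twice.
grad_weight = embedding_backward([[1.0, 1.0]] * 3, [0, 3, 3], vocab_size=4)
```

Rows that are never looked up receive a zero gradient, and a row looked up k times receives the sum of its k upstream gradients.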