Operators

All operators are available both as module-level functions (npg.relu(x)) and, where applicable, as Array methods (x.relu()). Unless noted otherwise, every operator listed here is differentiable: it records itself into the computation graph whenever any input has requires_grad=True. Index-returning operators such as npg.argmax produce no gradient.
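
To illustrate what "records itself into the computation graph" means, here is a minimal scalar reverse-mode sketch in plain Python. The Value class is hypothetical and is not numpygrad's Array; it only shows the general technique. Each operation keeps its inputs and, if any input requires a gradient, a local backward rule. backward() then walks the graph in reverse topological order.

```python
class Value:
    """Hypothetical scalar graph node (not numpygrad's Array)."""

    def __init__(self, data, parents=(), requires_grad=False):
        self.data = data
        self.grad = 0.0
        self.parents = parents
        # Track gradients if requested, or if any input is tracked.
        self.requires_grad = requires_grad or any(p.requires_grad for p in parents)
        self._backward = lambda: None   # leaves and untracked nodes do nothing

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        if out.requires_grad:           # record only when some input needs gradients
            def _backward():
                self.grad += out.grad   # d(a + b)/da = 1
                other.grad += out.grad  # d(a + b)/db = 1
            out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        if out.requires_grad:
            def _backward():
                self.grad += other.data * out.grad  # d(a * b)/da = b
                other.grad += self.data * out.grad  # d(a * b)/db = a
            out._backward = _backward
        return out

    def backward(self):
        # Topologically order the graph, then apply each local rule in reverse.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for p in node.parents:
                    visit(p)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


x = Value(3.0, requires_grad=True)
y = Value(4.0, requires_grad=True)
z = x * y + x          # z = xy + x
z.backward()
print(x.grad, y.grad)  # 5.0 3.0
```

Gradients accumulate with += so that a value used twice (x above) receives the sum of both contributions.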

Element-wise

  • npg.add(a, b): Element-wise addition (also a + b)

  • npg.mul(a, b): Element-wise multiplication (also a * b)

  • npg.pow(a, exponent): Element-wise power (also a ** exponent)

  • npg.exp(a): Element-wise \(e^x\)

  • npg.log(a): Natural logarithm (undefined for non-positive values)

  • npg.abs(a): Absolute value

  • npg.relu(a): max(0, x) element-wise

  • npg.clip(a, min, max): Clamp values to [min, max]

  • npg.maximum(a, b): Element-wise max of two arrays

  • npg.minimum(a, b): Element-wise min of two arrays
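
Each element-wise operator needs a local derivative for the backward pass. The plain-Python sketch below lists the standard derivatives for several of the operators above and checks each one against a central finite difference. The RULES table and helper names are illustrative, not part of npg. The values chosen at non-differentiable points (relu and abs at 0, clip at its bounds) are an assumption, since the library's convention is not documented here.

```python
import math

# Hypothetical table: operator name -> (forward f(x), derivative f'(x)).
RULES = {
    "exp": (math.exp, math.exp),
    "log": (math.log, lambda x: 1.0 / x),              # only defined for x > 0
    "abs": (abs, lambda x: 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)),
    "relu": (lambda x: max(0.0, x), lambda x: 1.0 if x > 0 else 0.0),
    "pow3": (lambda x: x ** 3, lambda x: 3 * x ** 2),  # pow with exponent=3
    "clip": (lambda x: min(max(x, -1.0), 1.0),         # clip(a, -1, 1)
             lambda x: 1.0 if -1.0 < x < 1.0 else 0.0),
}


def numeric_grad(f, x, h=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def check(points=(0.3, 0.7, 1.5, 2.0)):
    """Return the largest gap between analytic and numeric derivatives."""
    worst = 0.0
    for name, (f, df) in RULES.items():
        for x in points:
            worst = max(worst, abs(df(x) - numeric_grad(f, x)))
    return worst


print(check() < 1e-5)  # True
```

The test points avoid the kinks at 0 and ±1, where a finite difference would straddle two branches.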

Reductions

Unless noted otherwise, reduction functions take axis (default None, which reduces over all axes) and keepdims (default False). npg.cumsum and npg.cumprod take only axis.

  • npg.sum(a, axis, keepdims): Sum of elements

  • npg.mean(a, axis, keepdims): Mean of elements

  • npg.prod(a, axis, keepdims): Product of elements

  • npg.max(a, axis, keepdims): Maximum value

  • npg.min(a, axis, keepdims): Minimum value

  • npg.argmax(a, axis, keepdims): Index of maximum value (no gradient)

  • npg.var(a, axis, ddof, keepdims): Variance. With n reduced elements, ddof=0 divides by n (population variance) and ddof=1 divides by n - 1 (sample variance)

  • npg.std(a, axis, ddof, keepdims): Standard deviation (sqrt(var(...)) with the same ddof)

  • npg.cumsum(a, axis): Cumulative sum along axis

  • npg.cumprod(a, axis): Cumulative product along axis
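
As a reference for the ddof and cumsum semantics above, here is a plain-Python sketch over 1D lists. The function names mirror the table but are hypothetical stand-ins, not npg code. The last function shows the standard backward rule for a cumulative sum: the gradient is a reversed cumulative sum of the upstream gradient.

```python
import math


def var(xs, ddof=0):
    """Variance of a 1D list: sum of squared deviations divided by (n - ddof)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - ddof)


def std(xs, ddof=0):
    """Standard deviation: sqrt(var(...)) with the same ddof."""
    return math.sqrt(var(xs, ddof))


def cumsum(xs):
    """Running total: out[i] = xs[0] + ... + xs[i]."""
    out, total = [], 0.0
    for x in xs:
        total += x
        out.append(total)
    return out


def cumsum_backward(upstream):
    """x[i] contributes to out[i:], so grad[i] = upstream[i] + ... + upstream[-1]."""
    return cumsum(upstream[::-1])[::-1]


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(var(data), var(data, ddof=1) == 32 / 7)  # 4.0 True
print(std(data))                               # 2.0
print(cumsum_backward([1.0, 1.0, 1.0]))        # [3.0, 2.0, 1.0]
```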

Activations

  • npg.softmax(a, axis=-1): Softmax along axis

  • npg.log_softmax(a, axis=-1): Log-softmax (numerically stable)

  • npg.sigmoid(a): \(\sigma(x) = 1 / (1 + e^{-x})\)

  • npg.tanh(a): Hyperbolic tangent

  • npg.softplus(a): \(\log(1 + e^x)\) (smooth approximation of ReLU)

  • npg.gelu(a): Gaussian Error Linear Unit (tanh approximation): \(0.5 x (1 + \tanh(\sqrt{2/\pi}(x + 0.044715 x^3)))\)

  • npg.relu(a): max(0, x) (also listed under element-wise)
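
The "numerically stable" note on log_softmax usually refers to the log-sum-exp trick: subtract the maximum before exponentiating so that no term overflows. Below is a plain-Python sketch of that technique for a 1D list, together with the tanh-approximated GELU formula from the table and an overflow-safe softplus. It illustrates the formulas under those assumptions and is not npg's source.

```python
import math


def log_softmax(xs):
    """log_softmax(x)_i = x_i - logsumexp(x), with the max subtracted first."""
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]


def softmax(xs):
    """softmax = exp(log_softmax); outputs lie in (0, 1) and sum to 1."""
    return [math.exp(v) for v in log_softmax(xs)]


def softplus(x):
    """log(1 + e^x), rewritten as max(x, 0) + log1p(e^-|x|) to avoid overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def gelu(x):
    """tanh approximation: 0.5 x (1 + tanh(sqrt(2/pi) (x + 0.044715 x^3)))."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


# A naive math.exp(1000.0) would overflow; the shifted version does not.
probs = softmax([1000.0, 1000.0])
print([round(p, 6) for p in probs])  # [0.5, 0.5]
```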

Linear algebra

  • npg.matmul(a, b) / npg.mm(a, b): Matrix multiplication. Handles 1D (dot product), 2D, and batched 3D inputs.

  • npg.dot(a, b): Dot product of two 1D or 2D arrays

  • npg.norm(a, axis, keepdims): L2 norm along axis; with axis=None, the square root of the sum of squares of all elements (the Frobenius norm, for a matrix)

  • npg.diagonal(a, offset, axis1, axis2): Extract diagonal elements

  • npg.trace(a, offset): Sum of diagonal elements (diagonal(a, offset).sum())
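
For reference, the backward rules usually used for C = A @ B are dA = dC @ B^T and dB = A^T @ dC. The plain-Python sketch below implements 2D matmul on nested lists, applies those rules for L = sum(C) (so dC is all ones), and implements trace with an offset as the sum of diagonal(a, offset). The helper names are hypothetical; npg's own 1D and batched handling is not shown.

```python
def matmul(a, b):
    """2D matrix product of nested lists: (n, k) @ (k, m) -> (n, m)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]


def matmul_backward(a, b, grad_out):
    """Given dL/dC for C = a @ b, return (dL/da, dL/db) = (dC @ b^T, a^T @ dC)."""
    return matmul(grad_out, transpose(b)), matmul(transpose(a), grad_out)


def diagonal(a, offset=0):
    """Elements a[i][i + offset]; positive offsets lie above the main diagonal."""
    rows, cols = len(a), len(a[0])
    return [a[i][i + offset] for i in range(rows) if 0 <= i + offset < cols]


def trace(a, offset=0):
    """Sum of the selected diagonal."""
    return sum(diagonal(a, offset))


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
ones = [[1.0, 1.0], [1.0, 1.0]]      # dL/dC when L = sum(A @ B)
dA, dB = matmul_backward(A, B, ones)
print(matmul(A, B))                  # [[19.0, 22.0], [43.0, 50.0]]
print(dA, dB)                        # [[11.0, 15.0], [11.0, 15.0]] [[4.0, 4.0], [6.0, 6.0]]
print(trace(A), trace(A, offset=1))  # 5.0 2.0
```

With dC all ones, each row of dA holds the row sums of B and each row of dB holds the matching column sum of A, which is easy to confirm by hand.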

Shape transforms

  • npg.reshape(a, new_shape): Change shape without changing data. Returns a view when possible.

  • npg.transpose(a, axes): Permute dimensions. axes is a tuple giving the new axis order; axes=None reverses all dimensions.

  • npg.flatten(a): Flatten to 1D (equivalent to reshape(a, (-1,)))

  • npg.unsqueeze(a, axis): Insert a size-1 dimension at axis

  • npg.squeeze(a, axis=None): Remove size-1 dimensions (all of them if axis=None)

  • npg.repeat(a, repeats, axis=None): Repeat elements along an axis

  • npg.triu(a, k=0): Upper-triangular part of a matrix. Elements below the k-th diagonal are set to zero (k=0 is the main diagonal, k > 0 lies above it).

  • npg.split(a, split_size_or_sections, dim=0): Split an array into chunks along dim (note: dim, not axis). split_size_or_sections is either an int, giving the size of each chunk (the last chunk may be smaller), or a list of chunk sizes. Returns a tuple of Array objects.

  • npg.stack(arrays, axis=0): Stack a list of arrays along a new axis

  • npg.cat(arrays, axis=0): Concatenate arrays along an existing axis
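
The two forms of split_size_or_sections can be pinned down with a small plain-Python sketch over a list standing in for one dimension. It assumes, as the table says, that an int is a chunk size with a possibly smaller final chunk and that a list gives explicit sizes; rejecting sizes that do not add up to the dimension's length is an assumption of this sketch. A triu sketch on nested lists shows how k shifts the diagonal. Both helpers are hypothetical, not npg code.

```python
def split(xs, split_size_or_sections):
    """Split a list into chunks, mirroring the two accepted forms of the argument."""
    if isinstance(split_size_or_sections, int):
        size = split_size_or_sections
        # Chunks of `size`; the final chunk holds whatever is left over.
        return tuple(xs[i:i + size] for i in range(0, len(xs), size))
    sizes = list(split_size_or_sections)
    if sum(sizes) != len(xs):  # assumed validation, not documented npg behaviour
        raise ValueError("section sizes must sum to the length of the split dimension")
    chunks, start = [], 0
    for s in sizes:
        chunks.append(xs[start:start + s])
        start += s
    return tuple(chunks)


def triu(a, k=0):
    """Keep a[i][j] where j - i >= k; zero everything below the k-th diagonal."""
    return [[v if j - i >= k else 0 for j, v in enumerate(row)] for i, row in enumerate(a)]


print(split([0, 1, 2, 3, 4], 2))       # ([0, 1], [2, 3], [4])
print(split([0, 1, 2, 3, 4], [1, 4]))  # ([0], [1, 2, 3, 4])
print(triu([[1, 1, 1], [1, 1, 1], [1, 1, 1]], k=1))  # [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
```

triu(ones, k=1) is exactly the "future positions" pattern used for causal masks.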

Convolution

npg.conv2d(input, weight, bias=None, stride=1, padding=0)

2D convolution with full backward support.

  • input: shape (N, C_in, H, W)

  • weight: shape (C_out, C_in, KH, KW)

  • bias: shape (C_out,) or None

  • stride and padding accept an int or a (H, W) tuple

  • Output shape: (N, C_out, H_out, W_out)

Example:

import numpygrad as npg

x = npg.random.randn((2, 3, 32, 32))                # batch of 2 RGB images
w = npg.random.randn((16, 3, 3, 3), requires_grad=True)
out = npg.conv2d(x, w, stride=1, padding=1)  # (2, 16, 32, 32)
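
The example's (2, 16, 32, 32) output follows from the usual convolution size formula, H_out = floor((H + 2 * pad_H - KH) / stride_H) + 1, and likewise for W. The sketch below assumes that padding adds pad_H zeros at both top and bottom (and pad_W at left and right) and that there is no dilation. conv2d_output_shape is a hypothetical helper, not part of npg.

```python
def _pair(v):
    """Normalize an int or an (H, W) tuple, since stride and padding accept both."""
    return (v, v) if isinstance(v, int) else tuple(v)


def conv2d_output_shape(input_shape, weight_shape, stride=1, padding=0):
    """Shape of conv2d(input, weight), assuming symmetric zero padding and no dilation."""
    n, c_in, h, w = input_shape
    c_out, c_in_w, kh, kw = weight_shape
    if c_in != c_in_w:
        raise ValueError("input channels must match the weight's C_in")
    sh, sw = _pair(stride)
    ph, pw = _pair(padding)
    h_out = (h + 2 * ph - kh) // sh + 1   # floor division implements the floor
    w_out = (w + 2 * pw - kw) // sw + 1
    return (n, c_out, h_out, w_out)


print(conv2d_output_shape((2, 3, 32, 32), (16, 3, 3, 3), stride=1, padding=1))  # (2, 16, 32, 32)
print(conv2d_output_shape((2, 3, 32, 32), (16, 3, 3, 3), stride=2, padding=1))  # (2, 16, 16, 16)
print(conv2d_output_shape((1, 3, 28, 30), (8, 3, 5, 3), stride=(1, 2)))          # (1, 8, 24, 14)
```

With a 3x3 kernel, padding=1 and stride=1 keep the spatial size unchanged, which is why the example preserves 32x32.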

Special

npg.setitem(a, key, value)

Differentiable assignment, equivalent to a[key] = value, but recorded in the computation graph so that gradients flow through it. Positions that were not overwritten pass their gradient back to a; the overwritten positions pass theirs to value instead:

a = npg.zeros((4,), requires_grad=True)
b = npg.setitem(a, 2, npg.array([5.0]))
b.sum().backward()
print(a.grad)   # [1., 1., 0., 1.]  (a[2] was overwritten, so it receives no gradient)
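
For reference, the gradient of an assignment can be written out directly. In this plain-Python sketch (hypothetical helper names, 1D lists, a single integer key), the forward pass copies a and overwrites one position. The backward pass routes the upstream gradient to a everywhere except the overwritten position, and routes the gradient at that position to value.

```python
def setitem_forward(a, key, value):
    """Return a copy of `a` with a[key] replaced by `value` (integer key only)."""
    out = list(a)
    out[key] = value
    return out


def setitem_backward(upstream, key):
    """Split dL/dout into (dL/da, dL/dvalue) for out = setitem(a, key, value)."""
    grad_a = list(upstream)
    grad_a[key] = 0.0           # a[key] was overwritten, so it no longer affects out
    grad_value = upstream[key]  # value landed at position key
    return grad_a, grad_value


b = setitem_forward([0.0, 0.0, 0.0, 0.0], 2, 5.0)
grad_a, grad_value = setitem_backward([1.0] * 4, 2)  # upstream of b.sum() is all ones
print(b, grad_a, grad_value)  # [0.0, 0.0, 5.0, 0.0] [1.0, 1.0, 0.0, 1.0] 1.0
```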

npg.masked_fill(a, mask, value)

Fill positions where the boolean mask is True with the scalar value. The mask is broadcast against a, so a causal mask built as (T, T) and viewed as (1, 1, T, T) can be applied to a 4D (B, H, T, T) attention score tensor:

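T = 4
scores = npg.random.randn((2, 8, T, T))   # (B, H, T, T) attention scores
mask = npg.triu(npg.ones((T, T)), k=1).view(1, 1, T, T)   # 1 above the diagonal = future positions
scores = scores.masked_fill(mask, float("-inf"))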
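
To see why -inf is the usual fill value, here is a plain-Python sketch of row-wise causal attention weights on a T x T score matrix. Positions with j > i (the entries that triu(..., k=1) marks) are set to -inf, so softmax gives them exactly zero weight. The helpers are hypothetical and use lists instead of npg arrays. In the backward pass, filled positions receive zero gradient with respect to the original scores, because they were replaced by a constant.

```python
import math


def causal_mask(t):
    """True above the main diagonal (future positions), like triu(ones((t, t)), k=1)."""
    return [[j > i for j in range(t)] for i in range(t)]


def masked_fill(scores, mask, value):
    """Copy of `scores` with `value` wherever mask is True (same 2D shape assumed)."""
    return [[value if m else s for s, m in zip(srow, mrow)]
            for srow, mrow in zip(scores, mask)]


def softmax_row(row):
    """Stable softmax of one row."""
    m = max(row)                            # finite, because the diagonal is never masked
    exps = [math.exp(x - m) for x in row]   # math.exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


T = 3
scores = [[0.0] * T for _ in range(T)]      # equal scores, for illustration
filled = masked_fill(scores, causal_mask(T), float("-inf"))
weights = [softmax_row(r) for r in filled]
print(weights[0], weights[1])  # [1.0, 0.0, 0.0] [0.5, 0.5, 0.0]
```

Row i spreads its weight only over positions 0 through i, so no query attends to a later token.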

Embedding lookup (npg.ops.embedding)

Row-wise lookup into a weight matrix; used internally by nn.Embedding:

vocab_size, embed_dim = 10, 4   # example sizes
weight = npg.random.randn((vocab_size, embed_dim), requires_grad=True)
indices = npg.array([0, 3, 1])
out = npg.ops.embedding(weight, indices)   # (3, embed_dim)
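
An embedding lookup is row indexing, out[i] = weight[indices[i]], so its backward pass scatter-adds the upstream gradient rows into a zero gradient shaped like weight, accumulating whenever an index repeats. A plain-Python sketch with hypothetical helper names (not npg's implementation):

```python
def embedding(weight, indices):
    """Row lookup: out[i] = weight[indices[i]]."""
    return [list(weight[idx]) for idx in indices]


def embedding_backward(num_rows, dim, indices, upstream):
    """Scatter-add upstream rows into a zero gradient with the weight's shape."""
    grad = [[0.0] * dim for _ in range(num_rows)]
    for row_grad, idx in zip(upstream, indices):
        for d in range(dim):
            grad[idx][d] += row_grad[d]   # repeated indices accumulate
    return grad


weight = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]  # vocab_size=4, embed_dim=2
indices = [0, 3, 0]
out = embedding(weight, indices)                            # rows 0, 3, 0
grad = embedding_backward(4, 2, indices, [[1.0, 1.0]] * 3)
print(grad)  # [[2.0, 2.0], [0.0, 0.0], [0.0, 0.0], [1.0, 1.0]]
```

Rows that were never looked up (1 and 2 here) get zero gradient.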