C Utilities (cutils)

Low-level C++/CUDA utility functions used internally by all pyc kernels. These are not part of the public API but are documented here for completeness.

CPU/C++ Utilities (`utils.h`)

| Function | Description |
|---|---|
| `clip(tensor, dim)` | Extract column `dim` from a 2-D tensor as a 1-D view |
| `format(vector<Tensor>*)` | Stack a list of tensors as columns (each reshaped to N×1) into an N×K tensor |
| `format(vector<Tensor*>)` | Pointer-vector overload of `format` |
| `MakeOp(tensor*)` | Create a `TensorOptions` matching the device and dtype of the input tensor |
| `changedev(tensor*)` | No-op in CPU build; moves tensor to the appropriate device in CUDA build |
| `changedev(dev, tensor*)` | Move tensor to named device string (e.g. `"cuda:0"`); no-op in CPU build |

The header provides clip, format, tensor-option construction, and device-transfer helpers used internally by the pyc modules.

Functions

torch::Tensor clip(torch::Tensor *inpt, int dim)

Extract column dim from the 2-D tensor inpt and return it as a 1-D view.

Parameters:

inpt – Input 2-D tensor.
dim – Index of the column to extract.

Returns:

1-D tensor holding the selected column.
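As a rough illustration of the column extraction described above, the sketch below uses a plain `std::vector<std::vector<double>>` as a stand-in for a 2-D tensor so that it compiles without libtorch. The names `Matrix` and `clip_column` are hypothetical, and unlike `clip`, which the summary table describes as returning a view, this version copies the values.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a 2-D tensor: a list of equally sized rows.
using Matrix = std::vector<std::vector<double>>;

// Return column `dim` of `m` as a 1-D vector (a copy, not a view).
std::vector<double> clip_column(const Matrix& m, std::size_t dim) {
    std::vector<double> out;
    out.reserve(m.size());
    for (const auto& row : m) {
        if (dim >= row.size()) {
            throw std::out_of_range("column index out of range");
        }
        out.push_back(row[dim]);
    }
    return out;
}
```

For a 2×3 input, `clip_column(m, 1)` yields the two entries of the middle column, one per row.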

torch::Tensor format(std::vector<torch::Tensor> *inpt)

Stack a vector of tensors column-wise into a single tensor: each input is reshaped to N×1, and the K inputs form an N×K result.

Parameters: inpt – Vector of tensors.
Returns: Stacked N×K tensor.

torch::Tensor format(std::vector<torch::Tensor*> inpt)

Stack a vector of tensor pointers column-wise into a single tensor, in the same way as the overload above.

Parameters: inpt – Vector of tensor pointers.
Returns: Stacked N×K tensor.
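The column stacking performed by `format` can be pictured with the following minimal sketch. It is not the library's implementation: it replaces tensors with plain `std::vector<double>` so that it is self-contained, and the names `Matrix` and `stack_columns` are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a 2-D tensor: a list of equally sized rows.
using Matrix = std::vector<std::vector<double>>;

// Stack K length-N vectors as the columns of an N x K matrix.
Matrix stack_columns(const std::vector<std::vector<double>>& cols) {
    if (cols.empty()) {
        return {};
    }
    const std::size_t n = cols.front().size();
    Matrix out(n, std::vector<double>(cols.size()));
    for (std::size_t k = 0; k < cols.size(); ++k) {
        if (cols[k].size() != n) {
            throw std::invalid_argument("all columns must have the same length");
        }
        for (std::size_t i = 0; i < n; ++i) {
            out[i][k] = cols[k][i];  // element i of input k lands in row i, column k
        }
    }
    return out;
}
```

Two inputs of length 3 therefore produce a 3×2 result whose first column is the first input and whose second column is the second.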

torch::TensorOptions MakeOp(torch::Tensor *x)

Create torch::TensorOptions matching the device and dtype of x.

Parameters: x – Reference tensor.
Returns: Matching TensorOptions.

void changedev(torch::Tensor *inpt)

Move tensor inpt to the appropriate device in place. In a CPU build this is a no-op.

Parameters: inpt – Tensor to move.

torch::Tensor changedev(std::string dev, torch::Tensor *inx)

Copy tensor inx to the device specified by dev.

Parameters:

dev – Device string (e.g. "cpu", "cuda:0").
inx – Source tensor.

Returns:

Tensor on the target device.

CUDA Atomic Device Helpers (`atomic.cuh`)

These `__device__` template functions are inlined into every CUDA kernel that needs them. They provide numerically stable variants of common operations.

| Function | Description |
|---|---|
| `_cofactor<T>(M, idy, idz)` | Compute the (idy, idz) cofactor of a 3×3 matrix `M` |
| `_div(p)` | Safe reciprocal: returns `1/p` or `0` if `p == 0` |
| `_p2(p)` | Squared value: `(p) × (p)` |
| `_clp(p)` | Round-trip clamp: rounds to 10 decimal places |
| `_sqrt(p)` | Sign-preserving square root: returns `-√\|p\|` for negative inputs |
| `_cmp(xx, yy, xy)` | Computes `xy / √(xx × yy)` (used for cosine-like ratios) |
| `_arccos(sm, pz)` | `acos(pz / √sm)` with safe division |
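To make the behaviour in the table concrete, here is a host-side C++ sketch of several of these helpers. It is written from the descriptions above, not from the `atomic.cuh` source. The `ref_` names are hypothetical. The arccos variant assumes `sm` is non-negative, and the cofactor uses the standard signed-minor definition.

```cpp
#include <cmath>
#include <cstddef>

// Safe reciprocal: 1/p, or 0 when p is exactly zero.
template <typename T>
T ref_div(T p) { return (p == T(0)) ? T(0) : T(1) / p; }

// Square of p.
template <typename T>
T ref_p2(T p) { return p * p; }

// Round to 10 decimal places.
template <typename T>
T ref_clp(T p) { return std::round(p * T(1e10)) / T(1e10); }

// Sign-preserving square root: sqrt(p) for p >= 0, -sqrt(|p|) otherwise.
template <typename T>
T ref_sqrt(T p) { return (p < T(0)) ? -std::sqrt(-p) : std::sqrt(p); }

// acos(pz / sqrt(sm)); the safe reciprocal turns sm == 0 into acos(0).
// Assumes sm >= 0.
template <typename T>
T ref_arccos(T sm, T pz) { return std::acos(pz * ref_div(std::sqrt(sm))); }

// (idy, idz) cofactor of a 3x3 matrix: the determinant of the 2x2 minor
// left after deleting row idy and column idz, with sign (-1)^(idy + idz).
template <typename T>
T ref_cofactor(const T (&M)[3][3], std::size_t idy, std::size_t idz) {
    // Indices of the two rows and two columns that remain.
    const std::size_t r0 = (idy == 0) ? 1 : 0, r1 = (idy == 2) ? 1 : 2;
    const std::size_t c0 = (idz == 0) ? 1 : 0, c1 = (idz == 2) ? 1 : 2;
    const T minor = M[r0][c0] * M[r1][c1] - M[r0][c1] * M[r1][c0];
    return ((idy + idz) % 2 == 0) ? minor : -minor;
}
```

The safe reciprocal is what keeps `ref_arccos` finite when `sm` is zero: the ratio collapses to 0 instead of producing a division by zero.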

CUDA Thread/Block Geometry (`utils.cuh`)

The `blk_` inline function computes the 2-D CUDA launch grid:

dim3 blk = blk_(num_elements, threads_per_block);

This is used in every CUDA kernel launch to ensure full coverage of the batch dimension.
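Full coverage of the batch dimension is usually obtained with a ceiling division of the element count by the threads per block. The sketch below shows that arithmetic in plain C++. It is an assumption about what `blk_` does, not its actual source: `Grid2`, `blocks_for`, and `grid_for` are hypothetical names, and `Grid2` stands in for CUDA's `dim3` so the example compiles without the CUDA toolkit.

```cpp
// Hypothetical plain-C++ stand-in for CUDA's dim3 (x and y only).
struct Grid2 {
    unsigned int x;
    unsigned int y;
};

// Smallest block count such that blocks * threads >= n (ceiling division).
// Assumes threads > 0 and that n + threads - 1 does not overflow.
constexpr unsigned int blocks_for(unsigned int n, unsigned int threads) {
    return (n + threads - 1) / threads;
}

// Assumed 2-D variant: one ceiling division per axis.
constexpr Grid2 grid_for(unsigned int nx, unsigned int tx,
                         unsigned int ny, unsigned int ty) {
    return Grid2{blocks_for(nx, tx), blocks_for(ny, ty)};
}
```

With 1000 elements and 128 threads per block this gives 8 blocks, so the last block is only partly used and the kernel still needs its own bounds check on the element index.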
