C Utilities (cutils)
Low-level C++/CUDA utility functions used internally by all pyc kernels. These are not part of the public API but are documented here for completeness.
CPU/C++ Utilities (utils.h)
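| Function | Description |
|---|---|
| `clip(inpt, dim)` | Extract a column from the input tensor |
| `format(std::vector<torch::Tensor>*)` | Stack a list of tensors as columns (each reshaped to N×1) into an N×K tensor |
| `format(std::vector<torch::Tensor*>)` | Pointer-vector overload of `format` |
| `MakeOp(x)` | Create a `torch::TensorOptions` matching the device and dtype of `x` |
| `changedev(inpt)` | No-op in CPU build; moves tensor to the appropriate device in CUDA build |
| `changedev(dev, inx)` | Move tensor to the device named by a string (e.g. `"cpu"`, `"cuda:0"`) |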
Low-level CUDA/C++ tensor utility functions.
(pyc)
Provides clip, format, tensor-option construction, and device transfer helpers used internally by the pyc modules.
Functions
- torch::Tensor clip(torch::Tensor *inpt, int dim)
  Clip tensor inpt along dimension dim.
  - Parameters:
    inpt – Input tensor.
    dim – Dimension to clip.
  - Returns:
    Clipped tensor.
- torch::Tensor format(std::vector<torch::Tensor> *inpt)
  Stack a vector of tensors into a single tensor.
  - Parameters:
    inpt – Vector of tensors.
  - Returns:
    Stacked tensor.
- torch::Tensor format(std::vector<torch::Tensor*> inpt)
  Stack a vector of tensor pointers into a single tensor.
  - Parameters:
    inpt – Vector of tensor pointers.
  - Returns:
    Stacked tensor.
- torch::TensorOptions MakeOp(torch::Tensor *x)
  Create torch::TensorOptions matching the device and dtype of x.
  - Parameters:
    x – Reference tensor.
  - Returns:
    Matching TensorOptions.
- void changedev(torch::Tensor *inpt)
  Move tensor inpt to CPU in place.
  - Parameters:
    inpt – Tensor to move.
- torch::Tensor changedev(std::string dev, torch::Tensor *inx)
  Copy tensor inx to the device specified by dev.
  - Parameters:
    dev – Device string (e.g. "cpu", "cuda:0").
    inx – Source tensor.
  - Returns:
    Tensor on the target device.
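The column-stacking behaviour described for format (each input treated as an N×1 column, K inputs giving an N×K result) can be illustrated without libtorch. The following is only a sketch in plain C++: the Matrix struct and the stack_columns and extract_column functions are hypothetical stand-ins invented for this example, not part of pyc, and the real clip and format may handle shapes and dtypes differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix, used only for this illustration.
struct Matrix {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<double> data;  // element (r, c) is stored at data[r * cols + c]

    double at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Stack K vectors of length N as the columns of an N x K matrix.
// All input vectors must have the same length.
inline Matrix stack_columns(const std::vector<std::vector<double>>& columns) {
    Matrix out;
    if (columns.empty()) return out;
    out.rows = columns.front().size();
    out.cols = columns.size();
    out.data.resize(out.rows * out.cols);
    for (std::size_t c = 0; c < out.cols; ++c) {
        assert(columns[c].size() == out.rows && "all columns need equal length");
        for (std::size_t r = 0; r < out.rows; ++r) {
            out.data[r * out.cols + c] = columns[c][r];
        }
    }
    return out;
}

// Pull a single column back out of the matrix (a loose analogue of the
// "extract column" helper; an assumption, not the actual clip code).
inline std::vector<double> extract_column(const Matrix& m, std::size_t dim) {
    assert(dim < m.cols && "column index out of range");
    std::vector<double> col(m.rows);
    for (std::size_t r = 0; r < m.rows; ++r) {
        col[r] = m.at(r, dim);
    }
    return col;
}
```

Stacking two length-3 vectors this way yields a 3×2 matrix, and extracting column 1 returns the second input unchanged.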
CUDA Atomic Device Helpers (atomic.cuh)
These __device__ template functions are inlined into every CUDA kernel
that needs them. They provide numerically stable variants of common
operations.
| Function | Description |
|---|---|
| | Compute the (idy, idz) cofactor of a 3×3 matrix |
| | Safe reciprocal: returns |
| | Squared value: |
| | Round-trip clamp: rounds to 10 decimal places |
| | Sign-preserving square root: returns |
| | Computes |
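To make the helper descriptions above concrete, here is a host-side sketch of what such functions might look like. It is written as plain C++ templates so it builds without the CUDA toolkit; in a .cuh header they would additionally carry the __device__ qualifier. All function names here are hypothetical. The row-major matrix layout, the value returned by the safe reciprocal for a zero input, and the exact form of the sign-preserving square root are assumptions, not taken from atomic.cuh.

```cpp
#include <cassert>
#include <cmath>

// Cofactor (idy, idz) of a 3x3 matrix stored row-major in m[0..8] (assumed layout).
template <typename T>
inline T cofactor3x3(const T* m, int idy, int idz) {
    assert(idy >= 0 && idy < 3 && idz >= 0 && idz < 3);
    // Rows and columns that remain after deleting row idy and column idz.
    const int r0 = (idy == 0) ? 1 : 0;
    const int r1 = (idy == 2) ? 1 : 2;
    const int c0 = (idz == 0) ? 1 : 0;
    const int c1 = (idz == 2) ? 1 : 2;
    const T minor = m[3 * r0 + c0] * m[3 * r1 + c1]
                  - m[3 * r0 + c1] * m[3 * r1 + c0];
    // Checkerboard sign: (-1)^(idy + idz).
    return ((idy + idz) % 2 == 0) ? minor : -minor;
}

// Safe reciprocal. Assumption: a zero input maps to zero instead of inf/NaN.
template <typename T>
inline T safe_reciprocal(T x) {
    return (x == T(0)) ? T(0) : T(1) / x;
}

// Squared value.
template <typename T>
inline T squared(T x) {
    return x * x;
}

// Round to 10 decimal places to suppress floating-point noise.
template <typename T>
inline T round10(T x) {
    return static_cast<T>(std::round(static_cast<double>(x) * 1e10) / 1e10);
}

// Sign-preserving square root. Assumption: sign(x) * sqrt(|x|).
template <typename T>
inline T signed_sqrt(T x) {
    return (x < T(0)) ? -std::sqrt(-x) : std::sqrt(x);
}
```

Guarding the reciprocal and the square root this way keeps a single degenerate element from spreading NaN or infinity through the rest of a batch, which is the usual motivation for helpers of this kind.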
CUDA Thread/Block Geometry (utils.cuh)
The blk_ inline function computes the 2-D CUDA launch grid:
dim3 blk = blk_(num_elements, threads_per_block);
This is used in every CUDA kernel launch to ensure full coverage of the batch dimension.
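Full coverage of the batch dimension is typically achieved with a ceiling division of the element count by the threads per block. The sketch below shows that arithmetic in plain C++. Dim3 is a hypothetical stand-in for CUDA's dim3, and blocks_for and launch_grid are illustrative names. The actual signature and internals of blk_ are not shown in this document and may differ.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3 so the sketch builds without the toolkit.
struct Dim3 {
    unsigned int x = 1;
    unsigned int y = 1;
    unsigned int z = 1;
};

// Ceiling division: the smallest block count whose threads cover n elements.
// Written as quotient plus remainder check to avoid overflow in n + threads - 1.
inline unsigned int blocks_for(unsigned int n, unsigned int threads) {
    assert(threads > 0 && "threads per block must be positive");
    return n / threads + ((n % threads != 0) ? 1u : 0u);
}

// Build a 2-D grid; the second dimension defaults to a single block.
inline Dim3 launch_grid(unsigned int n_x, unsigned int threads_x,
                        unsigned int n_y = 1, unsigned int threads_y = 1) {
    Dim3 grid;
    grid.x = blocks_for(n_x, threads_x);
    grid.y = blocks_for(n_y, threads_y);
    return grid;
}
```

Because the last block is usually only partly filled (for example, 1025 elements at 256 threads per block need 5 blocks), a kernel launched with such a grid still needs a bounds check on its global thread index.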