PropCUDA¶
class PropCUDA(
equation,
shape,
source_type=[],
receiver_type=[],
abcn=50,
free_surface=False,
dh=10.0,
dt=0.002,
dev=None,
use_ckpt=False,
ckpt_chunks=100,
ckpt_mode="chunk",
ckpt_num=0,
pml_type="spml",
nt=-1,
B=1,
allow_growth=True,
full_mode="full",
boundary_saving_config=None,
)
Implementation:
src/sweep/propagator/cuda.py
Compiled CUDA propagator backed by equation-specific bindings from sweep._C.
In current Torch-side examples, this backend is often reached through
PropTorch(..., backend="cuda"). PropCUDA remains the lower-level CUDA
class when you want to work with CUDA-specific runtime details directly.
Note
PropCUDA is the backend with the most runtime-specific behavior:
anisotropic dh, reusable buffers, boundary saving, recursive checkpointing,
and RTM all live here.
Parameters¶
- equation (equation instance): Equation instance whose compiled CUDA binding will be used. PropCUDA expects the equation to expose _C(), and optionally _C_rtm() for RTM. It also expects a cuda_layout spec for CUDA runtime buffer allocation.
- shape (tuple[int, ...]): Physical model shape before absorbing boundaries are added. Use (nz, nx) in 2D and (nz, ny, nx) in 3D.
- source_type (list[str], optional): Wavefield names used for source injection. These names are resolved through the equation field metadata, so aliases may also be accepted. If omitted, PropCUDA uses equation.default_source_fields.
- receiver_type (list[str], optional): Wavefield names sampled at receiver locations. These are also resolved through the equation field metadata and default to equation.default_receiver_fields when omitted.
- abcn (int, optional): Absorbing boundary width.
- free_surface (bool, optional): Whether the top boundary is treated as a free surface. This affects coordinate shifts before entering the CUDA kernels.
- dh (float or tuple[float, ...], optional): Grid spacing. Unlike the other propagators, PropCUDA supports anisotropic tuple-valued spacing: (dz, dx) in 2D and (dz, dy, dx) in 3D.
- dt (float, optional): Time step in seconds.
- dev (device, optional): Execution device for tensors and reusable CUDA buffers.
- use_ckpt (bool, optional): Enables checkpoint-based memory reduction in the CUDA path.
- ckpt_chunks (int, optional): Checkpoint interval used in chunk checkpointing.
- ckpt_mode (str, optional): Checkpoint strategy. Supported values here are "chunk" and "recursive".
- ckpt_num (int, optional): Number of persistent checkpoints used by recursive checkpointing.
- pml_type (str, optional): PML implementation passed into equation setup.
- nt (int, optional): Stored time-step count. The actual working value is normally inferred from the runtime wavelet.
- B (int, optional): Initial batch capacity for reusable runtime buffers.
- allow_growth (bool, optional): If True, runtime buffers may grow when a larger batch is seen later. If False, batches larger than the preallocated capacity raise an error.
- full_mode (str, optional): Stored on the base class and not currently the main runtime switch for this backend.
- boundary_saving_config (dict, optional): Configuration for saving forward boundary values instead of storing all wavefields. Normalized form: {"enabled": False, "storage": "gpu", "transfer_interval": 1, "pinned_memory": False}.
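The dh parameter accepts either a single spacing or one spacing per axis. As a rough illustration of that behavior (normalize_dh below is a hypothetical helper, not part of sweep), a scalar can be expanded to every axis while a tuple is validated against the model dimensionality:

```python
# Hypothetical sketch of how anisotropic dh might be normalized; the real
# PropCUDA internals may differ. A scalar applies to every axis, while a
# tuple supplies one spacing per axis: (dz, dx) in 2D, (dz, dy, dx) in 3D.
def normalize_dh(dh, ndim):
    """Return a per-axis spacing tuple of length ndim."""
    if isinstance(dh, (int, float)):
        return (float(dh),) * ndim          # isotropic: same spacing everywhere
    dh = tuple(float(v) for v in dh)
    if len(dh) != ndim:
        raise ValueError(f"dh has {len(dh)} entries for a {ndim}-D model")
    return dh

print(normalize_dh(10.0, 2))         # (10.0, 10.0)
print(normalize_dh((5.0, 12.5), 2))  # (5.0, 12.5)
```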
Supported keys are:
- enabled (bool): Whether boundary saving is enabled.
- storage (str): Where saved boundary values live. Supported values are "gpu" and "cpu".
- transfer_interval (int): How often boundary values are transferred when CPU storage is used. This must be at least 1. When storage="gpu", the effective interval is forced to 1.
- pinned_memory (bool): Whether to use pinned host memory when storage="cpu". When storage="gpu", this is effectively disabled.
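The key interactions above can be sketched as a normalization step. The helper below (normalize_boundary_config) is hypothetical and only mirrors the documented rules; sweep's internal handling may differ:

```python
# Hypothetical sketch of the documented boundary_saving_config rules;
# normalize_boundary_config is illustrative only, not part of sweep.
DEFAULTS = {"enabled": False, "storage": "gpu",
            "transfer_interval": 1, "pinned_memory": False}

def normalize_boundary_config(cfg=None):
    out = dict(DEFAULTS)
    out.update(cfg or {})
    if out["storage"] not in ("gpu", "cpu"):
        raise ValueError("storage must be 'gpu' or 'cpu'")
    if out["transfer_interval"] < 1:
        raise ValueError("transfer_interval must be at least 1")
    if out["storage"] == "gpu":
        # GPU storage never transfers to the host, so the interval is forced
        # to 1 and pinned host memory is irrelevant.
        out["transfer_interval"] = 1
        out["pinned_memory"] = False
    return out

print(normalize_boundary_config({"enabled": True, "storage": "gpu",
                                 "transfer_interval": 4}))
# {'enabled': True, 'storage': 'gpu', 'transfer_interval': 1, 'pinned_memory': False}
```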
Equation Requirements¶
For the CUDA path, an equation should expose:
- _C(): compiled forward/backward CUDA entry points
- _C_rtm(): when RTM support exists
- cuda_layout: a CUDALayoutSpec instance describing CUDA buffer layout
For field discovery, equations may also expose:
- available_fields()
- describe_field(name)
- default_source_fields
- default_receiver_fields
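These requirements amount to a duck-typed contract. A minimal sketch of checking it (check_cuda_equation and DummyEquation are hypothetical, for illustration only):

```python
# Hypothetical compatibility check mirroring the requirements listed above;
# check_cuda_equation is illustrative only and not part of sweep.
REQUIRED = ("_C", "cuda_layout")
OPTIONAL = ("_C_rtm", "available_fields", "describe_field",
            "default_source_fields", "default_receiver_fields")

def check_cuda_equation(eq):
    missing = [name for name in REQUIRED if not hasattr(eq, name)]
    if missing:
        raise TypeError(f"equation lacks required attributes: {missing}")
    return {name: hasattr(eq, name) for name in OPTIONAL}

class DummyEquation:
    cuda_layout = object()            # stand-in for a CUDALayoutSpec
    default_source_fields = ["p"]
    def _C(self):                     # stand-in for the compiled binding
        return None

caps = check_cuda_equation(DummyEquation())
print(caps["_C_rtm"])                 # False: the dummy has no RTM binding
print(caps["default_source_fields"])  # True
```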
Forward Parameters¶
forward(
wavelet,
sources,
receivers,
models=None,
source_encoding=False,
adj=False,
return_wavefield=False,
use_boundary_saving=None,
boundary_saving_config=None,
**kwargs,
)
- wavelet (array-like): Source time function. Accepted layouts are (nt,), (B, nt), (B, nsrc, nt), and the source-encoding super-shot layout (1, nsrc, nt).
- sources (array-like): Source coordinates. Accepted layouts are (B, dim) and (B, nsrc, dim), including (1, nsrc, dim) for a source-encoding super-shot.
- receivers (array-like): Receiver coordinates. This path expects batched receiver coordinates as well, typically (B, nreceivers, dim), or (1, nreceivers, dim) for a source-encoding super-shot.
- models (list[torch.Tensor], optional): List of model tensors in the exact order required by equation.models. They are padded and expanded across the active batch before being passed into the binding.
- source_encoding (bool, optional): If True, runs with a single encoded batch instead of one batch element per shot. PropCUDA also auto-detects source encoding when the runtime inputs use (1, nsrc, nt), (1, nsrc, dim), and (1, nreceivers, dim).
- adj (bool, optional): Adjoint-style forward switch.
- return_wavefield (bool, optional): Present in the signature, but the current main CUDA forward path still returns only the synthetic data.
- use_boundary_saving (bool, optional): Runtime override for enabling boundary saving.
- boundary_saving_config (dict, optional): Runtime override for the boundary-saving policy.
In the shape descriptions above:
- B is the runtime batch size
- nsrc is the number of sources inside one batch element
- dim is 2 in 2D and 3 in 3D
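The auto-detection of source encoding described above is purely shape-based. A rough sketch (is_source_encoding is hypothetical and only illustrates the documented layouts, not sweep's actual logic):

```python
# Hypothetical shape-based detection of source-encoding inputs, following the
# layouts documented above; is_source_encoding is illustrative, not sweep API.
def is_source_encoding(wavelet_shape, sources_shape, receivers_shape):
    """A super-shot uses batch size 1 with sources packed along axis 1:
    wavelet (1, nsrc, nt), sources (1, nsrc, dim), receivers (1, nrec, dim)."""
    return (len(wavelet_shape) == 3 and wavelet_shape[0] == 1
            and len(sources_shape) == 3 and sources_shape[0] == 1
            and len(receivers_shape) == 3 and receivers_shape[0] == 1)

# Encoded super-shot: 8 sources fired together in one batch element.
print(is_source_encoding((1, 8, 1000), (1, 8, 2), (1, 64, 2)))   # True
# Conventional batch: 4 separate shots, one source each.
print(is_source_encoding((4, 1000), (4, 1, 2), (4, 64, 2)))      # False
```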
RTM Parameters¶
- adjoint_source (array-like): Input data for reverse-time migration. Accepted layouts are (B, nt, nrec[, 1]) and (B, nrec, nt).
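Since two time/receiver orderings are accepted, a caller can bring them to one canonical layout before inspection. The helper below (canonicalize_adjoint_source) is a hypothetical sketch using the known nt to locate the time axis, not part of sweep:

```python
import numpy as np

# Hypothetical normalization of the accepted adjoint_source layouts into a
# single canonical (B, nrec, nt) array; canonicalize_adjoint_source is
# illustrative only. The known time-step count nt disambiguates the layouts.
def canonicalize_adjoint_source(data, nt):
    data = np.asarray(data)
    if data.ndim == 4 and data.shape[-1] == 1:   # (B, nt, nrec, 1)
        data = data[..., 0]
    if data.ndim != 3:
        raise ValueError(f"expected 3 or 4 dims, got {data.ndim}")
    if data.shape[1] == nt and data.shape[2] != nt:
        data = data.transpose(0, 2, 1)           # (B, nt, nrec) -> (B, nrec, nt)
    if data.shape[2] != nt:
        raise ValueError("cannot locate the time axis")
    return data

d = np.zeros((2, 500, 64))                           # (B, nt, nrec)
print(canonicalize_adjoint_source(d, nt=500).shape)  # (2, 64, 500)
```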
Return Value¶
- forward(...): synthetic data record
- rtm(...): (syn, image, source_illumination, receiver_illumination)