CLAUDE: Stage 2 — native convert_direct selftest (first real execution + CDP on Blackwell)
Add tp_convert_direct_selftest to the JNA shim: mirrors TpHostGpu allTests' convert
path (setImageKernels/setImgBuffers/setCltBuffers/setTasks + calc_reverse_distortions
-> rot_derivs -> calculate_tiles_offsets [CDP] -> convert_direct), reusing the harness
runtime-API host helpers (tp_utils/tp_files/TpParams/tp_paths) for ALL allocation and
porting only the launches to driver-API cuLaunchKernel against the NVRTC module. Reads
CLT back, compares to clt/aux_chnN.clt golden.
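
The driver-API launch takes its arguments as a `void**` array in which each slot holds the host address of one argument, so a pointer-of-pointers argument is passed as the address of a host variable holding the device pointer. A minimal host-only sketch of that packing follows. The `ConvertArgs` fields, their order, and `pack_kernel_params` are hypothetical and are not the real 17-argument convert_direct signature; `DevPtr` stands in for `CUdeviceptr` so the sketch compiles without cuda.h.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for CUdeviceptr so this sketch builds without cuda.h (assumption).
using DevPtr = std::uint64_t;

// Hypothetical subset of kernel arguments; the real kernel takes 17.
struct ConvertArgs {
    DevPtr      gpu_kernel_offsets; // device array of per-camera device pointers
    DevPtr      gpu_kernels;        // device array of per-camera device pointers
    DevPtr      gpu_images;         // device array of per-camera device pointers
    DevPtr      gpu_clt;            // device array of per-camera device pointers
    int         num_cams;
    std::size_t dstride;
};

// Build the kernelParams array: one slot per argument, in declaration order.
// Each slot is the HOST address of the argument's storage, not its value,
// so `a` must stay alive until the launch call has returned.
std::vector<void*> pack_kernel_params(ConvertArgs& a) {
    return {
        &a.gpu_kernel_offsets,
        &a.gpu_kernels,
        &a.gpu_images,
        &a.gpu_clt,
        &a.num_cams,
        &a.dstride,
    };
}
```

The resulting vector's `data()` is what would be handed to cuLaunchKernel as its `kernelParams` argument, with `extra` left null.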
build_lib.sh: nvcc + -std=c++17 (static constexpr TpParams members become inline),
-Isrc + cuda-samples Common (helper_cuda.h), --pre-include algorithm.
Validated on RTX 5060 Ti via Java->JNA: num_active_tiles=5120 (all tiles active),
max|CLT-golden| = 0.1085 against peak magnitudes of ~12260 -> relative error ~8.85e-6
(float32 NVRTC-vs-nvcc variation).
First CDP (calculate_tiles_offsets) and 17-arg pointer-of-pointers convert_direct
launch executing natively on Blackwell.
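
The relative figure above follows from 0.1085 / 12260, which is about 8.85e-6. A minimal sketch of such a comparison is given here, assuming the metric is the maximum absolute difference divided by the peak absolute golden value; the selftest may compute it differently. `CompareResult` and `compare_to_golden` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct CompareResult {
    double max_abs_diff; // max |got - golden|
    double peak;         // max |golden|
    double relative;     // max_abs_diff / peak, or 0 if peak is 0
};

// Compare a read-back buffer against golden data element by element.
// Only the common prefix is compared if the sizes differ (assumption).
CompareResult compare_to_golden(const std::vector<float>& got,
                                const std::vector<float>& golden) {
    CompareResult r{0.0, 0.0, 0.0};
    const std::size_t n = std::min(got.size(), golden.size());
    for (std::size_t i = 0; i < n; ++i) {
        const double g = static_cast<double>(golden[i]);
        const double d = std::fabs(static_cast<double>(got[i]) - g);
        r.max_abs_diff = std::max(r.max_abs_diff, d);
        r.peak = std::max(r.peak, std::fabs(g));
    }
    r.relative = (r.peak > 0.0) ? r.max_abs_diff / r.peak : 0.0;
    return r;
}
```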
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>