    CLAUDE: Stage 2 — native convert_direct selftest (first real execution + CDP on Blackwell) · 05ee47d0
    Andrey Filippov authored
    Add tp_convert_direct_selftest to the JNA shim: mirrors TpHostGpu allTests' convert
    path (setImageKernels/setImgBuffers/setCltBuffers/setTasks + calc_reverse_distortions
    -> rot_derivs -> calculate_tiles_offsets [CDP] -> convert_direct), reusing the harness
    runtime-API host helpers (tp_utils/tp_files/TpParams/tp_paths) for ALL allocation and
    porting only the launches to driver-API cuLaunchKernel against the NVRTC module. Reads
    CLT back, compares to clt/aux_chnN.clt golden.
    
    build_lib.sh: nvcc + -std=c++17 (static constexpr TpParams members become inline),
    -Isrc + cuda-samples Common (helper_cuda.h), --pre-include algorithm.
    
    Validated on RTX 5060 Ti via Java->JNA: num_active_tiles=5120 (all), max|CLT-golden|
    =0.1085 over peaks of 12260 -> relative ~8.85e-6 (float32 NVRTC-vs-nvcc variation).
    First CDP (calculate_tiles_offsets) and 17-arg pointer-of-pointers convert_direct
    launch executing natively on Blackwell.
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
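The validation figures in the commit message (max|CLT-golden| = 0.1085 against peaks of 12260, so 0.1085 / 12260 ≈ 8.85e-6) describe a maximum absolute difference normalized by the peak magnitude. A minimal sketch of such a check is given here, assuming both buffers are flat float arrays of equal length and that the peak is taken over the golden data. `CompareStats` and `compare_to_golden` are hypothetical names for illustration and are not taken from tp_jna.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper types/names; not the selftest's actual code.
struct CompareStats {
    float  max_abs_diff;  // max over i of |result[i] - golden[i]|
    float  peak;          // max over i of |golden[i]| (assumed definition of "peak")
    double relative;      // max_abs_diff / peak, left at 0 when peak == 0
};

// Compare a buffer read back from the GPU against golden reference data.
inline CompareStats compare_to_golden(const std::vector<float>& result,
                                      const std::vector<float>& golden) {
    assert(result.size() == golden.size());
    CompareStats s{0.0f, 0.0f, 0.0};
    for (std::size_t i = 0; i < golden.size(); ++i) {
        s.max_abs_diff = std::max(s.max_abs_diff, std::fabs(result[i] - golden[i]));
        s.peak         = std::max(s.peak, std::fabs(golden[i]));
    }
    if (s.peak > 0.0f) {
        s.relative = static_cast<double>(s.max_abs_diff) / s.peak;
    }
    return s;
}
```

Normalizing by the peak rather than element by element keeps near-zero coefficients from dominating the relative error, which suits a float32 comparison between two compilers' outputs.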
Name
.gitignore
build_lib.sh
build_probe.sh
tp_jna.cpp
tp_nvrtc_probe.cpp