jna/tp_jna.cpp · a0984dca66b845cd833401c96a0eaa24a9ef3ece · Elphel / tile_processor_gpu

CLAUDE: Stage 4 — native correlate2D/combine/normalize via JNA (quad correlation) · a0984dca

Andrey Filippov authored Jun 25, 2026

Extend run_convert_pipeline with do_corr: after convert_direct, allocate corr buffers
(gpu_corrs_td/combo_td/combo via alloc_image_gpu + corr_indices/combo_indices/num_corr_tiles),
launch correlate2D <<<1,1>>> (TD, CDP; 18 args, generates indices), read num_corr_tiles,
corr2D_combine (quad pairs_mask 0x0f), corr2D_normalize (TD->pixel), de-pitch gpu_corrs_combo.
tp_corr_selftest wrapper (do_corr=1).
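
The de-pitch step above can be sketched on the host as follows. This is a minimal
illustration, not the code in tp_jna.cpp: the helper name depitch_rows and its
parameters are hypothetical, and it assumes the pitched buffer has already been
copied to host memory with the pitch expressed in floats rather than bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper (not from tp_jna.cpp): pack `height` rows of `width`
// floats out of a pitched buffer whose rows start every `pitch_floats` floats.
std::vector<float> depitch_rows(const float* src,
                                std::size_t pitch_floats,
                                std::size_t width,
                                std::size_t height) {
    assert(pitch_floats >= width);  // row padding can only add elements
    if (width == 0 || height == 0) {
        return {};
    }
    std::vector<float> packed(width * height);
    for (std::size_t row = 0; row < height; ++row) {
        // Copy only the payload of each row and skip the trailing padding.
        std::memcpy(packed.data() + row * width,
                    src + row * pitch_floats,
                    width * sizeof(float));
    }
    return packed;
}
```

A device-side pitch reported in bytes would be divided by sizeof(float) before
calling such a helper; a single strided copy (for example cudaMemcpy2D) is the
usual alternative when the data is still on the GPU.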

Validated on RTX 5060 Ti via JNA: num_pairs=120, num_corr_combo=5120, output stats identical
to golden (max 0.6638, rms 0.0717). clt/aux_corr-quad.corr is OLDER (Apr-2025) than the CLT
golden (Jul-2025) so the active-tile ORDER differs -> pointwise compare is permutation-dominated
(0.66). Order-independent check (sort both, compare distributions): max value error 2.06e-05,
consistent with float32 rounding => correlate2D/combine/normalize compute the correct values on Blackwell.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
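
The order-independent check described above (sort both, compare distributions)
can be sketched as follows. The function names are hypothetical and this is not
the validation code used for the commit; it assumes both buffers hold the same
number of finite float values.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: largest |a[i] - b[i]| in the original element order.
// A permutation of otherwise identical data makes this value large.
float max_pointwise_abs_diff(const std::vector<float>& a,
                             const std::vector<float>& b) {
    assert(a.size() == b.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    }
    return worst;
}

// Hypothetical helper: the same metric after sorting copies of both inputs,
// so only the distribution of values is compared, not their placement.
float max_sorted_abs_diff(std::vector<float> a, std::vector<float> b) {
    assert(a.size() == b.size());
    std::sort(a.begin(), a.end());  // assumes no NaN values
    std::sort(b.begin(), b.end());
    return max_pointwise_abs_diff(a, b);
}
```

A small sorted difference next to a large pointwise difference points to a
reordering of the output rather than a change in the computed values. The
sorted check does not confirm that each value landed in the right tile, so it
is a weaker test than a pointwise comparison against a golden file with the
same tile order.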


tp_jna.cpp 28.6 KB
