CLAUDE: Stage 4 — native correlate2D/combine/normalize via JNA (quad correlation)
Extend run_convert_pipeline with do_corr: after convert_direct, allocate the corr buffers
(gpu_corrs_td/combo_td/combo via alloc_image_gpu, plus corr_indices/combo_indices/num_corr_tiles),
launch correlate2D <<<1,1>>> (TD, CDP; 18 args, generates the indices), read back num_corr_tiles,
then run corr2D_combine (quad pairs_mask 0x0f) and corr2D_normalize (TD->pixel), and de-pitch
gpu_corrs_combo. Add tp_corr_selftest wrapper (do_corr=1).
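
The de-pitch step named above can be illustrated with a minimal sketch. It is not the code in
this commit: the function name depitch is hypothetical, and it assumes the pitch is counted in
elements (not bytes) and that each row holds `width` valid values followed by padding.

```python
def depitch(buf, width, pitch, rows):
    """Copy `rows` rows of `width` elements out of a pitched buffer.

    `pitch` is the row stride in elements (assumption: not bytes).
    Padding elements at the end of each row are dropped.
    """
    if pitch < width:
        raise ValueError("pitch must be >= width")
    if rows > 0 and len(buf) < (rows - 1) * pitch + width:
        raise ValueError("buffer too short for the requested rows")
    dense = []
    for r in range(rows):
        start = r * pitch
        dense.extend(buf[start:start + width])
    return dense


# Two rows of three values each, stored with a pitch of four elements.
pitched = [1.0, 2.0, 3.0, -1.0,
           4.0, 5.0, 6.0, -1.0]
dense = depitch(pitched, width=3, pitch=4, rows=2)
```

Here `dense` holds the six valid values in row order, with the two padding elements removed.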
Validated on RTX 5060 Ti via JNA: num_pairs=120, num_corr_combo=5120, output stats identical
to golden (max 0.6638, rms 0.0717). clt/aux_corr-quad.corr is OLDER (Apr-2025) than the CLT
golden (Jul-2025), so the active-tile ORDER differs and a pointwise compare is
permutation-dominated (0.66). Order-independent check (sort both, compare distributions): max
value error 2.06e-05, consistent with float32 precision => correlate2D/combine/normalize compute
the correct values on Blackwell.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>