- 25 Jun, 2026 9 commits
-
-
Andrey Filippov authored
CLAUDE: Step 2 corr - split TpProc into granular correlate2D_td/_inter_td/combine/normalize + get_corr2d

Match GpuQuad's separate correlation calls: tp_proc_exec_corr2d_td (correlate2D TD, fat_zero=0, corr_radius=0 -> gpu_corrs_td, returns num_corr_tiles), tp_proc_exec_corr2d_inter_td (correlate2D_inter clt vs clt_ref, sel_sensors), tp_proc_exec_corr2d_combine (init|no_transpose<<1, num_pairs, pairs_mask -> num_corr_combo), tp_proc_exec_corr2d_normalize (combo -> corrs_combo / per-pair -> gpu_corrs), tp_proc_get_corr2d (per-pair de-pitch) + tp_proc_num_corr_tiles/combo.

Add per-pair pixel buffer gpu_corrs (+dstride_corr). The bundled tp_proc_exec_corr2d/StageProc path is retained.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
-
Andrey Filippov authored
Add to the persistent TpProc API: tp_proc_setup_rbg_corr (imclt RBG + correlation buffers + corr config), tp_proc_exec_imclt / tp_proc_get_rbg, tp_proc_exec_corr2d (correlate2D TD -> corr2D_combine -> corr2D_normalize) / tp_proc_get_corr2d_combo. launch1() helper. tp_proc_convert_selftest extended to validate imclt vs aux_chnN.rbg and quad corr vs aux_corr-quad.corr (order-independent, stale golden).

Validated on RTX 5060 Ti via the persistent API: CLT==golden (0.1085), RBG==golden (0.0201), quad-corr value-err 2.06e-5, no_kernels runs finite.

The persistent granular API now covers the full set of GPU ops the LWIR16 CUAS workflow uses (geometry/convert_direct[+no_kernels/use_center_image/erase_clt/ref_scene]/imclt/correlations): the surface GpuQuadJna (integration step 2) delegates to.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
-
Andrey Filippov authored
Add TpProc: the production-facing persistent instance (buffers allocated once in tp_proc_setup, reused across set/exec/get, freed in tp_proc_destroy), the surface GpuQuadJna will delegate to.

API: tp_proc_create/setup, set_geometry/correction_vector/kernels/kernel_offsets/image/center_image/tasks, exec_geometry (calc_reverse_distortions+rot_derivs+calculate_tiles_offsets), exec_convert_direct (ref_scene/erase_clt/no_kernels), get_clt, destroy.

Includes the fragile convert_direct paths the migration must preserve: no_kernels (skip deconvolution -> kernels_hor/vert=0), use_center_image (broadcast one center image to all sensors), erase_clt (erase_clt_tiles), ref_scene (clt_ref buffer).

tp_proc_convert_selftest validates end-to-end on RTX 5060 Ti: standard convert CLT == clt/aux_chnN.clt golden (max|CLT-golden|=0.1085, == Stage 2, num_active=5120); no_kernels path runs with finite output. update_image_gpu pitch is in BYTES (the "in floats" comment is misleading).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
-
Andrey Filippov authored
CLAUDE: Stage 5 - native textures_nonoverlap via JNA (executes on Blackwell; golden mismatch documented)

Extend run_convert_pipeline with do_tex: setTextures/setRGBA-equivalent buffers (texture_indices from tasks, gpu_textures, diff_rgb_combo, color_weights, generate_RBGA_params), cuFuncSetAttribute(textures_accumulate, MAX_DYNAMIC_SHARED_SIZE_BYTES, shared_size) for the CDP child, launch textures_nonoverlap <<<1,1>>> (18 args), de-interleave diff_rgb_combo. tp_tex_selftest.

RESULT on RTX 5060 Ti: textures_nonoverlap + its CDP child textures_accumulate EXECUTE correctly (no errors, shared 58880, 5120 tiles, output rms within ~1% of golden) => Blackwell compatibility confirmed.

BUT diff_rgb_combo does NOT match the Jul-2025 golden numerically (value layers off by constant ~268, diff layers diff_sigma-sensitive). Ruled out input-CLT sensitivity (same error with golden CLT), diff_sigma (10.0 closest), arg/param order. NOT used by the LWIR16 CUAS workflow (cuas/ uses only convert_direct/corr2D_normalize/imclt_rbg_all). Documented known issue (golden staleness / unverified RGB-path drift), to track later via git bisect + the 107 kernel branch switch. See imagej-elphel-internal handoffs/2026-06-25_texture-diff-rgb-combo-mismatch.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
-
Andrey Filippov authored
Extend run_convert_pipeline with do_corr: after convert_direct, allocate corr buffers (gpu_corrs_td/combo_td/combo via alloc_image_gpu + corr_indices/combo_indices/num_corr_tiles), launch correlate2D <<<1,1>>> (TD, CDP; 18 args, generates indices), read num_corr_tiles, corr2D_combine (quad pairs_mask 0x0f), corr2D_normalize (TD->pixel), de-pitch gpu_corrs_combo. tp_corr_selftest wrapper (do_corr=1).

Validated on RTX 5060 Ti via JNA: num_pairs=120, num_corr_combo=5120, output stats identical to golden (max 0.6638, rms 0.0717). clt/aux_corr-quad.corr is OLDER (Apr-2025) than the CLT golden (Jul-2025) so the active-tile ORDER differs -> pointwise compare is permutation-dominated (0.66). Order-independent check (sort both, compare distributions): max value error 2.06e-05 == float32 precision => correlate2D/combine/normalize compute the correct values on Blackwell.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
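The order-independent check described in this commit (sort both outputs, then compare the sorted values) can be sketched in plain host C++ as below. This is only an illustration, not the code in tp_corr_selftest: the function names `max_abs_diff` and `max_abs_diff_sorted` are hypothetical, and the sketch assumes both buffers are the same length and contain only finite values.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pointwise comparison: sensitive to the order in which tiles were written.
inline float max_abs_diff(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    }
    return worst;
}

// Order-independent comparison (hypothetical helper): sort copies of both
// buffers, then compare them rank by rank. A pure permutation of the same
// values yields 0; a genuine value error survives the sort.
// Assumes finite inputs (std::sort requires a strict weak ordering, which
// NaN values would break).
inline float max_abs_diff_sorted(std::vector<float> a, std::vector<float> b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    return max_abs_diff(a, b);
}
```

A large pointwise difference together with a small sorted difference points to a reordering of the same values rather than a numerical error. The sorted check compares value distributions only, so it cannot confirm that each value landed in the right tile.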
-
Andrey Filippov authored
Refactor the Stage-2 selftest into run_convert_pipeline(do_imclt) shared helper; tp_convert_direct_selftest is now a thin wrapper (do_imclt=0).

Add tp_imclt_selftest (do_imclt=1): after convert_direct, allocate pitched RBG output buffers (alloc_image_gpu, 648x520/cam, mono), launch imclt_rbg_all <<<1,1>>> (gpu_clt -> gpu_corr_images), de-pitch via cudaMemcpy2D, compare to clt/aux_chnN.rbg golden.

Validated on RTX 5060 Ti via Java->JNA: max|RBG-golden|=0.0201 over peaks 1535 -> relative ~1.31e-5. convert_direct CLT error unchanged (0.108505) => no regression.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
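For readers unfamiliar with de-pitching: a pitched buffer pads each row to a stride (the pitch), and cudaMemcpy2D takes that pitch in bytes. The host-only sketch below performs the same row-by-row copy on an ordinary memory buffer, so it runs without a GPU. The function name `depitch` and its parameters are hypothetical and are not part of the shim.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Copy `height` rows of `width` floats out of a pitched buffer into a dense
// (tightly packed) vector. `pitch_bytes` is the row stride in BYTES, not in
// floats, matching the convention cudaMemcpy2D uses for its pitch arguments.
inline std::vector<float> depitch(const unsigned char* src, std::size_t pitch_bytes,
                                  std::size_t width, std::size_t height) {
    assert(pitch_bytes >= width * sizeof(float));
    std::vector<float> dense(width * height);
    if (width == 0) {
        return dense;  // nothing to copy
    }
    for (std::size_t row = 0; row < height; ++row) {
        std::memcpy(dense.data() + row * width,   // dense destination row
                    src + row * pitch_bytes,      // padded source row
                    width * sizeof(float));
    }
    return dense;
}
```

Passing the pitch in floats instead of bytes would make every row after the first start at the wrong offset, which is the kind of mistake a misleading "in floats" comment invites.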
-
Andrey Filippov authored
Add tp_convert_direct_selftest to the JNA shim: mirrors TpHostGpu allTests' convert path (setImageKernels/setImgBuffers/setCltBuffers/setTasks + calc_reverse_distortions -> rot_derivs -> calculate_tiles_offsets [CDP] -> convert_direct), reusing the harness runtime-API host helpers (tp_utils/tp_files/TpParams/tp_paths) for ALL allocation and porting only the launches to driver-API cuLaunchKernel against the NVRTC module. Reads CLT back, compares to clt/aux_chnN.clt golden.

build_lib.sh: nvcc + -std=c++17 (static constexpr TpParams members become inline), -Isrc + cuda-samples Common (helper_cuda.h), --pre-include algorithm.

Validated on RTX 5060 Ti via Java->JNA: num_active_tiles=5120 (all), max|CLT-golden|=0.1085 over peaks of 12260 -> relative ~8.85e-6 (float32 NVRTC-vs-nvcc variation). First CDP (calculate_tiles_offsets) and 17-arg pointer-of-pointers convert_direct launch executing natively on Blackwell.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
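The "max|CLT-golden| over peaks -> relative" figure quoted above can be reproduced with a small comparison helper like the one sketched here. It assumes "peaks" means the largest absolute value in the golden data, which the commit does not spell out; the struct `ErrorStats` and the function `compare_to_golden` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct ErrorStats {
    double max_abs_err;  // max |result - golden|
    double peak;         // max |golden| (assumed meaning of "peaks")
    double relative;     // max_abs_err / peak, or 0 when peak == 0
};

// Compare a result buffer against golden reference data of the same length.
inline ErrorStats compare_to_golden(const std::vector<float>& result,
                                    const std::vector<float>& golden) {
    assert(result.size() == golden.size());
    ErrorStats s{0.0, 0.0, 0.0};
    for (std::size_t i = 0; i < golden.size(); ++i) {
        const double g = golden[i];
        const double r = result[i];
        s.max_abs_err = std::max(s.max_abs_err, std::fabs(r - g));
        s.peak = std::max(s.peak, std::fabs(g));
    }
    if (s.peak > 0.0) {
        s.relative = s.max_abs_err / s.peak;
    }
    return s;
}
```

With the numbers in this commit, 0.1085 / 12260 is about 8.85e-6, a few parts per million of the peak value.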
-
Andrey Filippov authored
Add TpInstance to the JNA shim: device buffers (gpu_geometry_correction, gpu_rByRDist, gpu_rot_deriv, gpu_correction_vector) + setters (HtoD), the two pure-geometry launches (calcReverseDistortionTable {16,1,1}/{3,3,3}, calc_rot_deriv {num_cams,1,1}/{3,3,3}), and readback getters. Driver-API cuLaunchKernel against the NVRTC module (mirrors GpuQuad.execCalcReverseDistortions / execRotDerivs, no JCuda). build_lib.sh builds libtileproc.so.

Validated via Java->JNA against tile_processor_gpu/clt reference data on the RTX 5060 Ti: rByRDist == clt/*.rbyrdist to ~1e-7 (aux 16-cam and main), rot_deriv rows orthogonal to ~1e-10 (scaled-rotation structure, det~zoom^3).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
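The scaled-rotation check mentioned above (rows mutually orthogonal, determinant close to zoom^3) can be illustrated on a single 3x3 matrix. This is a sketch under assumptions: it treats one 3x3 block in isolation and does not reflect the actual layout of gpu_rot_deriv; `Mat3`, `row_dot`, `max_offdiag_dot`, `det3`, and `row_scale` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;

// Dot product of rows i and j.
inline double row_dot(const Mat3& m, int i, int j) {
    return m[i][0] * m[j][0] + m[i][1] * m[j][1] + m[i][2] * m[j][2];
}

// Largest |row_i . row_j| over distinct rows; 0 for exactly orthogonal rows.
inline double max_offdiag_dot(const Mat3& m) {
    return std::max({std::fabs(row_dot(m, 0, 1)),
                     std::fabs(row_dot(m, 0, 2)),
                     std::fabs(row_dot(m, 1, 2))});
}

// Determinant by cofactor expansion along the first row.
inline double det3(const Mat3& m) {
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}

// Length of the first row; for a scaled rotation this equals the zoom factor.
inline double row_scale(const Mat3& m) {
    return std::sqrt(row_dot(m, 0, 0));
}
```

For a matrix of the form zoom * R with R a proper rotation, `max_offdiag_dot` is near zero, all rows have length zoom, and `det3` is close to zoom cubed.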
-
Andrey Filippov authored
libtileproc shim (tp_jna.cpp: extern "C" tp_create_module/num_functions/last_error/destroy) + standalone tp_nvrtc_probe.cpp + build_probe.sh.

NVRTC-compiles the kernels (+ JCUDA defines) -> cuLink(libcudadevrt, CDP) -> module -> 19 functions, validated on the RTX 5060 Ti (sm_120 via compute_90 PTX + driver JIT). Build artifacts gitignored. Part of the JCuda->JNA migration.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
-
- 03 Dec, 2025 2 commits
-
-
Andrey Filippov authored
-
Andrey Filippov authored
-
- 28 Sep, 2025 1 commit
-
-
Andrey Filippov authored
-
- 22 Jul, 2025 1 commit
-
-
Andrey Filippov authored
-
- 15 Apr, 2025 2 commits
-
-
Andrey Filippov authored
-
Andrey Filippov authored
-
- 13 Apr, 2025 1 commit
-
-
Andrey Filippov authored
-
- 12 Apr, 2025 3 commits
-
-
Andrey Filippov authored
-
Andrey Filippov authored
-
Andrey Filippov authored
-
- 10 Apr, 2025 2 commits
-
-
Andrey Filippov authored
-
Andrey Filippov authored
-
- 09 Apr, 2025 1 commit
-
-
Andrey Filippov authored
-
- 08 Apr, 2025 1 commit
-
-
Andrey Filippov authored
-
- 07 Apr, 2025 1 commit
-
-
Andrey Filippov authored
-
- 06 Apr, 2025 1 commit
-
-
Andrey Filippov authored
-
- 03 Apr, 2025 1 commit
-
-
Andrey Filippov authored
-
- 01 Apr, 2025 2 commits
-
-
Andrey Filippov authored
-
Andrey Filippov authored
-
- 31 Mar, 2025 1 commit
-
-
Andrey Filippov authored
-
- 26 Mar, 2025 1 commit
-
-
Andrey Filippov authored
-
- 18 Feb, 2025 1 commit
-
-
Andrey Filippov authored
-
- 13 Feb, 2025 1 commit
-
-
Andrey Filippov authored
-
- 08 Feb, 2024 1 commit
-
-
Andrey Filippov authored
-
- 27 Nov, 2022 1 commit
-
-
Andrey Filippov authored
-
- 21 Nov, 2022 2 commits
-
-
Andrey Filippov authored
-
Andrey Filippov authored
-
- 20 Nov, 2022 2 commits
-
-
Andrey Filippov authored
-
Andrey Filippov authored
-
- 19 Nov, 2022 2 commits
-
-
Andrey Filippov authored
-
Andrey Filippov authored
-