Commits · a65cd04ac7f4ee1508119db087d339c8b75008be · Elphel / imagej-elphel

04 Jul, 2026 10 commits

CLAUDE: A2 step 1 - raw-jp4 + CuasConditioning ingest feeding the GPU directly (curt_pose_raw) · a65cd04a

Andrey Filippov authored Jul 04, 2026

Per Andrey: bypass the QuadCLT image_data mechanics entirely and feed the GPU
directly - the permanent RT shape (more data becomes GPU-memory-resident;
QuadCLT stays for geometry/poses only).

- CuasMotion.readRawImageData(): raw /jp4/ reader extracted from
  perSensorFromRawJp4 (oracle getJp4Tiff, one thread/sensor, instrumentation);
  perSensorFromRawJp4 rewired, behavior unchanged.
- CuasConditioning.conditionSceneToGpu(): raw read -> condition() with the
  CURRENT calibration (curt_calib-updated lwir scales/offsets/scales2 +
  per-pixel FPN from the scene QuadCLT) -> bind scene (saveQuadClt,
  conditional) -> clear hasNewImageData -> setBayerImages(data,true) force-H2D.
  The bayer guard (ebef0b23 fix) keeps the upload alive through
  interCorrPair's own setBayerImages(false). TELL if the guard ever breaks:
  results become identical to the prepared-data path.
- CuasPoseRT: curt_pose_raw flag (new checkbox 'Pose test raw-jp4 ingest (A2)')
  runs the ingest per scene before the fit; ingest failure coasts the
  prediction and records an empty CSV/hyper row (fail=-1).

Acceptance (recorded): A2 legitimately diverges from phase A (old
Photogrammetric Calibration was broken, frozen from unrelated footage) -
judge by A2's OWN dstored/corr-RMS/convergence; systematically worse = red flag.

mvn compile clean.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>


CLAUDE: CuasTD - NaN-aware cross-sensor TD consolidation (A2 step 2) + JNA getCltData override · 6ecc13ad

Andrey Filippov authored Jul 04, 2026

Phase A2/B building block: consolidate the 16 per-sensor CLT channels into ONE
averaged TD channel (average images BEFORE correlation - multiply averages, not
average products). Per-tile granularity: sum sensors that have the tile (first
element NaN = absent), count, divide; count plane returned as the weight; a
stray in-tile NaN poisons the whole result tile (fail-visible). Not available
on GPU (combine_inter only sums correlation PRODUCTS) - this CPU implementation
+ get/setCltData D2H/H2D is the A2 bridge and the bit oracle for the future
clt_average_sensors kernel.
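
The per-tile rule described above (sum the sensors that have the tile, count them, divide, and let a stray NaN poison the whole result tile) can be sketched in plain Java. This is an illustration under assumptions, not the CuasTD code: the class name and the `td[sensor][tile][]` layout are hypothetical, and a missing tile is taken to be either `null` or marked by a NaN first element.

```java
import java.util.Arrays;

// Illustrative sketch only; names and array layout are hypothetical.
final class TdConsolidationSketch {
    /**
     * td[sensor][tile] holds one tile's TD samples, or null / first element NaN
     * when that sensor has no such tile. Fills count[tile] (the weight plane)
     * and returns the per-tile averages (null where no sensor has the tile).
     */
    static float[][] consolidate(float[][][] td, int[] count) {
        float[][] avg = new float[count.length][];
        for (int tile = 0; tile < count.length; tile++) {
            float[] sum = null;
            int n = 0;
            for (float[][] sensor : td) {
                float[] t = sensor[tile];
                if (t == null || Float.isNaN(t[0])) continue; // tile absent on this sensor
                if (sum == null) sum = new float[t.length];
                for (int i = 0; i < t.length; i++) sum[i] += t[i];
                n++;
            }
            count[tile] = n;
            if (n == 0) continue; // no sensor has this tile
            boolean poisoned = false;
            for (int i = 0; i < sum.length; i++) {
                sum[i] /= n;
                poisoned |= Float.isNaN(sum[i]);
            }
            if (poisoned) Arrays.fill(sum, Float.NaN); // stray NaN: whole tile visibly bad
            avg[tile] = sum;
        }
        return avg;
    }
}
```

The count array doubles as the weight plane mentioned in the message; a tile seen by fewer sensors simply carries a smaller count.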

- CuasTD.validateConsolidation(): linearity oracle - imclt(TD-avg) must equal
  pixel-average of per-sensor imclt renders (same GPU imclt both sides);
  prints count-plane stats + max|diff|/RMS, saves -CUAS-TDAVG-CHECK 3-slice
  stack, restores original TD. Wired into the curt_cond_test branch after
  perSensorFromRawJp4 (uses its raw-jp4 16-sensor TD).
- GpuQuadJna.getCltData() override added (base derefs null gpu_clt_h on JNA
  shells - the known un-overridden-accessor class); uses tp_proc_get_clt.

mvn compile clean.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>


CLAUDE: CuasPoseRT: cluster-based tile selection (neighbor-aware, edge-first for roll) · e6d2be34

Andrey Filippov authored Jul 04, 2026

Findings from the FILT150 run: (1) scattered rank-150 selection starved the
per-scene neighbor consolidation (min_str_neib/eig_str_neib) - only ~57 tiles
with accidental neighbors were measured per scene (neighbors-vs-measured
corr 0.78); (2) roll degraded (RMS 0.106 vs 0.059 mrad, bias +0.073) - the
selection carried only 11% of the full set's roll information.

deriveSelection() stage 2 now picks disjoint 3x3 CLUSTERS of gate-passing
tiles (>=CLUSTER_MIN_ELIGIBLE=6 of 9), round-robin from three pools:
LEFTMOST, RIGHTMOST (per Andrey - edge tiles have the most roll influence),
BEST-QUALITY (median member fmax), until the tile budget is filled; scattered
best tiles fill any remainder. Offline simulation on the real calibration:
24 clusters (8/8/8), 150 tiles, mean 3.97 in-selection neighbors, roll info
+48% vs scattered rank-150. Measurement code untouched (oracle identical).

mvn compile clean.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>


CLAUDE: CuasPoseRT: automatic MAXDXY reuse (FPN-style) + NaN (not +inf) for viewable calibration · 1ff4add6

Andrey Filippov authored Jul 04, 2026

Per Andrey: (1) calibration reuse is now automatic - if -POSE-RT-MAXDXY exists
it is used (filtered run), else a full run generates it; new
curt_pose_recalc flag forces regeneration (replaces the backwards
curt_pose_use_filt enable). Matches the FPN reuse pattern. (2) MAXDXY stores
NaN instead of +inf for NaN-in-any-scene tiles - deriveSelection rejects NaN
and +inf identically (non-finite), and NaN keeps the TIFF viewable in ImageJ
(+inf broke min/max autoscaling).

mvn compile clean (Eyesis closed).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>


CLAUDE: CuasPoseRT: -POSE-RT-MAXDXY calibration artifact, NMAD outlier gate + rank-N selection · abca0375

Andrey Filippov authored Jul 04, 2026

Replaces the absolute curt_pose_max_dxy with the scale-free scheme (Andrey's
histogram rule formalized + robustness for worse footage):
- Calibration artifact -POSE-RT-MAXDXY.tiff: per-tile max-over-scenes residual,
  +inf where any scene NaN (auto-reject, mergeable across runs by max), NaN
  where unmeasured. Saved only from FULL-selection runs (a filtered run never
  shrinks coverage). Continuous statistic persisted, boolean selection derived
  at load - policy can change without re-measuring.
- deriveSelection(): stage 1 outlier gate keep max <= median + k*NMAD of finite
  per-tile maxes (curt_pose_dxy_k=0.75; on the reference footage: MBEN gate
  0.477 keeps 595, degraded NOMB self-adapts to 0.728 keeps 626 - same ~65%);
  stage 2 rank-N budget keep curt_pose_num_tiles=150 best (threshold-free).
- curt_pose_use_filt now loads MAXDXY and derives; missing -> full run
  generates it (FPN-style reuse pattern).
Importance-greedy (3x3 information matrix) ranking = next step.
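
A minimal sketch of the two-stage rule above (outlier gate at median + k*NMAD over the finite per-tile maxes, then a rank-N budget). Assumptions, none of them confirmed by the message: NMAD is taken as 1.4826 * MAD, "best" is read as the smallest per-tile max residual, and the class and method signature are hypothetical rather than the actual deriveSelection().

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

// Illustrative sketch only; not the CuasPoseRT implementation.
final class MaxDxySelectionSketch {
    static double median(double[] sorted) {
        int n = sorted.length;
        return (n % 2 == 1) ? sorted[n / 2] : 0.5 * (sorted[n / 2 - 1] + sorted[n / 2]);
    }

    /** maxDxy[tile] = max-over-scenes residual; NaN and +inf are both rejected as non-finite. */
    static boolean[] deriveSelection(double[] maxDxy, double k, int numTiles) {
        boolean[] keep = new boolean[maxDxy.length];
        double[] finite = Arrays.stream(maxDxy).filter(Double::isFinite).sorted().toArray();
        if (finite.length == 0) return keep;
        double med = median(finite);
        double[] dev = Arrays.stream(finite).map(v -> Math.abs(v - med)).sorted().toArray();
        double gate = med + k * 1.4826 * median(dev); // assumed NMAD = 1.4826 * MAD
        IntStream.range(0, maxDxy.length)
                .filter(i -> Double.isFinite(maxDxy[i]) && maxDxy[i] <= gate) // stage 1: gate
                .boxed()
                .sorted(Comparator.comparingDouble((Integer i) -> maxDxy[i])) // stage 2: rank
                .limit(numTiles)
                .forEach(i -> keep[i] = true);
        return keep;
    }
}
```

Because the gate is derived from the data's own median and spread, it moves with footage quality, which is the behavior the message reports for the MBEN and NOMB runs.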

mvn compile clean.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>


CLAUDE: CuasPoseRT: outlier post-filter (-POSE-RT-RELIABLE-FILT) + two-pass selection · 3f96868f

Andrey Filippov authored Jul 04, 2026

Per Andrey: a selected tile is BAD if its measured dxy is NaN in any scene or
exceeds curt_pose_max_dxy (absolute, default 0.25 pix) in at least one scene.
Survivors saved as -POSE-RT-RELIABLE-FILT.tiff; curt_pose_use_filt loads it on
a next run and ANDs with the strength selection (two-pass workflow: full run
calibrates the selection, subsequent runs use ~191 clean tiles instead of 1074
on the reference footage; kept-tile mean dxy 0.087 pix).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
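
The BAD-tile rule in this message (NaN in any scene, or above the absolute limit in at least one scene) reduces to a single comparison per sample. A hedged sketch follows; the class name and the `dxy[scene][tile]` layout are assumptions made for illustration.

```java
// Illustrative sketch only; not the CuasPoseRT implementation.
final class ReliableFilterSketch {
    /** Returns selected AND (dxy finite and <= maxDxy in every scene). */
    static boolean[] filter(double[][] dxy, boolean[] selected, double maxDxy) {
        boolean[] keep = selected.clone();
        for (double[] scene : dxy) {
            for (int t = 0; t < keep.length; t++) {
                // !(x <= limit) is true for NaN as well as for x > limit
                if (keep[t] && !(scene[t] <= maxDxy)) keep[t] = false;
            }
        }
        return keep;
    }
}
```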


CLAUDE: CuasPoseRT: wire motion-blur compensation (imp.mb_en gated) · 0063d548

Andrey Filippov authored Jul 04, 2026

Exercises the existing MB machinery in the RT iterator: when imp.mb_en is ON,
per-scene blur vectors from OpticalFlow.getMotionBlur (FD-based rates) send
interCorrPair down the setInterTasksMotionBlur/interCorrTDMotionBlur path -
convert_direct runs twice, the second run subtracting the shifted+scaled copy
via negative TpTask.scale (LWIR bolometer exponential-tail removal). mb_en OFF
keeps the single-run path, giving a one-checkbox A/B. Same getMotionBlur usage
as offline setInitialOrientationsCuas (stored truth was produced WITH MB on).

mvn compile clean (Eyesis closed).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>


CLAUDE: CuasPoseRT: pixel-unit RMSE + per-scene log identification · 935e1dcd

Andrey Filippov authored Jul 04, 2026

Per Andrey: (1) fitted-vs-stored deltas reported in PIXELS (the informative
unit), using the same scales as the LMA par_scales - az/tilt = focal/pixelSize,
roll = distortionRadius/pixelSize; mrad kept secondary. (2) A 'CuasPoseRT scene
i (of N) <timestamp> Done/FAILED' line after each fit so the unlabeled LMA
iteration prints above it are attributable to a scene (SYSTEM_OUT-01.log had
iterations but no index/timestamp). Per-scene line also shows dstored in pix.
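
The angle-to-pixel conversion named above can be written out as a short sketch. It assumes the deltas are in radians and that focal length, distortion radius and pixel size share one length unit; the class, method names and the numbers in the usage note are hypothetical.

```java
// Illustrative sketch only; scales follow the ratios quoted in the message.
final class PoseDeltaUnitsSketch {
    /** Azimuth or tilt delta (radians) expressed in pixels: delta * focal / pixelSize. */
    static double azTiltToPix(double deltaRad, double focal, double pixelSize) {
        return deltaRad * focal / pixelSize;
    }

    /** Roll delta (radians) expressed in pixels at the distortion radius. */
    static double rollToPix(double deltaRad, double distortionRadius, double pixelSize) {
        return deltaRad * distortionRadius / pixelSize;
    }
}
```

For example, with a made-up focal length of 12 mm and a 0.012 mm pixel, a 1 mrad azimuth delta corresponds to about 1 pixel.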

Verified with standalone javac (Eyesis live - no mvn).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>


CLAUDE: CuasPoseRT: save -POSE-RT-RELIABLE mask + -POSE-RT-HYPER measurement stack · d23011b4

Andrey Filippov authored Jul 04, 2026

Per Andrey: characterize the per-scene measurement, incl. the eigenvector data.
-POSE-RT-HYPER (80x64, z=scenes, t=components, make_hyper layout): dx, dy,
strength, dxy=|dx,dy| from vector_XYS; sqrt_l0, sqrt_l1 (peak-ellipse half-axes,
pix), elong=sqrt(l1/l0) (linear-feature indicator), eig0_ang (precise-axis
direction, [0,PI)) from coord_motion eigen {eig_x,eig_y,l0,l1} - NaN unless
imp.eig_use. Data = last LMA cycle's coord_motion via the existing
coord_motion_rslt out-param. -POSE-RT-RELIABLE = tile selection mask.
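
How the derived HYPER components follow from dx, dy and the eigen data {eig_x, eig_y, l0, l1} can be sketched as below. This is an assumption-laden illustration: the angle is taken as atan2(eig_y, eig_x) folded into [0, PI), and the class and return layout are hypothetical.

```java
// Illustrative sketch only; component order: dxy, sqrt_l0, sqrt_l1, elong, eig0_ang.
final class PeakShapeSketch {
    static double[] components(double dx, double dy,
                               double eigX, double eigY, double l0, double l1) {
        double dxy = Math.hypot(dx, dy);          // |dx,dy|
        double sqrtL0 = Math.sqrt(l0);            // ellipse half-axis, pix
        double sqrtL1 = Math.sqrt(l1);            // ellipse half-axis, pix
        double elong = Math.sqrt(l1 / l0);        // large for linear features
        double ang = Math.atan2(eigY, eigX);      // direction of the first eigenvector
        if (ang < 0) ang += Math.PI;              // an axis has no sign: fold to [0, PI)
        if (ang >= Math.PI) ang -= Math.PI;
        return new double[]{dxy, sqrtL0, sqrtL1, elong, ang};
    }
}
```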

Verified with standalone javac against target/classes (Eyesis live - no mvn;
Eclipse rebuilds on restart).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>


CLAUDE: CuasPoseRT phase A - RT pose-adjustment prototype (curt_pose_test) · 472435db

Andrey Filippov authored Jul 04, 2026

Top-level scene iterator re-generating per-scene 3-angle poses against the
persistent virtual-center reference, RT-style: ascending time order, zero-order
prediction seeding (fit anchored to the center, prediction only warm-starts the
LMA), single pass on the final combo DSI (no refinement pass - it only existed
because disparity arrived after initial orientations offline). Measurement
engine = proven Interscene.adjustPairsLMAInterscene (reference GPU data set
once); phase B will swap it for the lean TD-average x virtual-center path with
GPU argmax+eigen kernels, keeping this iterator + CSV as the oracle.

- new cuas/rt/CuasPoseRT.testPoseSequence(): reference prep (strength>
  curt_pose_str tile selection, setReferenceGPU with center CLT), stored-pose
  seed/truth from center ErsCorrection scenes_poses, per-scene fit with 3-angle
  param_select (XYZ locked), ERS dt from pose finite differences (disable_ers),
  MB off, coast-on-failure; writes -POSE-RT-TEST.csv + fitted-vs-stored summary
- params curt_pose_test (bool) + curt_pose_str (1.0) - 6 plumbing sites
- OpticalFlow curt_en branch: curt_pose_test runs INSTEAD of detection

Build: mvn compile clean. Runtime validation pending (Eclipse/Eyesis run on
sequence 1773135476_186641, truth = re-adjusted stored poses).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>


03 Jul, 2026 5 commits

CLAUDE: self-documenting comments: TpTask bridge role, differential rectification, offset composition · 5e14bb72

Andrey Filippov authored Jul 03, 2026

Comment-only (no code change; mvn compile clean). Documents, from Andrey's
explanation: TpTask as the Java<->CUDA work-list bridge; the per-sensor xy
offset as the differential-rectification composed shift (factory kernel
offset + misalignment + disparity + relative pose) split integer/fractional;
historic host-side vs current GPU-side geometry fill; updateTasks() D2H;
disp_dist[cam][4] = d(x,y)/d(disp,ndisp) Jacobian consumed by Corr2dLMA and
lazy-eye/ERS.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>


CLAUDE: curt_calib - photometric calibration as first step of the RT flow · 0f20de7a

Andrey Filippov authored Jul 03, 2026

New curt_calib parameter (CUAS RT dialog, saved/restored) runs/bypasses the
per-sensor photometric (re)calibration as the first step of the CUAS RT
processing flow, before detection (no longer tied to the diagnostic).
Extracted CuasMotion.rtPhotometricCalibration() = convertFromData() (upload
+ own uniform grid convert, split out of perSensorFromData) + fit + apply/
save. Production step converts and calibrates without saving stacks; the
curt_cond_test diagnostic (replaces detection) keeps the raw-vs-conditioned
stack compare and makes the calibration step save -CUAS-PERSENSOR[-ADJ].
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>


CLAUDE: fix lwir recalibration persistence targets · 0594db87

Andrey Filippov authored Jul 03, 2026

The virtual -CENTER INTERFRAME corr-xml only carries poses/velocities, so
saving the recalculated 16+16 lwir offsets/scales there was futile. Follow
the established photometric machinery (runPhotometric()/photoEach()) and
the top-menu save/restore convention instead: set the new values on
master_CLT (immediate use), quadCLTs[ref_index] (physical photometric
owner, its <scene>-INTERFRAME.corr-xml is saved) and quadCLT_main (applied
to next sequences and saved in the main configuration file).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>


CLAUDE: per-sensor lwir photometric recalibration + JNA bayer-guard fix · ebef0b23

Andrey Filippov authored Jul 03, 2026

curt_cond_test rework: both PERSENSOR stacks now converted with the test's
own uniform sensor-domain task grid (scale 1.0) instead of leftover GPU
state (MB secondary tasks with negative fractional scales made 'raw'
renders = -1/6 x input; leftover virtual-view grid lost the same border
ROI on every sensor). perSensorFromRawJp4 no longer overwrites the scene's
conditioned image_data.

GpuQuadJna.setBayerImages(force,center) restored the base-class skip-guard
via a native-side jna_bayer_set flag (gpuTileProcessor is null in JNA shell
instances): every execConvertDirect unconditionally re-pulled
quadCLT.getResetImageData(), silently clobbering explicit uploads - made
the raw baseline bit-identical to the conditioned render.

CuasMotion.perSensorLinearFit(): per-sensor a+b*x photometric fit over
safe tiles (weak strength<0.5 or far disparity<1 from -INTER-INTRA-LMA,
inner rect, 8x8 tile->pixel map) against the cross-sensor mean, gauge
keep_averages (mean offset 0, mean scale 1), 3-sigma outlier rejection.
Validated on 1773135476_186641: sensor-mean spread 1353->5 counts,
cross-sensor RMS 358->17 (inliers), b in 0.83..1.11.
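
A generic a+b*x least-squares fit with iterated 3-sigma rejection, as a sketch of the kind of fit described above. It is not perSensorLinearFit(): tile selection, the 8x8 tile->pixel map and the keep_averages gauge are left out, and the class name and pass limit are hypothetical.

```java
import java.util.Arrays;

// Illustrative sketch only: x = one sensor's samples, y = cross-sensor mean samples.
final class LinearFitSketch {
    /** Fits y ~ a + b*x, dropping samples beyond 3 sigma after each pass. Returns {a, b}. */
    static double[] fit(double[] x, double[] y, int maxPasses) {
        boolean[] use = new boolean[x.length];
        Arrays.fill(use, true);
        double a = 0.0, b = 1.0;
        for (int pass = 0; pass < maxPasses; pass++) {
            double n = 0, sx = 0, sy = 0, sxx = 0, sxy = 0;
            for (int i = 0; i < x.length; i++) if (use[i]) {
                n++; sx += x[i]; sy += y[i]; sxx += x[i] * x[i]; sxy += x[i] * y[i];
            }
            double det = n * sxx - sx * sx;
            if (n < 2 || det == 0.0) break;       // degenerate: keep the last estimate
            b = (n * sxy - sx * sy) / det;
            a = (sy - b * sx) / n;
            double ss = 0;
            for (int i = 0; i < x.length; i++) if (use[i]) {
                double r = y[i] - a - b * x[i];
                ss += r * r;
            }
            double limit = 3.0 * Math.sqrt(ss / n);
            boolean dropped = false;
            for (int i = 0; i < x.length; i++) {
                if (use[i] && Math.abs(y[i] - a - b * x[i]) > limit) {
                    use[i] = false;               // outlier: excluded from the next pass
                    dropped = true;
                }
            }
            if (!dropped) break;
        }
        return new double[]{a, b};
    }
}
```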

CuasMotion.applyLwirLinearCalibration(): folds the fit into the 16+16
lwir offsets/scales (scale'=b*scale, offset'=offset-a/scale'), updates the
center instance + photometric_scene provenance, saves -INTERFRAME.corr-xml.
Applied the standard way at load, they compensate the remaining per-sensor
mismatch of the raw /jp4/ tiffs.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
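
The folding rule scale'=b*scale, offset'=offset-a/scale' can be checked numerically. The sketch below assumes the conditioning model value = scale*(raw - offset), which the message does not spell out; under that assumption, applying the folded pair equals applying a + b*value to the old result. All names and numbers are hypothetical.

```java
// Illustrative sketch only; the linear conditioning model is an assumption.
final class LwirFoldSketch {
    /** Assumed conditioning model: value = scale * (raw - offset). */
    static double condition(double raw, double scale, double offset) {
        return scale * (raw - offset);
    }

    /** Folds the fit (a, b) into the calibration; returns {scale', offset'}. */
    static double[] fold(double scale, double offset, double a, double b) {
        double scaleNew = b * scale;
        double offsetNew = offset - a / scaleNew;
        return new double[]{scaleNew, offsetNew};
    }
}
```

The identity behind it: a + b*scale*(raw - offset) = b*scale*(raw - (offset - a/(b*scale))).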


CLAUDE: renderSceneSequence() hyperstack mode (make_hyper) + per-sensor averages · 96b1c3d5

Andrey Filippov authored Jul 03, 2026

Add make_hyper parameter (code-selected, not in settings): 0 - flat stack
(old behavior, all pre-existing call sites); >0 - transpose to hyperstack
[sensors][avgs+timestamps][pixel], z (top slider) - timestamps, t (bottom
slider) - sensor channels; 2 - insert per-timestamp average of all used
sensors as first channel (17th), computed from final conditioned slices.
Per-sensor full + center-fraction averages now work in individual mode
(pre-calculated merged-only average falls back to slice computation with
a warning instead of AIOOBE). Number of average frames stays variable
(0/1/2); fopen paths bit-identical by design.
Verified on 495-scene CUAS sequence: INDIVIDUAL debug hyperstack matches
the MERGED convention - to be used as oracle for RT conditioning.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>


02 Jul, 2026 2 commits

Improving renderSceneSequence() · eae75ef1

Andrey Filippov authored Jul 02, 2026

CLAUDE: CUAS RT per-sensor conditioning test + NaN-tolerant subtract-avg/LoG · d3c015c3

Andrey Filippov authored Jul 02, 2026

Add a curt_cond_test path (boolean at the top of the CUAS-RT dialog) that,
inside the curt_en branch, renders the 16 per-sensor images and saves them to
the -CENTER instance for calibration inspection:
- CuasMotion.perSensorAveragesFromTD: imclt the per-sensor TD, print the
  16-sensor average spread, save -CUAS-PERSENSOR (16-slice stack, per-slice
  avg labels). Saves via the -CENTER instance, not gpuQuad.getQuadCLT().
- CuasMotion.perSensorFromRawJp4: read RAW /jp4/ per-sensor (oracle getJp4Tiff,
  one thread/sensor), force-H2D (bypass the "GPU mem already correct" verify),
  execConvertDirect from raw, save -CUAS-PERSENSOR-RAW (uncorrected baseline;
  calibration stays a separate "cheat"). RT-seed for the future SATA raw
  stream; GPU port later with Java as oracle.

Fix the NaN border on the RT SUBAVG-CONV2D product:
- CuasDetectRT subtract-average -> NaN-tolerant union (average only non-NaN
  scenes per pixel), matching the oracle -CUAS-MERGED-CUAS; the plain sum
  NaN-propagated (one missing scene poisoned the pixel in every frame ->
  thick border after LoG).
- CuasRTUtils.convolve2DLReLU -> NaN-aware (NaN out only if the center is NaN;
  substitute the center value for NaN taps), so the LoG can't bloom a thin
  border into a thick NaN frame.
- Add -SUBAVG-PRELOG save (post-subtract-avg, pre-LoG) for bisecting.
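
The two NaN rules in the list above can be sketched as follows. Assumptions: the class and method names are hypothetical, the leaky-ReLU part of convolve2DLReLU is omitted, taps that fall outside the frame are treated like NaN taps (the message does not say how the real code handles them), and the kernel is applied without flipping, which makes no difference for a symmetric LoG.

```java
// Illustrative sketch only; not the CuasDetectRT / CuasRTUtils code.
final class NanTolerantSketch {
    /** Per-pixel mean over scenes using only non-NaN samples; NaN where no scene has data. */
    static float[] nanMean(float[][] scenes) {
        int npix = scenes[0].length;
        float[] mean = new float[npix];
        for (int p = 0; p < npix; p++) {
            float sum = 0;
            int n = 0;
            for (float[] s : scenes) if (!Float.isNaN(s[p])) { sum += s[p]; n++; }
            mean[p] = (n > 0) ? sum / n : Float.NaN;
        }
        return mean;
    }

    /** Square odd-size kernel; output is NaN only where the center pixel is NaN. */
    static float[] convolve(float[] img, int width, float[] kernel, int ksize) {
        int height = img.length / width, r = ksize / 2;
        float[] out = new float[img.length];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                float c = img[y * width + x];
                if (Float.isNaN(c)) { out[y * width + x] = Float.NaN; continue; }
                float s = 0;
                for (int ky = -r; ky <= r; ky++) {
                    for (int kx = -r; kx <= r; kx++) {
                        int yy = y + ky, xx = x + kx;
                        boolean inside = yy >= 0 && yy < height && xx >= 0 && xx < width;
                        float v = inside ? img[yy * width + xx] : Float.NaN;
                        if (Float.isNaN(v)) v = c; // substitute the center value
                        s += kernel[(ky + r) * ksize + (kx + r)] * v;
                    }
                }
                out[y * width + x] = s;
            }
        }
        return out;
    }
}
```

With this substitution a NaN border stays as thin as it was on input instead of growing by the kernel radius.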

Compiles (mvn -DskipTests clean package). WIP: the raw-path values/edge and the
in-memory-vs-file (MERGED-CUAS) divergence are still under review; the ~28px
edge residual is traced to the temporal subtract-average at the rotation-swept
composite edge. See ANDREY_CONTINUE.md open items.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


01 Jul, 2026 1 commit

CLAUDE: lean RT conditioning (CuasConditioning) + per-sensor-average render tool · a62a6b06

Andrey Filippov authored Jul 01, 2026

Piece 1 of the RT conditioning migration (design: internal handoff
2026-06-30_rt_conditioning_design.md):
- cuas/rt/CuasConditioning: lean, self-contained per-sensor conditioning for the
  TP/RT path - Row/Col denoise (on/off, optional HPF of the 1-D avg profile) then
  photometric scales2*(raw-C0)^2 + scale*(raw-C0) - FPN (bit-matches the current
  additive path when scales2=0). Bypasses the heavy QuadCLT conditioning path.
- CuasMotion.perSensorAveragesFromTD(GpuQuad, use_reference): memory-lean render of
  all 16 per-sensor from TD; per-sensor average + spread = calibration-quality gauge.
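
The photometric expression quoted above, written as a per-pixel function. This is a sketch of the formula only: the Row/Col denoise step is omitted and the class and parameter names are hypothetical.

```java
// Illustrative sketch only; evaluates scales2*(raw-C0)^2 + scale*(raw-C0) - FPN.
final class ConditioningFormulaSketch {
    static float condition(float raw, float c0, float scale, float scales2, float fpn) {
        float d = raw - c0;
        return scales2 * d * d + scale * d - fpn;
    }
}
```

With scales2 = 0 the quadratic term vanishes and the expression reduces to scale*(raw-C0) - FPN, the additive case the message refers to.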

Building blocks only; full test wiring (raw jp4 -> condition -> convert_direct ->
renderSceneSequence per-sensor averages) + Eyesis invocation entry still pending.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


27 Jun, 2026 3 commits

CLAUDE: opt-in -Plibtorch Maven profile to unpack libtorch from the mirror (piece 4B) · 5221ec2c

Andrey Filippov authored Jun 27, 2026

Adds an OFF-by-default profile that dependency:unpacks the libtorch native runtime
(org.pytorch:libtorch-cxx11-cu128:2.7.1:zip, cu128) from mirror.elphel.com/maven-dependencies
into target/libtorch-dist for the native DNN backend (libtpdnn.so / CuasDnnLocal) on a
deployment box. The default build never downloads the 3.8GB zip.

Artifact published to the mirror in maven layout (server-side copy of the existing zip)
via tile_processor_gpu/jna/publish_libtorch_mirror.sh. Verified: zip + .pom reachable at
the computed maven URL (HTTP 200, 3.78GB), profile parses (mvn -Plibtorch validate OK).
Full unpack deferred (redundant on this box - libtorch already extracted); to be exercised on the
first deployment machine via `mvn -Plibtorch generate-resources`.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


CLAUDE: bundle L1+L2 TorchScript models as resources (piece 4A) · 7e296021

Andrey Filippov authored Jun 27, 2026

Bundles the exported TorchScript models + their .meta.json sidecars under
src/main/resources/cuas_dnn/<name>/ so CuasDnnLocal runs with no local model
dir (deployment needs no PyTorch/dev tooling - just the .so + libtorch runtime):
weighted9_pm_s/model.ts.pt (+.meta.json) L1 (N=9,P=24,vr=5,out_ch=124)
mexhat_gaps_boost40/model.l2.ts.pt (+.meta.json) L2 (ch_hidden=24,vmax=1.4)

Validated: CuasDnnLocal bundled-resource path (curt_dnn_local_dir empty) extracts
from the jar and matches the server oracle EXACTLY (offset5=0.0, roi=0.0).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


CLAUDE: in-process native DNN backend (CuasDnnLocal) via JNA - no server · 54d842f0

Andrey Filippov authored Jun 27, 2026

Piece 3 of the native-JNA DNN path. Adds a local backend that runs the SAME
L1+L2 inference as CuasDnnRemote but in-process via LibTorch (libtpdnn.so / JNA),
so the CUAS pipeline runs without the DGX or any Python server:
  - CuasDnnBackend  : shared interface (upload/getNFrames/inferBatch->BatchResult/close)
  - TpDnnJna        : JNA Library binding libtpdnn.so's C-ABI
  - CuasDnnLocal    : wraps it; reads N/P/vr/l2_ch from each model's bundled .meta.json
                      (single source of truth), float[][]<->float[], builds BatchResult
  - CuasDnnRemote   : now implements CuasDnnBackend (signatures unchanged)
  - CuasDetectRT    : DNN path gate now fires on (curt_dnn_remote || curt_dnn_local);
                      backend = local? CuasDnnLocal : CuasDnnRemote; ensureServer skipped
                      when local; local-CPU-ORT gate also excludes curt_dnn_local (no
                      double-run). runDnnRemote loop unchanged.
  - IntersceneMatchParameters: curt_dnn_local (flag) + curt_dnn_local_dir (model dir
                      override; empty = bundled /cuas_dnn resource) + GUI labels/persist.

Validated: full Java->JNA->libtpdnn vs the Python-server oracle = EXACT
(offset5=0.0, roi=0.0, nch=6). mvn -DskipTests package OK.

Runtime: -Djna.library.path=<dir with libtpdnn.so>; libtpdnn.so finds libtorch via
its rpath. Model resolution mirrors CuasDnnRemote's bundled-vs-override scheme.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


26 Jun, 2026 14 commits

CLAUDE: strip TEMP migration probes + proactive side-effect-parity sweep · 9258b5d9

Andrey Filippov authored Jun 26, 2026

Migration validated (JNA CUAS targets match JCuda). Cleanup:
- Removed all TEMP debug probes (-Dtp.dbg.corrpair, probeClt, saveTDRender,
  the one-shot DBG/PROBE blocks in GpuQuadJna + CuasMotion). Real fixes kept
  (rectilinear port, num_pairs=3, setCorrIndicesTdData, imclt ref_scene,
  num_corr_tiles propagation f6dcc90f).
- Proactive sweep for the f6dcc90f bug-class (JNA override drops a base
  side-effect field write): getCorrComboIndices/getCorr2DCombo propagate
  num_corr_combo_tiles, setCorrIndicesTdData propagates num_corr_tiles,
  getTextureIndices propagates num_texture_tiles; those fields made protected.
  These four are LATENT (no live consumer on the validated CUAS path) and are
  marked NOT-YET-TESTED inline.

Java-only. mvn compile clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


CLAUDE: JNA getCorrIndices propagates num_corr_tiles to base field (real fix) · f6dcc90f

Andrey Filippov authored Jun 26, 2026

Root cause of the CORR2D-all-NaN / 0-targets: the inter-correlation actually
works (probe showed num_corr_tiles=8850 = 4425 tiles x (1 sensor + 1 sum)), but
the TD readback dropped it. Base GpuQuad.getCorrIndices() sets the num_corr_tiles
field ("also sets num_corr_tiles"); GpuQuadJna.getCorrIndices() read the native
count locally and returned the array WITHOUT setting the field. So
TDCorrTile.getFromGpu (num_tiles = getNumCorrTiles()/num_pairs) and base
getCorrTilesTd (uses the field directly) saw a stale 0 -> built 0 tiles ->
empty target sequence -> null ROUND_ONE image -> saveImagePlusInModelDirectory
NPE (the misplaced-null-guard latent bug is just the messenger).

Fix: GpuQuadJna.getCorrIndices() sets num_corr_tiles = n (native count); field
made protected so the subclass can. Java-only.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
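
The bug class described here (an override returns the right data but drops a field write that the base accessor performs) is easy to reproduce in isolation. The classes below are hypothetical stand-ins, not GpuQuad or GpuQuadJna; the array contents replace the real device readback.

```java
// Illustrative sketch only; mimics the shape of the bug, not the real classes.
final class SideEffectParitySketch {
    static class Base {
        protected int numCorrTiles;               // read later by other code

        int[] getCorrIndices() {
            int[] idx = {10, 11, 12, 13};         // stand-in for the readback
            numCorrTiles = idx.length;            // side effect of the accessor
            return idx;
        }

        int getNumCorrTiles() { return numCorrTiles; }
    }

    static class BrokenOverride extends Base {
        @Override
        int[] getCorrIndices() {
            return new int[]{10, 11, 12, 13};     // correct data, field left stale
        }
    }

    static class FixedOverride extends Base {
        @Override
        int[] getCorrIndices() {
            int[] idx = {10, 11, 12, 13};
            numCorrTiles = idx.length;            // restores side-effect parity
            return idx;
        }
    }
}
```

A consumer that sizes its work from getNumCorrTiles() sees 0 after the broken override and 4 after the fixed one, although both return the same array.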


CLAUDE: probe inter-corr sel_sensors/num_cams/count + fix saveTDRender NPE (TEMP) · c592e5ff

Andrey Filippov authored Jun 26, 2026

Post-mortem showed both CLT buffers loaded but inter-correlation -> 0 tiles.
index_inter_correlate selects by __popc(sel_sensors); static reading says
sel_sensors should be 1 (single-cam rectilinear), so a runtime value differs.
- GpuQuadJna.execCorr2D_inter_TD: one-shot print sel_sensors/popc/num_cams/
  num_colors/scales + the returned num_corr_tiles.
- saveTDRender: makeArrays NPE'd on null titles (derefs titles[i]); pass a
  non-null titles[] so the render saves instead of crashing the run.
TEMP — remove with the rest of the -Dtp.dbg.corrpair probe.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


CLAUDE: more one-shot post-mortem diagnostics (TEMP, Java-only) · f163b14a

Andrey Filippov authored Jun 26, 2026

Single sacrificial run -> generous logging without spam:
- GpuQuadJna: probe BOTH first ref convert (gpu_clt_ref) AND first scene convert
  (gpu_clt) — NaN%/nonzero/range each (probeClt helper).
- CuasMotion.correlatePair one-shot: log targets_mv / tp_ref,tp_img counts /
  erase_cltr,erase_clt / fpixels null-ness, plus TD-correlation read-back stats
  (tile count + NaN% of TD values) alongside the DBG-REF/DBG-IMG renders.

All gated/one-shot; no native change (reads via existing tp_proc_get_clt).
TEMP — remove with the rest of the -Dtp.dbg.corrpair probe.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


CLAUDE: JNA imclt ref_scene fix + one-shot ref/scene post-mortem renders (TEMP) · 6da17148

Andrey Filippov authored Jun 26, 2026

(1) GpuQuadJna.execImcltRbgAll now passes ref_scene -> tp_proc_exec_imclt(use_ref)
so renderFromTD(true) renders gpu_clt_ref, not gpu_clt (TpJna binding updated).

(2) TEMP post-mortem in CuasMotion.correlatePair (gate -Dtp.dbg.corrpair=1,
one-shot): after the inter-correlation, SAVE ref (gpu_clt_ref) + scene (gpu_clt)
CLT renders to the model dir via saveImagePlusInModelDirectory (persist past the
later crash; no window flood). DBG-REF blank/NaN => reference not loaded =>
explains the all-NaN CORR2D (inter corr needs both images). REMOVE after fix.

Needs the tile_processor_gpu imclt use_ref commit + libtileproc.so rebuild.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


CLAUDE: TEMP probe — print gpu_clt_ref stats after first ref convert (JNA) · f8a7a972

Andrey Filippov authored Jun 26, 2026

Diagnostic for the CORR2D-all-NaN numeric divergence: after the first
ref_scene=true convert, read back gpu_clt_ref (tp_proc_get_clt use_ref=1) and
print NaN%/nonzero/min/max once. Confirms whether the reference convert
populates gpu_clt_ref at all (vs scene gpu_clt which is correct -> SOURCE).

Prints one "PROBE gpu_clt_ref[cam0]: ..." line to System.out (captured in the
per-scene -SYSTEM_OUT.log). TEMP — remove after the divergence is fixed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


CLAUDE: override setCorrIndicesTdData (JNA) -> tp_proc_set_corr_indices_td · ae4bf02f

Andrey Filippov authored Jun 26, 2026

Third JNA-mode gap on the CUAS oracle path: TDCorrTile.convertTDtoPD re-uploads
host-selected TD correlation tiles via GpuQuad.setCorrIndicesTdData, which was
not overridden -> base JCuda cuMemcpyHtoD on a null gpu_corr_indices -> NPE.

Adds the GpuQuadJna override (ensureRbgCorr() then delegate to the new native
tp_proc_set_corr_indices_td) + the TpJna binding. Gap-finder over the full CUAS
TD path (CuasMotion + TDCorrTile) confirms this was the LAST GPU-touching
un-overridden method; the rest are pure config getters.

Needs the matching tile_processor_gpu commit (native fn) + libtileproc.so
rebuild. mvn compile clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


CLAUDE: fix rectilinear inter-scene corr buffer sizing (JNA) — num_pairs=3 · b6a986ac

Andrey Filippov authored Jun 26, 2026

Second JNA-mode failure after the Phase-1 NPE fix: cudaErrorIllegalAddress at
tp_utils.cu:142 (image upload), actually a DEFERRED fault from the inter-scene
correlate2D_inter kernel writing out of bounds.

Root cause: GpuQuadJna.ensureRbgCorr sized the native correlation buffers via
Correlation2d.getNumPairs(num_cams). For the rectilinear single-camera config
num_cams=1 -> getNumPairs(1)=0 -> tp_proc_setup_rbg_corr allocates zero-size
gpu_corrs_td / gpu_corrs / gpu_corr_indices, so the inter-scene correlation
wrote past them -> illegal address, surfacing (sticky) at the next CUDA call.

Fix: mirror the JCuda oracle, whose rectilinear ctor hardcodes num_pairs=3
(GpuQuad.java:732) for exactly the inter-scene case ->
  int num_pairs = rectilinear ? 3 : Correlation2d.getNumPairs(num_cams);

Java-only; libtileproc.so untouched. mvn compile clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


CLAUDE: JNA rectilinear single-array config port (Phase 1) · b65ee10d

Andrey Filippov authored Jun 26, 2026

Fixes the whole bug-class behind the -Dtp.backend=jna NPE in GpuQuad.setLpfRbg
(CuasRanging.detectTargets -> CuasMotion -> setRectilinearReferenceTD): the
rectilinear single-camera GpuQuad was built via the raw JCuda ctor, bypassing
the backend factory, so in JNA mode it got a null gpuTileProcessor.

- GpuQuad.createRectilinear(): backend-aware factory parallel to create().
  JCUDA branch is byte-for-byte the legacy ctor (oracle path untouched); JNA
  branch builds a clean rectilinear GpuQuadJna. New no-alloc rectilinear ctor
  (num_cams=1, no kernels/geometry).
- GpuQuadJna: rectilinear ctor + shared initNative(); the two overrides the
  gap-finder predicted -- reAllocateClt (no-op; native CLT pre-sized in setup)
  and singular setBayerImage (-> tp_proc_set_image). execConvertDirect already
  guarded on the rectilinear flag.
- CuasMotion:452 routed through createRectilinear (CUAS rectilinear now
  JNA-capable).
- ComboMatch:899 fail-loud UnsupportedOperationException in JNA mode
  (orthomosaic, wider unported surface, off the current path -- stays JCuda).

Java-only; libtileproc.so untouched. mvn compile clean. JCuda legacy frozen as
oracle; core convert_direct flag-soup cleanup deferred to Phase 2.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


CLAUDE: gitignore ANDREY_CONTINUE.md (resume-now file) · 753d6900

Andrey Filippov authored Jun 26, 2026

Root-level, Andrey-facing, single predictable "what to do right now" file the
agent overwrites before session exit, so lost tmux scrollback never costs the
restart plan. Local-only; gitignored alongside CLAUDE.md/AGENTS.md/MEMORY.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


Made the program always save PHASE_TWO_GOOD as it is used for input, not just debug · 848d497a

Andrey Filippov authored Jun 26, 2026

CLAUDE: GpuQuadJna texture overrides (oracle): execTextures + readback · 747ff9d1

Andrey Filippov authored Jun 26, 2026

Completes the oracle GPU surface. The reliable gap finder is
  comm -23 <(ImageDtt gpuQuad.* calls) <(GpuQuadJna overrides)
not the gpuTrace dump (only ~14 methods are instrumented, so e.g.
getFlatTextures was invisible in the trace though it is on the path).

Overrides (delegating to the new tp_proc_* texture API):
- execTextures: builds weights[3]/params[5], forwards calc_textures/calc_extra/
  linescan/dust/keep flags. Implements the production (USE_DS_DP) behavior.
- getTextureIndices: reads kernel-built count + packed indices.
- getExtra: reshapes diff_rgb_combo (texture_indices order) into
  [num_cams*(num_colors+1)][tilesX*tilesY] keyed by ntile -- identical to base.
- getFlatTextures: de-pitches gpu_textures -- identical to base.

TpJna.java: bindings for tp_proc_exec_textures/get_texture_indices/
get_diff_rgb_combo/get_textures.

Edits only -- not mvn-compiled (Eyesis run was live). Signatures match base
@Override; referenced fields are public final / public static.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


CLAUDE: gpuTrace dedup + GpuQuadJna oracle TD-corr readback overrides · 39b0fb90

Andrey Filippov authored Jun 26, 2026

gpuTrace now prints each Class.method ONCE (was per-call -> spammy). Oracle JCuda trace showed it uses
the TD-correlation readback path: getCorrIndices / getCorrTdData / getCorrComboIndices / eraseGpuCorrs
(un-overridden -> would NPE in JNA). Override them via the new native tp_proc_get_corr_indices /
get_corr_combo_indices / get_corr_td (DtoH) + tp_proc_erase_corrs. mvn -DskipTests compile clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


CLAUDE: gpuTrace(-Dtp.trace=1) on uncertain GpuQuad methods — JCuda/JNA oracle comparison · 8b3dd30b

Andrey Filippov authored Jun 26, 2026

Add GpuQuad.gpuTrace(m) printing "[GPUTRACE] "+getClass().getSimpleName()+"."+m (off unless
-Dtp.trace=1). Instrument the un-overridden GPU methods (potential oracle gaps): getCltData,
presentCltData, eraseGpuCorrs, execCorr2D (bundled), readbackTasks, setFullFrameImages, getCorrTdData,
getCorrIndices, getCorrComboIndices, getExtra, getTextureIndices, getRBGA, execRBGA, execTextures.
Since GpuQuadJna extends GpuQuad, the trace prints "GpuQuad.X" under JCuda and "GpuQuadJna.X" if a JNA
run falls through to one (= coverage gap) -> reveals oracle's real GPU usage before any NPE.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
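
Why the trace reveals fall-through can be shown with two stand-in classes: getClass() is resolved at run time, so an inherited, un-overridden method reports the subclass name. The sketch below is hypothetical (nested classes that only borrow the names for readability), and it includes the print-once dedup that a later commit added; returning the tag is an addition made here so the behavior can be checked.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only; not the real GpuQuad / GpuQuadJna.
final class GpuTraceSketch {
    static class GpuQuad {
        private static final Set<String> printed = ConcurrentHashMap.newKeySet();

        /** Prints "[GPUTRACE] Class.method" once per tag when -Dtp.trace=1; returns the tag if printed. */
        String gpuTrace(String method) {
            if (!"1".equals(System.getProperty("tp.trace"))) return null;
            String tag = getClass().getSimpleName() + "." + method; // runtime class
            if (!printed.add(tag)) return null;                     // already reported
            System.out.println("[GPUTRACE] " + tag);
            return tag;
        }

        void getCltData() { gpuTrace("getCltData"); }               // instrumented base method
    }

    /** Does not override getCltData(), so a call falls through to the base. */
    static class GpuQuadJna extends GpuQuad { }
}
```

Calling getCltData() on the subclass instance prints GpuQuadJna.getCltData, which marks the method as a coverage gap in that backend.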


25 Jun, 2026 5 commits

CLAUDE: GpuQuadJna updateTasks + getWH + native tp_proc_get_tasks (pre-empt task-readback NPE) · 31772785

Andrey Filippov authored Jun 25, 2026

updateTasks is called all over ImageDtt right after execSetTilesOffsets (reads gpu_ftasks back to
rebuild TpTask[] with computed centerXY/disp_dist) -> tp_proc_get_tasks (DtoH). getWH returns full
frame (base returns null gpu_clt_wh). Proactive (locating same-cause base-method derefs of null
JCuda fields).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


CLAUDE: GpuQuadJna getCltSize/getNumTiles/setCltData + native tp_proc_set_clt (CLT restore path) · df0ac36a

Andrey Filippov authored Jun 25, 2026

convertCenterClt -> setComboToTD -> setCltData pushes the restored center CLT to GPU; base getCltSize/
getNumTiles deref null gpu_clt_wh. Override to full-frame dims; setCltData -> new tp_proc_set_clt
(HtoD per-cam slice, inverse of tp_proc_get_clt). Fixes the third JNA NPE (getCltSize:1211).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


CLAUDE: GpuQuadJna override updateQuadCLT + resetBayer (skip null gpuTileProcessor.bayer_set) · e0ee9e2a

Andrey Filippov authored Jun 25, 2026

updateQuadCLT sets quadCLT + flag-only resetGeometryCorrection*(); skips the base's
gpuTileProcessor.bayer_set clear (N/A natively - bayer re-uploaded each convert). resetBayer no-op.
Fixes the second JNA NPE (updateQuadCLT:263).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


CLAUDE: GpuQuadJna.printConstMem no-op override (avoid debug-path NPE on native module) · e640aa3a

Andrey Filippov authored Jun 25, 2026

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

CLAUDE: GpuQuadJna setLpfRbg/setLpfCorr overrides -> tp_proc_set_const (fix first JNA NPE) · 04a10256

Andrey Filippov authored Jun 25, 2026

setLpfRbg flattens the 4x64 r/b/g/m arrays -> "lpf_data"; setLpfCorr -> const_name
(lpf_corr / lpf_rb_corr). Uploads to the native module's constant memory, matching JCUDA. mvn clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
