CLAUDE: CuasTD - NaN-aware cross-sensor TD consolidation (A2 step 2) + JNA getCltData override

Phase A2/B building block: consolidate the 16 per-sensor CLT channels into ONE
averaged TD channel (average images BEFORE correlation - multiply averages, not
average products).

Per-tile granularity: sum sensors that have the tile (first element NaN =
absent), count, divide; count plane returned as the weight; a stray in-tile NaN
poisons the whole result tile (fail-visible).

Not available on GPU (combine_inter only sums correlation PRODUCTS) - this CPU
implementation + get/setCltData D2H/H2D is the A2 bridge and the bit oracle for
the future clt_average_sensors kernel.

- CuasTD.validateConsolidation(): linearity oracle - imclt(TD-avg) must equal
  pixel-average of per-sensor imclt renders (same GPU imclt both sides); prints
  count-plane stats + max|diff|/RMS, saves -CUAS-TDAVG-CHECK 3-slice stack,
  restores original TD. Wired into the curt_cond_test branch after
  perSensorFromRawJp4 (uses its raw-jp4 16-sensor TD).
- GpuQuadJna.getCltData() override added (base derefs null gpu_clt_h on JNA
  shells - the known un-overridden-accessor class); uses tp_proc_get_clt.

mvn compile clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
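
The per-tile rule in the message above can be sketched in isolation. This is a minimal illustration, not the committed code: it assumes a hypothetical 4-element tile instead of the real 256-element TD_CHUNK, the class and method names are invented for the example, and it divides by the count rather than multiplying by 1/n, so it is not bit-identical to CuasTD.consolidateSensorsTD().

```java
import java.util.Arrays;

public class TileAverageSketch {
    // Hypothetical tile size for illustration; the commit uses TD_CHUNK = 4*8*8.
    static final int CHUNK = 4;

    /** NaN-aware per-tile average; counts (may be null) receives sensors per tile. */
    static float[] average(float[][] sensors, int[] counts) {
        final int len = sensors[0].length;
        final float[] avg = new float[len]; // zero-initialized accumulator
        for (int chunk = 0; chunk < len / CHUNK; chunk++) {
            final int offs = chunk * CHUNK;
            int n = 0;
            for (float[] s : sensors) {
                if (Float.isNaN(s[offs])) continue; // first element NaN: tile absent
                n++;
                for (int i = 0; i < CHUNK; i++) avg[offs + i] += s[offs + i];
            }
            boolean bad = (n == 0); // no contributing sensor
            for (int i = 0; i < CHUNK && !bad; i++) {
                avg[offs + i] /= n;
                bad = Float.isNaN(avg[offs + i]); // stray NaN inside a "valid" tile
            }
            if (bad) Arrays.fill(avg, offs, offs + CHUNK, Float.NaN); // fail-visible
            if (counts != null) counts[chunk] = n;
        }
        return avg;
    }

    public static void main(String[] args) {
        final float nan = Float.NaN;
        float[][] sensors = {
            {1, 2, 3, 4,   nan, 0, 0, 0,   1, 1,   1, 1,   nan, 0, 0, 0},
            {3, 4, 5, 6,   2,   2, 2, 2,   1, nan, 1, 1,   nan, 0, 0, 0},
        };
        int[] counts = new int[4];
        float[] avg = average(sensors, counts);
        System.out.println(Arrays.toString(counts)); // [2, 1, 2, 0]
        System.out.println(Arrays.toString(avg));
    }
}
```

The four tiles cover the four cases: both sensors present, one sensor present, a stray NaN in a tile whose first element is valid (the whole tile becomes NaN while the count stays 2), and no sensor present.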

Commit 6ecc13ad authored Jul 04, 2026 by Andrey Filippov
3 changed files
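
The linearity argument behind validateConsolidation() can be checked without a GPU. The sketch below is an illustration under stated assumptions: the cosine-basis matrix is a hypothetical stand-in for imclt (any fixed linear map serves the argument), all inputs are fully present (no NaN tiles), and the class and method names are not part of this commit.

```java
public class LinearityOracleSketch {
    /** Hypothetical stand-in for imclt: a fixed linear map (cosine-basis matrix). */
    static double[] transform(double[] x) {
        final int n = x.length;
        final double[] y = new double[n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                y[i] += Math.cos(Math.PI * (i + 0.5) * (j + 0.5) / n) * x[j];
            }
        }
        return y;
    }

    /** max |T(mean of inputs) - mean of T(inputs)|; rounding-level for a linear T. */
    static double maxAbsDiff(double[][] inputs) {
        final int n = inputs[0].length;
        final double[] meanIn = new double[n];  // average BEFORE the transform
        final double[] meanOut = new double[n]; // average AFTER the transform
        for (double[] x : inputs) {
            final double[] y = transform(x);
            for (int i = 0; i < n; i++) {
                meanIn[i] += x[i] / inputs.length;
                meanOut[i] += y[i] / inputs.length;
            }
        }
        final double[] outOfMean = transform(meanIn);
        double max = 0;
        for (int i = 0; i < n; i++) {
            max = Math.max(max, Math.abs(outOfMean[i] - meanOut[i]));
        }
        return max;
    }

    public static void main(String[] args) {
        double[][] inputs = {{1, 2, 3, 4}, {4, 3, 2, 1}, {0.5, -1, 2, 7}};
        System.out.println("max|diff| = " + maxAbsDiff(inputs));
    }
}
```

The two sides differ only by summation order, which is why the committed check reports max|diff| and RMS instead of testing for exact equality.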
--- a/src/main/java/com/elphel/imagej/cuas/rt/CuasTD.java
+++ b/src/main/java/com/elphel/imagej/cuas/rt/CuasTD.java
+/**
+ **
+ ** CuasTD.java - transform-domain utilities for the CUAS RT path:
+ ** cross-sensor consolidation of per-sensor CLT data (CPU oracle for the
+ ** future GPU kernel).
+ **
+ ** Copyright (C) 2026 Elphel, Inc.
+ **
+ ** -----------------------------------------------------------------------------**
+ **
+ **  CuasTD.java is free software: you can redistribute it and/or modify
+ **  it under the terms of the GNU General Public License as published by
+ **  the Free Software Foundation, either version 3 of the License, or
+ **  (at your option) any later version.
+ **
+ **  This program is distributed in the hope that it will be useful,
+ **  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ **  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ **  GNU General Public License for more details.
+ **
+ **  You should have received a copy of the GNU General Public License
+ **  along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ ** -----------------------------------------------------------------------------**
+ **
+ */
+package com.elphel.imagej.cuas.rt;
+
+import com.elphel.imagej.common.ShowDoubleFloatArrays;
+import com.elphel.imagej.cuas.CuasMotion;
+import com.elphel.imagej.gpu.GPUTileProcessor;
+import com.elphel.imagej.gpu.GpuQuad;
+import com.elphel.imagej.tileprocessor.QuadCLT;
+
+import ij.ImagePlus;
+
+/**
+ * Transform-domain utilities for the CUAS RT pose/detection path.
+ *
+ * The central operation is the NaN-aware cross-sensor consolidation of the 16
+ * per-sensor CLT channels into ONE averaged TD channel - the "average images
+ * BEFORE correlation, multiply averages instead of averaging products" scheme
+ * (better SNR: sensor-noise cross-terms drop before the nonlinear FZ-normalize,
+ * and 1 correlation replaces 16). Not available on the GPU (combine_inter only
+ * sums per-sensor correlation PRODUCTS, unweighted); this CPU implementation +
+ * the GpuQuad get/setCltData D2H/H2D bridge is the phase-A2 path and the bit
+ * oracle for the future clt_average_sensors CUDA kernel.
+ *
+ * TD granularity note: validity is per TILE per sensor (a sensor either has the
+ * converted tile or it was erased to NaN / never written) - so the consolidation
+ * unit is one [4][8][8] CLT tile (per color), and the weight is a per-tile
+ * contributing-sensor COUNT, not per element.
+ *
+ * By Claude on 07/05/2026, from Andrey's design.
+ */
+public class CuasTD {
+	public static final int TD_CHUNK = 4 * GPUTileProcessor.DTT_SIZE * GPUTileProcessor.DTT_SIZE; // one CLT tile (per color)
+
+	/**
+	 * Consolidate per-sensor TD (CLT) data into a single averaged channel:
+	 * per tile - add the tiles of the sensors that HAVE it (skip NaN tiles),
+	 * count them, divide by the count. A tile is considered absent for a sensor
+	 * if its first element is NaN (erase_clt_tiles fills whole tiles); any stray
+	 * NaN inside an otherwise-valid tile poisons the sum and the result tile is
+	 * set to all-NaN (with the count kept) - fail-visible, never fail-silent.
+	 *
+	 * @param fclt   per-sensor CLT as returned by GpuQuad.getCltData():
+	 *               [num_sensors][num_tiles*num_colors*TD_CHUNK]
+	 * @param counts null or int[num_tiles*num_colors] - filled with the number of
+	 *               contributing sensors per tile (the weight plane)
+	 * @return averaged CLT channel, same length as fclt[0]; all-NaN tiles where
+	 *               no sensor contributed
+	 */
+	public static float [] consolidateSensorsTD(
+			final float [][] fclt,
+			final int []     counts) {
+		final int len = fclt[0].length;
+		final int num_chunks = len / TD_CHUNK;
+		final float [] avg = new float [len];
+		for (int chunk = 0; chunk < num_chunks; chunk++) {
+			final int offs = chunk * TD_CHUNK;
+			int n = 0;
+			boolean first = true;
+			for (int nsens = 0; nsens < fclt.length; nsens++) {
+				final float [] f = fclt[nsens];
+				if (Float.isNaN(f[offs])) continue; // sensor does not have this tile
+				n++;
+				if (first) {
+					System.arraycopy(f, offs, avg, offs, TD_CHUNK);
+					first = false;
+				} else {
+					for (int i = 0; i < TD_CHUNK; i++) avg[offs + i] += f[offs + i];
+				}
+			}
+			if (n == 0) {
+				java.util.Arrays.fill(avg, offs, offs + TD_CHUNK, Float.NaN);
+			} else { // n >= 1 (scale is exactly 1.0f for n == 1, so the stray-NaN check covers single-sensor tiles too)
+				final float scale = 1.0f / n;
+				boolean poisoned = false;
+				for (int i = 0; i < TD_CHUNK; i++) {
+					avg[offs + i] *= scale;
+					poisoned |= Float.isNaN(avg[offs + i]);
+				}
+				if (poisoned) { // stray NaN inside a "valid" tile - make the whole tile NaN
+					java.util.Arrays.fill(avg, offs, offs + TD_CHUNK, Float.NaN);
+				}
+			}
+			if (counts != null) counts[chunk] = n;
+		}
+		return avg;
+	}
+
+	/**
+	 * Validate the consolidation against the linearity oracle: imclt is linear, so
+	 * imclt(TD-average of sensors) must equal the pixel-domain average of the
+	 * per-sensor imclt renders (up to float summation order). Renders both through
+	 * the SAME GPU imclt, compares, prints max|diff| / RMS, optionally saves a
+	 * 3-slice debug stack, and restores the original per-sensor TD.
+	 *
+	 * @param gpuQuad   GPU with a current 16-sensor TD (e.g. after convert_direct)
+	 * @param save_qclt null or QuadCLT whose model directory receives the debug
+	 *                  stack (-CUAS-TDAVG-CHECK: [pixel-avg, imclt(TD-avg), diff])
+	 * @param debugLevel debug level
+	 * @return max abs difference (finite pixels); expect float-epsilon scale
+	 */
+	public static double validateConsolidation(
+			final GpuQuad gpuQuad,
+			final QuadCLT save_qclt,
+			final int     debugLevel) {
+		final int num_sens = gpuQuad.getNumSensors();
+		final float [][] orig = gpuQuad.getCltData(false);
+		final float [][] imgs = CuasMotion.perSensorImagesFromTD(gpuQuad, false); // renders BEFORE overwrite
+		final int width  = gpuQuad.getImageWidth();
+		final int height = gpuQuad.getImageHeight();
+		final int [] counts = new int [orig[0].length / TD_CHUNK];
+		final float [] avg_td = consolidateSensorsTD(orig, counts);
+		int cmin = Integer.MAX_VALUE, cmax = 0; long csum = 0; int cnz = 0;
+		for (int c : counts) if (c > 0) { cmin = Math.min(cmin, c); cmax = Math.max(cmax, c); csum += c; cnz++; }
+		System.out.println("CuasTD.validateConsolidation(): contributing sensors per tile: min="+
+				(cnz > 0 ? cmin : 0)+", max="+cmax+", mean="+String.format("%.2f", (cnz > 0)? ((double) csum)/cnz : 0.0)+
+				", tiles with data="+cnz+" of "+counts.length);
+		// upload the averaged channel to ALL sensor slots, render, take sensor 0
+		for (int ncam = 0; ncam < num_sens; ncam++) gpuQuad.setCltData(ncam, avg_td, false);
+		final float [][] imgs_avg = CuasMotion.perSensorImagesFromTD(gpuQuad, false);
+		final float [] render_of_avg = imgs_avg[0];
+		// pixel-domain NaN-aware average of the per-sensor renders
+		final float [] pix_avg = new float [width * height];
+		final float [] diff =    new float [width * height];
+		double max_diff = 0, sum2 = 0; int n_cmp = 0;
+		for (int i = 0; i < pix_avg.length; i++) {
+			double s = 0; int n = 0;
+			for (int ncam = 0; ncam < num_sens; ncam++) {
+				final float v = imgs[ncam][i];
+				if (!Float.isNaN(v)) { s += v; n++; }
+			}
+			pix_avg[i] = (n > 0) ? (float) (s / n) : Float.NaN;
+			diff[i] = pix_avg[i] - render_of_avg[i];
+			if (!Float.isNaN(diff[i])) {
+				final double ad = Math.abs(diff[i]);
+				if (ad > max_diff) max_diff = ad;
+				sum2 += diff[i] * diff[i];
+				n_cmp++;
+			}
+		}
+		System.out.println(String.format(
+				"CuasTD.validateConsolidation(): imclt(TD-avg) vs pixel-avg(per-sensor imclt): max|diff|=%.6g, RMS=%.6g over %d pixels",
+				max_diff, (n_cmp > 0) ? Math.sqrt(sum2 / n_cmp) : Double.NaN, n_cmp));
+		// restore the original per-sensor TD
+		for (int ncam = 0; ncam < num_sens; ncam++) gpuQuad.setCltData(ncam, orig[ncam], false);
+		if (save_qclt != null) {
+			final ImagePlus imp = ShowDoubleFloatArrays.makeArrays(
+					new float [][] {pix_avg, render_of_avg, diff}, width, height,
+					save_qclt.getImageName()+"-CUAS-TDAVG-CHECK",
+					new String [] {"pixel_avg", "imclt_td_avg", "diff"});
+			save_qclt.saveImagePlusInModelDirectory(null, imp);
+		}
+		return max_diff;
+	}
+}
--- a/src/main/java/com/elphel/imagej/gpu/jna/GpuQuadJna.java
+++ b/src/main/java/com/elphel/imagej/gpu/jna/GpuQuadJna.java
@@ -109,6 +109,16 @@ public class GpuQuadJna extends GpuQuad {
        }
        lib.tp_proc_set_clt(proc, ncam, fclt, use_ref ? 1 : 0);
    }
+    // Read all per-cam CLT slices from the device (base derefs null gpu_clt_h on JNA shells).
+    // D2H half of the CPU sensor-TD consolidation bridge (CuasTD). By Claude on 07/05/2026.
+    @Override public float [][] getCltData(boolean use_ref) {
+        int clt_size = getCltSize(use_ref);
+        float [][] fclt = new float [num_cams][clt_size];
+        for (int ncam = 0; ncam < num_cams; ncam++) {
+            lib.tp_proc_get_clt(proc, ncam, use_ref ? 1 : 0, fclt[ncam]);
+        }
+        return fclt;
+    }

    /** Native handles for the override implementations (added incrementally). */
    protected TpJna   lib()    { return lib; }

--- a/src/main/java/com/elphel/imagej/tileprocessor/OpticalFlow.java
+++ b/src/main/java/com/elphel/imagej/tileprocessor/OpticalFlow.java
@@ -7316,6 +7316,14 @@ java.lang.NullPointerException
 					}
 				}
 				CuasMotion.perSensorFromRawJp4(clt_parameters, master_CLT, ImageDtt.THREADS_MAX, debugLevel);
+				// A2 step 2 validation: NaN-aware cross-sensor TD consolidation (CPU oracle of the
+				// future clt_average_sensors kernel) checked against the imclt linearity oracle:
+				// imclt(TD-average) must equal pixel-average of per-sensor imclt renders.
+				// Uses the raw-jp4 16-sensor TD just built above. By Claude on 07/05/2026.
+				com.elphel.imagej.cuas.rt.CuasTD.validateConsolidation(
+						master_CLT.getGPUQuad(), // GpuQuad with the current 16-sensor TD
+						master_CLT,              // debug-stack save target (-CUAS-TDAVG-CHECK)
+						debugLevel);
 			} else {
 				cuasRangingRT.saveUasFlightLogCsv(uasLogReader, imp_targets); // UAS flight-log truth -> <name>-UAS_DATA.tsv (mode-0 only; needs QuadCLT pose). By Claude on 06/24/2026
 				new CuasDetectRT(