  1. 27 Jun, 2026 1 commit
    • CLAUDE: in-process native DNN backend (CuasDnnLocal) via JNA - no server · 54d842f0
      Andrey Filippov authored
      Piece 3 of the native-JNA DNN path. Adds a local backend that runs the SAME
      L1+L2 inference as CuasDnnRemote but in-process via LibTorch (libtpdnn.so / JNA),
      so the CUAS pipeline runs without the DGX or any Python server:
        - CuasDnnBackend  : shared interface (upload/getNFrames/inferBatch->BatchResult/close)
        - TpDnnJna        : JNA Library binding libtpdnn.so's C-ABI
        - CuasDnnLocal    : wraps it; reads N/P/vr/l2_ch from each model's bundled .meta.json
                            (single source of truth), float[][]<->float[], builds BatchResult
        - CuasDnnRemote   : now implements CuasDnnBackend (signatures unchanged)
        - CuasDetectRT    : DNN path gate now fires on (curt_dnn_remote || curt_dnn_local);
                            backend = local? CuasDnnLocal : CuasDnnRemote; ensureServer skipped
                            when local; local-CPU-ORT gate also excludes curt_dnn_local (no
                            double-run). runDnnRemote loop unchanged.
        - IntersceneMatchParameters: curt_dnn_local (flag) + curt_dnn_local_dir (model dir
                            override; empty = bundled /cuas_dnn resource) + GUI labels/persist.
      
      Validated: full Java->JNA->libtpdnn vs the Python-server oracle = EXACT
      (offset5=0.0, roi=0.0, nch=6). mvn -DskipTests package OK.
      
      Runtime: -Djna.library.path=<dir with libtpdnn.so>; libtpdnn.so finds libtorch via
      its rpath. Model resolution mirrors CuasDnnRemote's bundled-vs-override scheme.
      Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
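The float[][]<->float[] conversion that CuasDnnLocal performs before calling the native library can be sketched as below. This is a minimal illustration, assuming a rectangular, row-major layout; the class and method names (FlattenSketch, flatten, unflatten) are hypothetical and not taken from the repository.

```java
import java.util.Arrays;

// Hypothetical helper: packs a rectangular float[][] into one contiguous
// float[] (the shape a C-ABI function can take) and back again.
class FlattenSketch {
    // Row-major packing: element [r][c] lands at index r * cols + c.
    static float[] flatten(float[][] src) {
        int rows = src.length;
        int cols = rows == 0 ? 0 : src[0].length;
        float[] flat = new float[rows * cols];
        for (int r = 0; r < rows; r++) {
            if (src[r].length != cols) {
                throw new IllegalArgumentException("ragged row " + r);
            }
            System.arraycopy(src[r], 0, flat, r * cols, cols);
        }
        return flat;
    }

    // Inverse of flatten for a known rows x cols shape.
    static float[][] unflatten(float[] flat, int rows, int cols) {
        if (flat.length != rows * cols) {
            throw new IllegalArgumentException("length does not match rows*cols");
        }
        float[][] dst = new float[rows][cols];
        for (int r = 0; r < rows; r++) {
            System.arraycopy(flat, r * cols, dst[r], 0, cols);
        }
        return dst;
    }

    public static void main(String[] args) {
        float[][] a = {{1f, 2f, 3f}, {4f, 5f, 6f}};
        float[] flat = flatten(a);
        System.out.println(Arrays.toString(flat));
        System.out.println(Arrays.deepEquals(a, unflatten(flat, 2, 3)));
    }
}
```

Rejecting ragged rows up front keeps a shape mismatch from turning into a silent misalignment on the native side.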
  2. 26 Jun, 2026 14 commits
    • CLAUDE: strip TEMP migration probes + proactive side-effect-parity sweep · 9258b5d9
      Andrey Filippov authored
      Migration validated (JNA CUAS targets match JCuda). Cleanup:
      - Removed all TEMP debug probes (-Dtp.dbg.corrpair, probeClt, saveTDRender,
        the one-shot DBG/PROBE blocks in GpuQuadJna + CuasMotion). Real fixes kept
        (rectilinear port, num_pairs=3, setCorrIndicesTdData, imclt ref_scene,
        num_corr_tiles propagation f6dcc90f).
      - Proactive sweep for the f6dcc90f bug-class (JNA override drops a base
        side-effect field write): getCorrComboIndices/getCorr2DCombo propagate
        num_corr_combo_tiles, setCorrIndicesTdData propagates num_corr_tiles,
        getTextureIndices propagates num_texture_tiles; those fields made protected.
        These four are LATENT (no live consumer on the validated CUAS path) and are
        marked NOT-YET-TESTED inline.
      
      Java-only. mvn compile clean.
      Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    • CLAUDE: JNA getCorrIndices propagates num_corr_tiles to base field (real fix) · f6dcc90f
      Andrey Filippov authored
      Root cause of the CORR2D-all-NaN / 0-targets: the inter-correlation actually
      works (probe showed num_corr_tiles=8850 = 4425 tiles x (1 sensor + 1 sum)), but
      the TD readback dropped it. Base GpuQuad.getCorrIndices() sets the num_corr_tiles
      field ("also sets num_corr_tiles"); GpuQuadJna.getCorrIndices() read the native
      count locally and returned the array WITHOUT setting the field. So
      TDCorrTile.getFromGpu (num_tiles = getNumCorrTiles()/num_pairs) and base
      getCorrTilesTd (uses the field directly) saw a stale 0 -> built 0 tiles ->
      empty target sequence -> null ROUND_ONE image -> saveImagePlusInModelDirectory
      NPE (the misplaced-null-guard latent bug is just the messenger).
      
      Fix: GpuQuadJna.getCorrIndices() sets num_corr_tiles = n (native count); field
      made protected so the subclass can. Java-only.
      Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
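The bug class described in this commit (an override that returns the right data but drops a field write the base method performs as a side effect) can be reproduced in miniature. The sketch below uses hypothetical stand-in classes, not the real GpuQuad/GpuQuadJna code, and a fixed four-entry index array in place of a native readback.

```java
// Hypothetical miniature of the "override drops a base side effect" bug class.
class SideEffectParitySketch {
    static class Base {
        // protected so a subclass override can keep the field in sync
        protected int numCorrTiles;

        int[] getCorrIndices() {
            int[] idx = {10, 11, 12, 13};   // stand-in for a device readback
            numCorrTiles = idx.length;      // side effect consumers rely on
            return idx;
        }

        int getNumCorrTiles() {
            return numCorrTiles;
        }
    }

    // Returns the same array but forgets the field write: consumers see 0.
    static class BuggyOverride extends Base {
        @Override
        int[] getCorrIndices() {
            return new int[] {10, 11, 12, 13};
        }
    }

    // Same readback, with the base side effect restored.
    static class FixedOverride extends Base {
        @Override
        int[] getCorrIndices() {
            int[] idx = {10, 11, 12, 13};
            numCorrTiles = idx.length;
            return idx;
        }
    }

    // A consumer that, like the one in the commit message, derives a tile
    // count from the field rather than from the returned array.
    static int tilesSeenByConsumer(Base quad, int numPairs) {
        quad.getCorrIndices();
        return quad.getNumCorrTiles() / numPairs;
    }

    public static void main(String[] args) {
        System.out.println(tilesSeenByConsumer(new BuggyOverride(), 2));
        System.out.println(tilesSeenByConsumer(new FixedOverride(), 2));
    }
}
```

The buggy variant fails far from the cause: the returned array is correct, so only a consumer of the field sees the stale zero.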
    • CLAUDE: probe inter-corr sel_sensors/num_cams/count + fix saveTDRender NPE (TEMP) · c592e5ff
      Andrey Filippov authored
      Post-mortem showed both CLT buffers loaded but inter-correlation -> 0 tiles.
      index_inter_correlate selects by __popc(sel_sensors); static reading says
      sel_sensors should be 1 (single-cam rectilinear), so a runtime value differs.
      - GpuQuadJna.execCorr2D_inter_TD: one-shot print sel_sensors/popc/num_cams/
        num_colors/scales + the returned num_corr_tiles.
      - saveTDRender: makeArrays NPE'd on null titles (derefs titles[i]); pass a
        non-null titles[] so the render saves instead of crashing the run.
      TEMP — remove with the rest of the -Dtp.dbg.corrpair probe.
      Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    • CLAUDE: more one-shot post-mortem diagnostics (TEMP, Java-only) · f163b14a
      Andrey Filippov authored
      Single sacrificial run -> generous logging without spam:
      - GpuQuadJna: probe BOTH first ref convert (gpu_clt_ref) AND first scene convert
        (gpu_clt) — NaN%/nonzero/range each (probeClt helper).
      - CuasMotion.correlatePair one-shot: log targets_mv / tp_ref,tp_img counts /
        erase_cltr,erase_clt / fpixels null-ness, plus TD-correlation read-back stats
        (tile count + NaN% of TD values) alongside the DBG-REF/DBG-IMG renders.
      
      All gated/one-shot; no native change (reads via existing tp_proc_get_clt).
      TEMP — remove with the rest of the -Dtp.dbg.corrpair probe.
      Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    • CLAUDE: JNA imclt ref_scene fix + one-shot ref/scene post-mortem renders (TEMP) · 6da17148
      Andrey Filippov authored
      (1) GpuQuadJna.execImcltRbgAll now passes ref_scene -> tp_proc_exec_imclt(use_ref)
      so renderFromTD(true) renders gpu_clt_ref, not gpu_clt (TpJna binding updated).
      
      (2) TEMP post-mortem in CuasMotion.correlatePair (gate -Dtp.dbg.corrpair=1,
      one-shot): after the inter-correlation, SAVE ref (gpu_clt_ref) + scene (gpu_clt)
      CLT renders to the model dir via saveImagePlusInModelDirectory (persist past the
      later crash; no window flood). DBG-REF blank/NaN => reference not loaded =>
      explains the all-NaN CORR2D (inter corr needs both images). REMOVE after fix.
      
      Needs the tile_processor_gpu imclt use_ref commit + libtileproc.so rebuild.
      Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    • CLAUDE: TEMP probe — print gpu_clt_ref stats after first ref convert (JNA) · f8a7a972
      Andrey Filippov authored
      Diagnostic for the CORR2D-all-NaN numeric divergence: after the first
      ref_scene=true convert, read back gpu_clt_ref (tp_proc_get_clt use_ref=1) and
      print NaN%/nonzero/min/max once. Confirms whether the reference convert
      populates gpu_clt_ref at all (vs scene gpu_clt which is correct -> SOURCE).
      
      Prints one "PROBE gpu_clt_ref[cam0]: ..." line to System.out (captured in the
      per-scene -SYSTEM_OUT.log). TEMP — remove after the divergence is fixed.
      Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    • CLAUDE: override setCorrIndicesTdData (JNA) -> tp_proc_set_corr_indices_td · ae4bf02f
      Andrey Filippov authored
      Third JNA-mode gap on the CUAS oracle path: TDCorrTile.convertTDtoPD re-uploads
      host-selected TD correlation tiles via GpuQuad.setCorrIndicesTdData, which was
      not overridden -> base JCuda cuMemcpyHtoD on a null gpu_corr_indices -> NPE.
      
      Adds the GpuQuadJna override (ensureRbgCorr() then delegate to the new native
      tp_proc_set_corr_indices_td) + the TpJna binding. Gap-finder over the full CUAS
      TD path (CuasMotion + TDCorrTile) confirms this was the LAST GPU-touching
      un-overridden method; the rest are pure config getters.
      
      Needs the matching tile_processor_gpu commit (native fn) + libtileproc.so
      rebuild. mvn compile clean.
      Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    • CLAUDE: fix rectilinear inter-scene corr buffer sizing (JNA) — num_pairs=3 · b6a986ac
      Andrey Filippov authored
      Second JNA-mode failure after the Phase-1 NPE fix: cudaErrorIllegalAddress at
      tp_utils.cu:142 (image upload), actually a DEFERRED fault from the inter-scene
      correlate2D_inter kernel writing out of bounds.
      
      Root cause: GpuQuadJna.ensureRbgCorr sized the native correlation buffers via
      Correlation2d.getNumPairs(num_cams). For the rectilinear single-camera config
      num_cams=1 -> getNumPairs(1)=0 -> tp_proc_setup_rbg_corr allocates zero-size
      gpu_corrs_td / gpu_corrs / gpu_corr_indices, so the inter-scene correlation
      wrote past them -> illegal address, surfacing (sticky) at the next CUDA call.
      
      Fix: mirror the JCuda oracle, whose rectilinear ctor hardcodes num_pairs=3
      (GpuQuad.java:732) for exactly the inter-scene case ->
        int num_pairs = rectilinear ? 3 : Correlation2d.getNumPairs(num_cams);
      
      Java-only; libtileproc.so untouched. mvn compile clean.
      Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
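The sizing rule in this fix can be illustrated with a small sketch. The commit only states that getNumPairs(1) is 0; the formula used below (all unordered camera pairs, n*(n-1)/2) is an assumption that happens to agree with that value, and the class and method names are hypothetical.

```java
// Hypothetical sketch of the buffer-sizing decision described above.
class NumPairsSketch {
    // ASSUMPTION: pair count = number of unordered camera pairs.
    // The commit message only confirms that the result for one camera is 0.
    static int getNumPairs(int numCams) {
        return numCams * (numCams - 1) / 2;
    }

    // Rectilinear (single-camera, inter-scene) mode uses the fixed value 3
    // from the commit message instead of the camera-pair formula.
    static int numPairsForBuffers(boolean rectilinear, int numCams) {
        return rectilinear ? 3 : getNumPairs(numCams);
    }

    public static void main(String[] args) {
        // Before the fix: one camera -> 0 pairs -> zero-size buffers.
        System.out.println(getNumPairs(1));
        // After the fix: rectilinear mode gets a non-zero pair count.
        System.out.println(numPairsForBuffers(true, 1));
    }
}
```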
    • CLAUDE: JNA rectilinear single-array config port (Phase 1) · b65ee10d
      Andrey Filippov authored
      Fixes the whole bug-class behind the -Dtp.backend=jna NPE in GpuQuad.setLpfRbg
      (CuasRanging.detectTargets -> CuasMotion -> setRectilinearReferenceTD): the
      rectilinear single-camera GpuQuad was built via the raw JCuda ctor, bypassing
      the backend factory, so in JNA mode it got a null gpuTileProcessor.
      
      - GpuQuad.createRectilinear(): backend-aware factory parallel to create().
        JCUDA branch is byte-for-byte the legacy ctor (oracle path untouched); JNA
        branch builds a clean rectilinear GpuQuadJna. New no-alloc rectilinear ctor
        (num_cams=1, no kernels/geometry).
      - GpuQuadJna: rectilinear ctor + shared initNative(); the two overrides the
        gap-finder predicted -- reAllocateClt (no-op; native CLT pre-sized in setup)
        and singular setBayerImage (-> tp_proc_set_image). execConvertDirect already
        guarded on the rectilinear flag.
      - CuasMotion:452 routed through createRectilinear (CUAS rectilinear now
        JNA-capable).
      - ComboMatch:899 fail-loud UnsupportedOperationException in JNA mode
        (orthomosaic, wider unported surface, off the current path -- stays JCuda).
      
      Java-only; libtileproc.so untouched. mvn compile clean. JCuda legacy frozen as
      oracle; core convert_direct flag-soup cleanup deferred to Phase 2.
      Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
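A backend-aware factory of the kind this commit describes might look like the sketch below. Only the tp.backend system property and its jna value come from the commit message; the stand-in classes, the backendName method, and the way the property is read are assumptions for illustration.

```java
// Hypothetical sketch of a backend-aware factory; not the real GpuQuad code.
class BackendFactorySketch {
    static class Quad {
        final boolean rectilinear;

        Quad(boolean rectilinear) {
            this.rectilinear = rectilinear;
        }

        String backendName() {
            return "jcuda";
        }
    }

    static class QuadJna extends Quad {
        QuadJna(boolean rectilinear) {
            super(rectilinear);
        }

        @Override
        String backendName() {
            return "jna";
        }
    }

    // Callers go through the factory instead of a raw constructor, so the
    // selected backend is honored everywhere an instance is built.
    static Quad createRectilinear() {
        boolean jna = "jna".equals(System.getProperty("tp.backend"));
        return jna ? new QuadJna(true) : new Quad(true);
    }

    public static void main(String[] args) {
        System.out.println(createRectilinear().backendName());
    }
}
```

Run with -Dtp.backend=jna to select the JNA branch; without the property the sketch falls back to the legacy branch.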
    • CLAUDE: gitignore ANDREY_CONTINUE.md (resume-now file) · 753d6900
      Andrey Filippov authored
      Root-level, Andrey-facing, single predictable "what to do right now" file the
      agent overwrites before session exit, so lost tmux scrollback never costs the
      restart plan. Local-only; gitignored alongside CLAUDE.md/AGENTS.md/MEMORY.md.
      Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    • CLAUDE: GpuQuadJna texture overrides (oracle): execTextures + readback · 747ff9d1
      Andrey Filippov authored
      Completes the oracle GPU surface. The reliable gap finder is
        comm -23 <(ImageDtt gpuQuad.* calls) <(GpuQuadJna overrides)
      not the gpuTrace dump (only ~14 methods are instrumented, so e.g.
      getFlatTextures was invisible in the trace though it is on the path).
      
      Overrides (delegating to the new tp_proc_* texture API):
      - execTextures: builds weights[3]/params[5], forwards calc_textures/calc_extra/
        linescan/dust/keep flags. Implements the production (USE_DS_DP) behavior.
      - getTextureIndices: reads kernel-built count + packed indices.
      - getExtra: reshapes diff_rgb_combo (texture_indices order) into
        [num_cams*(num_colors+1)][tilesX*tilesY] keyed by ntile -- identical to base.
      - getFlatTextures: de-pitches gpu_textures -- identical to base.
      
      TpJna.java: bindings for tp_proc_exec_textures/get_texture_indices/
      get_diff_rgb_combo/get_textures.
      
      Edits only -- not mvn-compiled (Eyesis run was live). Signatures match base
      @Override; referenced fields are public final / public static.
      Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
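The gap finder above is a set difference: methods called on the path minus methods the JNA subclass overrides. A rough Java equivalent, assuming the list of called method names is already available and using reflection for the overrides, is sketched below. The stand-in classes are hypothetical, and matching by name alone ignores overloads.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Collection;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch of the "called minus overridden" gap finder.
class GapFinderSketch {
    static class Base {
        void execTextures() {}
        void getExtra() {}
        void getFlatTextures() {}
    }

    // Overrides two of the three methods; getFlatTextures is the gap.
    static class Jna extends Base {
        @Override
        void execTextures() {}

        @Override
        void getExtra() {}
    }

    // Names in 'called' that 'sub' does not declare itself, i.e. calls that
    // would fall through to the base implementation.
    static SortedSet<String> gaps(Collection<String> called, Class<?> sub) {
        SortedSet<String> missing = new TreeSet<>(called);
        for (Method m : sub.getDeclaredMethods()) {
            missing.remove(m.getName());
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println(gaps(
                Arrays.asList("execTextures", "getExtra", "getFlatTextures"),
                Jna.class));
    }
}
```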
    • CLAUDE: gpuTrace dedup + GpuQuadJna oracle TD-corr readback overrides · 39b0fb90
      Andrey Filippov authored
      gpuTrace now prints each Class.method ONCE (was per-call -> spammy). Oracle JCuda trace showed it uses
      the TD-correlation readback path: getCorrIndices / getCorrTdData / getCorrComboIndices / eraseGpuCorrs
      (un-overridden -> would NPE in JNA). Override them via the new native tp_proc_get_corr_indices /
      get_corr_combo_indices / get_corr_td (DtoH) + tp_proc_erase_corrs. mvn -DskipTests compile clean.
      Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    • CLAUDE: gpuTrace(-Dtp.trace=1) on uncertain GpuQuad methods — JCuda/JNA oracle comparison · 8b3dd30b
      Andrey Filippov authored
      Add GpuQuad.gpuTrace(m) printing "[GPUTRACE] "+getClass().getSimpleName()+"."+m (off unless
      -Dtp.trace=1). Instrument the un-overridden GPU methods (potential oracle gaps): getCltData,
      presentCltData, eraseGpuCorrs, execCorr2D (bundled), readbackTasks, setFullFrameImages, getCorrTdData,
      getCorrIndices, getCorrComboIndices, getExtra, getTextureIndices, getRBGA, execRBGA, execTextures.
      Since GpuQuadJna extends GpuQuad, the trace prints "GpuQuad.X" under JCuda and "GpuQuadJna.X" if a JNA
      run falls through to one (= coverage gap) -> reveals oracle's real GPU usage before any NPE.
      Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
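The trace mechanism relies on getClass() resolving to the runtime class, so an inherited method reports the subclass name. A minimal sketch follows, using stand-in classes and printing each Class.method only once. The message format and the tp.trace property come from the commit message; returning the printed line (for easy checking) and the exact property handling are assumptions.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a once-per-method, property-gated trace.
class GpuTraceSketch {
    static class Quad {
        // Class.method keys already reported; shared by all instances.
        private static final Set<String> SEEN = ConcurrentHashMap.newKeySet();

        // Returns the printed line, or null when tracing is off or the
        // key was already reported.
        String gpuTrace(String m) {
            if (!"1".equals(System.getProperty("tp.trace"))) {
                return null;
            }
            // getClass() is the runtime class, so an un-overridden method
            // invoked on a QuadJna instance reports "QuadJna.<method>".
            String key = getClass().getSimpleName() + "." + m;
            if (!SEEN.add(key)) {
                return null;
            }
            String line = "[GPUTRACE] " + key;
            System.out.println(line);
            return line;
        }

        void getExtra() {
            gpuTrace("getExtra");
        }
    }

    // Does not override getExtra: a call falls through to the base method.
    static class QuadJna extends Quad {}

    public static void main(String[] args) {
        System.setProperty("tp.trace", "1");
        new Quad().getExtra();
        new QuadJna().getExtra();
        new QuadJna().getExtra();   // already reported, prints nothing
    }
}
```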
  3. 25 Jun, 2026 18 commits
  4. 21 Jun, 2026 6 commits
    • Added logging · 95e25fcc
      Andrey Filippov authored
    • CLAUDE: -OFFSET in px (s-first, full-frame meta, s-gated NaN); curt_dnn_thresh... · 571ef32e
      Andrey Filippov authored
      CLAUDE: -OFFSET in px (s-first, full-frame meta, s-gated NaN); curt_dnn_thresh viz-only (recurrent gets full field)
      
      -OFFSET (remote) reordered to {s,Vx,Vy,dx,dy} (s first -> ImageJ auto-ranges on it); Vx,Vy converted cells->px/level-frame (/vel_decimate); full-frame ROI written to the file metadata (self-describes its extent); Vx,Vy,dx,dy NaN'd where s<curt_dnn_thresh (s kept). curt_dnn_thresh is now VISUALIZATION-ONLY: the local inferROI feeds Layer 2 the FULL field (sThresh=0) so the recurrent gets the weak sub-threshold signal it integrates (no premature threshold = the LReLU lesson); dialog label/tooltip/decl updated to say viz-only, do-not-use-for-computation.
      Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
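The s-gated NaN step for the -OFFSET output can be sketched as follows. The channel order {s,Vx,Vy,dx,dy} and the rule (NaN the other channels where s is below the threshold, keep s) come from the commit message; the array layout (one float[] per channel, indexed by pixel) and all names are assumptions.

```java
import java.util.Arrays;

// Hypothetical sketch of visualization-only masking of an -OFFSET field.
class OffsetMaskSketch {
    // Channel order from the commit message: {s, Vx, Vy, dx, dy}.
    static final int S = 0;

    // Where s < thresh, set every non-s channel to NaN; s itself is kept.
    static void maskBelowThreshold(float[][] channels, float thresh) {
        float[] s = channels[S];
        for (int i = 0; i < s.length; i++) {
            if (s[i] < thresh) {
                for (int c = 1; c < channels.length; c++) {
                    channels[c][i] = Float.NaN;
                }
            }
        }
    }

    public static void main(String[] args) {
        float[][] field = {
            {0.9f, 0.1f},   // s
            {1.0f, 2.0f},   // Vx
            {3.0f, 4.0f},   // Vy
            {5.0f, 6.0f},   // dx
            {7.0f, 8.0f},   // dy
        };
        maskBelowThreshold(field, 0.5f);
        System.out.println(Arrays.deepToString(field));
    }
}
```

Because the mask is applied to the output copy only, the unthresholded field can still be passed to Layer 2, which is the point of making the threshold visualization-only.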
    • CLAUDE: GenericJTabbedDialog - don't stretch checkboxes (fix accidental toggle) · 3c144146
      Andrey Filippov authored
      Checkboxes were laid out with fill=HORIZONTAL, stretching the (text-less) JCheckBox across the half-width cell so its click target reached the scrollbar - a near-scrollbar miss-click silently toggled them (this flipped 'DNN remote' off mid-session). Anchor checkboxes WEST at natural size; the rest of the cell is now dead space. Applies to all tabbed dialogs.
      Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    • CLAUDE: Batch the DGX remote DNN path + on-GPU ghostbuster · 5bee1911
      Andrey Filippov authored
      runDnnRemote requests each level's scenes in chunks (REQ=64) via CuasDnnRemote.inferBatch instead of per-scene; the DGX runs them continuously (production-representative ~100ms/scene full-res) and applies the ghostbuster on the GPU in decode, so BOTH the ROI 121-cell field and the full-frame -OFFSET {dx,dy,s,Vx,Vy} are ghostbusted (dropped the Java-side ghostbust). Validated: local vs remote on the same weighted9_pm_s model -> max |diff| ~1e-4 (ORT per-pixel vs PyTorch shift-and-stitch fp). Full-res ~100ms/scene is the oracle; RT would use the 1/4-res single forward (~4.4ms).
      Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
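Requesting scenes in chunks of REQ=64 amounts to splitting a scene index range into consecutive batches, the last of which may be short. A minimal sketch with hypothetical names is shown below; the actual inferBatch call is omitted because its signature is not given here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of splitting nScenes into request-sized chunks.
class BatchChunkSketch {
    // Returns {start, endExclusive} pairs covering [0, nScenes).
    static List<int[]> chunks(int nScenes, int req) {
        if (req <= 0) {
            throw new IllegalArgumentException("req must be positive");
        }
        List<int[]> out = new ArrayList<>();
        for (int start = 0; start < nScenes; start += req) {
            out.add(new int[] {start, Math.min(start + req, nScenes)});
        }
        return out;
    }

    public static void main(String[] args) {
        // 150 scenes with REQ=64: two full chunks and one of 22 scenes.
        for (int[] c : chunks(150, 64)) {
            System.out.println(c[0] + ".." + c[1]);
        }
    }
}
```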
    • CLAUDE: Port ghostbuster to the remote DNN path (match CPU -HYPER-RECT) · 01a6e535
      Andrey Filippov authored
      runDnnRemote() decoded the raw softmax*s field but skipped dnnGhostbust, so the untrained corner-velocity sidelobes (which the local CPU path zeros when curt_dnn_vmax>0) survived as background noise (~0.06 vs ~5e-4 on the MAX-all-v layer). Apply dnnGhostbust to the ROI field, mirroring the local path -> verified: ImageJ subtract of -HYPER-RECT MAX slices vs the CPU (17_UAS_REFACTORED) is exact zero. (Full-frame -OFFSET s is not yet ghostbusted - Java lacks the full 121-field; a DGX-side decode ghostbuster would cover that.)
      Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    • CLAUDE: Add DGX remote DNN inference path (full-res shift-and-stitch) · 69753750
      Andrey Filippov authored
      detectTargets() can offload the DNN front-end to the GB10 DGX (curt_dnn_remote): CuasDnnRemote uploads the LoG-conditioned stack once, the DGX builds the pyramid and runs full-res shift-and-stitch per (level,scene), returning the full-frame {dx,dy,s,Vx,Vy} (-OFFSET) + ROI 121-cell softmax*s (-RECT/-HYPER-RECT). Auto-launches the server if down (bundled cuas_dnn/ scripts, or curt_dnn_remote_srcdir local-repo override - mirrors the GPU-kernel default-vs-override). Synthetic targets are mixed into the upload stack so synth works on the remote path. 4 curt_dnn_remote_* dialog params, grouped with the model fields. Local CPU path unchanged (curt_dnn_remote=false) for Layer 2. Validated end-to-end; shift-and-stitch is fp64-exact vs per-pixel.
      Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
  5. 20 Jun, 2026 1 commit