- 03 Jul, 2026 3 commits
-
-
Andrey Filippov authored
The virtual -CENTER INTERFRAME corr-xml only carries poses/velocities, so saving the recalculated 16+16 lwir offsets/scales there was futile. Follow the established photometric machinery (runPhotometric()/photoEach()) and the top-menu save/restore convention instead: set the new values on master_CLT (immediate use), quadCLTs[ref_index] (physical photometric owner, its <scene>-INTERFRAME.corr-xml is saved) and quadCLT_main (applied to next sequences and saved in the main configuration file). Co-Authored-By:Claude Fable 5 <noreply@anthropic.com>
-
Andrey Filippov authored
curt_cond_test rework: both PERSENSOR stacks now converted with the test's own uniform sensor-domain task grid (scale 1.0) instead of leftover GPU state (MB secondary tasks with negative fractional scales made 'raw' renders = -1/6 x input; leftover virtual-view grid lost the same border ROI on every sensor). perSensorFromRawJp4 no longer overwrites the scene's conditioned image_data.
GpuQuadJna.setBayerImages(force,center) restored the base-class skip-guard via a native-side jna_bayer_set flag (gpuTileProcessor is null in JNA shell instances): previously every execConvertDirect unconditionally re-pulled quadCLT.getResetImageData(), silently clobbering explicit uploads, which made the raw baseline bit-identical to the conditioned render.
CuasMotion.perSensorLinearFit(): per-sensor a+b*x photometric fit over safe tiles (weak strength<0.5 or far disparity<1 from -INTER-INTRA-LMA, inner rect, 8x8 tile->pixel map) against the cross-sensor mean, gauge keep_averages (mean offset 0, mean scale 1), 3-sigma outlier rejection. Validated on 1773135476_186641: sensor-mean spread 1353->5 counts, cross-sensor RMS 358->17 (inliers), b in 0.83..1.11.
CuasMotion.applyLwirLinearCalibration(): folds the fit into the 16+16 lwir offsets/scales (scale'=b*scale, offset'=offset-a/scale'), updates the center instance + photometric_scene provenance, saves -INTERFRAME.corr-xml. Applied the standard way at load, they compensate the remaining per-sensor mismatch of the raw /jp4/ tiffs.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
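The folding rule quoted in this commit (scale'=b*scale, offset'=offset-a/scale') can be checked numerically. The following is a minimal sketch, assuming the conditioned value has the form scale*(raw-offset) and that the fit maps it to a+b*x; the class and method names are hypothetical and are not the CuasMotion API.

```java
// Hypothetical illustration of the fold; not the CuasMotion code.
class LinearFold {
    // Assumed conditioning model: y = scale * (raw - offset).
    static double condition(double raw, double scale, double offset) {
        return scale * (raw - offset);
    }

    // Fold a per-sensor fit y' = a + b*y into the offset/scale pair.
    // Returns {scale', offset'} with scale' = b*scale, offset' = offset - a/scale'.
    static double[] fold(double scale, double offset, double a, double b) {
        double newScale = b * scale;
        double newOffset = offset - a / newScale;
        return new double[] {newScale, newOffset};
    }

    public static void main(String[] args) {
        double scale = 1.05, offset = 20000.0, a = -12.5, b = 0.97; // made-up values
        double raw = 21234.0;
        double[] folded = fold(scale, offset, a, b);
        double twoStep = a + b * condition(raw, scale, offset);   // fit applied after conditioning
        double oneStep = condition(raw, folded[0], folded[1]);    // fit folded into the constants
        System.out.println(twoStep + " vs " + oneStep);
    }
}
```

Under that assumed model the two printed values agree up to rounding, which is why the correction can be stored in the existing offsets/scales instead of as a separate stage.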
-
Andrey Filippov authored
Add make_hyper parameter (code-selected, not in settings): 0 - flat stack (old behavior, all pre-existing call sites); >0 - transpose to hyperstack [sensors][avgs+timestamps][pixel], z (top slider) - timestamps, t (bottom slider) - sensor channels; 2 - insert per-timestamp average of all used sensors as first channel (17th), computed from final conditioned slices. Per-sensor full + center-fraction averages now work in individual mode (pre-calculated merged-only average falls back to slice computation with a warning instead of AIOOBE). Number of average frames stays variable (0/1/2); fopen paths bit-identical by design. Verified on 495-scene CUAS sequence: INDIVIDUAL debug hyperstack matches the MERGED convention - to be used as oracle for RT conditioning. Co-Authored-By:Claude Fable 5 <noreply@anthropic.com>
-
- 02 Jul, 2026 2 commits
-
-
Andrey Filippov authored
-
Andrey Filippov authored
Add a curt_cond_test path (boolean at the top of the CUAS-RT dialog) that, inside the curt_en branch, renders the 16 per-sensor images and saves them to the -CENTER instance for calibration inspection:
  - CuasMotion.perSensorAveragesFromTD: imclt the per-sensor TD, print the 16-sensor average spread, save -CUAS-PERSENSOR (16-slice stack, per-slice avg labels). Saves via the -CENTER instance, not gpuQuad.getQuadCLT().
  - CuasMotion.perSensorFromRawJp4: read RAW /jp4/ per-sensor (oracle getJp4Tiff, one thread/sensor), force-H2D (bypass the "GPU mem already correct" verify), execConvertDirect from raw, save -CUAS-PERSENSOR-RAW (uncorrected baseline; calibration stays a separate "cheat"). RT-seed for the future SATA raw stream; GPU port later with Java as oracle.
Fix the NaN border on the RT SUBAVG-CONV2D product:
  - CuasDetectRT subtract-average -> NaN-tolerant union (average only non-NaN scenes per pixel), matching the oracle -CUAS-MERGED-CUAS; the plain sum NaN-propagated (one missing scene poisoned the pixel in every frame -> thick border after LoG).
  - CuasRTUtils.convolve2DLReLU -> NaN-aware (NaN out only if the center is NaN; substitute the center value for NaN taps), so the LoG can't bloom a thin border into a thick NaN frame.
  - Add -SUBAVG-PRELOG save (post-subtract-avg, pre-LoG) for bisecting.
Compiles (mvn -DskipTests clean package). WIP: the raw-path values/edge and the in-memory-vs-file (MERGED-CUAS) divergence are still under review; the ~28px edge residual is traced to the temporal subtract-average at the rotation-swept composite edge. See ANDREY_CONTINUE.md open items.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
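The two NaN rules in this commit (average only the non-NaN scenes per pixel; in the convolution, output NaN only when the center is NaN and substitute the center value for NaN taps) can be sketched as follows. This is a simplified illustration, not the CuasDetectRT or CuasRTUtils code: names are hypothetical, the LReLU step is omitted, and treating out-of-frame taps like NaN taps is an assumption of the sketch.

```java
// Hypothetical sketch of the two NaN rules; not the CuasDetectRT/CuasRTUtils code.
class NanTolerant {
    // Per-pixel average over scenes that counts only non-NaN samples;
    // a pixel stays NaN only when every scene is NaN there.
    static float[] average(float[][] scenes) {
        int n = scenes[0].length;
        float[] avg = new float[n];
        for (int p = 0; p < n; p++) {
            double sum = 0.0;
            int count = 0;
            for (float[] scene : scenes) {
                if (!Float.isNaN(scene[p])) {
                    sum += scene[p];
                    count++;
                }
            }
            avg[p] = (count > 0) ? (float) (sum / count) : Float.NaN;
        }
        return avg;
    }

    // Square-kernel convolution: output is NaN only where the center pixel is NaN;
    // NaN taps (and, as an assumption here, out-of-frame taps) use the center value.
    static float[] convolve(float[] img, int width, int height, float[] kernel, int ksize) {
        int r = ksize / 2;
        float[] out = new float[img.length];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                float c = img[y * width + x];
                if (Float.isNaN(c)) {
                    out[y * width + x] = Float.NaN;
                    continue;
                }
                double s = 0.0;
                for (int dy = -r; dy <= r; dy++) {
                    for (int dx = -r; dx <= r; dx++) {
                        int yy = y + dy, xx = x + dx;
                        float v = c; // default: center value
                        if (yy >= 0 && yy < height && xx >= 0 && xx < width) {
                            float t = img[yy * width + xx];
                            if (!Float.isNaN(t)) v = t;
                        }
                        s += kernel[(dy + r) * ksize + (dx + r)] * v;
                    }
                }
                out[y * width + x] = (float) s;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        float nan = Float.NaN;
        float[] avg = average(new float[][] {{1f, nan, nan}, {3f, 4f, nan}});
        System.out.println(java.util.Arrays.toString(avg));
    }
}
```

With a zero-sum kernel such as a discrete Laplacian, substituting the center value makes a NaN tap contribute nothing relative to a flat neighbourhood, so a one-pixel NaN border stays one pixel wide after filtering.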
-
- 01 Jul, 2026 1 commit
-
-
Andrey Filippov authored
Piece 1 of the RT conditioning migration (design: internal handoff 2026-06-30_rt_conditioning_design.md): - cuas/rt/CuasConditioning: lean, self-contained per-sensor conditioning for the TP/RT path - Row/Col denoise (on/off, optional HPF of the 1-D avg profile) then photometric scales2*(raw-C0)^2 + scale*(raw-C0) - FPN (bit-matches the current additive path when scales2=0). Bypasses the heavy QuadCLT conditioning path. - CuasMotion.perSensorAveragesFromTD(GpuQuad, use_reference): memory-lean render of all 16 per-sensor from TD; per-sensor average + spread = calibration-quality gauge. Building blocks only; full test wiring (raw jp4 -> condition -> convert_direct -> renderSceneSequence per-sensor averages) + Eyesis invocation entry still pending. Co-Authored-By:Claude Opus 4.8 (1M context) <noreply@anthropic.com>
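The photometric step named in this commit, scales2*(raw-C0)^2 + scale*(raw-C0) - FPN, reduces to the linear path when scales2=0. A minimal per-pixel sketch of that expression follows; the class name and argument order are hypothetical, and the row/column denoise stage is not shown.

```java
// Hypothetical sketch; parameter names follow the commit text, not the real class.
class PhotometricSketch {
    // Evaluates scales2*(raw-c0)^2 + scale*(raw-c0) - fpn for one pixel.
    static double condition(double raw, double c0, double scale, double scales2, double fpn) {
        double d = raw - c0;
        return scales2 * d * d + scale * d - fpn;
    }

    public static void main(String[] args) {
        // Made-up numbers: the first call uses scales2 = 0 (purely linear case),
        // the second adds a quadratic term.
        System.out.println(condition(110.0, 100.0, 2.0, 0.0, 3.0));
        System.out.println(condition(110.0, 100.0, 2.0, 0.5, 3.0));
    }
}
```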
-
- 27 Jun, 2026 3 commits
-
-
Andrey Filippov authored
Adds an OFF-by-default profile that dependency:unpacks the libtorch native runtime (org.pytorch:libtorch-cxx11-cu128:2.7.1:zip, cu128) from mirror.elphel.com/maven-dependencies into target/libtorch-dist for the native DNN backend (libtpdnn.so / CuasDnnLocal) on a deployment box. The default build never downloads the 3.8GB zip. Artifact published to the mirror in maven layout (server-side copy of the existing zip) via tile_processor_gpu/jna/publish_libtorch_mirror.sh. Verified: zip + .pom reachable at the computed maven URL (HTTP 200, 3.78GB), profile parses (mvn -Plibtorch validate OK). Full unpack deferred (redundant on this box - libtorch already extracted); exercises on first deployment machine via `mvn -Plibtorch generate-resources`. Co-Authored-By:Claude Opus 4.8 (1M context) <noreply@anthropic.com>
-
Andrey Filippov authored
Bundles the exported TorchScript models + their .meta.json sidecars under src/main/resources/cuas_dnn/<name>/ so CuasDnnLocal runs with no local model dir (deployment needs no PyTorch/dev tooling - just the .so + libtorch runtime): weighted9_pm_s/model.ts.pt (+.meta.json) L1 (N=9,P=24,vr=5,out_ch=124) mexhat_gaps_boost40/model.l2.ts.pt (+.meta.json) L2 (ch_hidden=24,vmax=1.4) Validated: CuasDnnLocal bundled-resource path (curt_dnn_local_dir empty) extracts from the jar and matches the server oracle EXACTLY (offset5=0.0, roi=0.0). Co-Authored-By:Claude Opus 4.8 (1M context) <noreply@anthropic.com>
-
Andrey Filippov authored
Piece 3 of the native-JNA DNN path. Adds a local backend that runs the SAME L1+L2 inference as CuasDnnRemote but in-process via LibTorch (libtpdnn.so / JNA), so the CUAS pipeline runs without the DGX or any Python server:
  - CuasDnnBackend : shared interface (upload/getNFrames/inferBatch->BatchResult/close)
  - TpDnnJna : JNA Library binding libtpdnn.so's C-ABI
  - CuasDnnLocal : wraps it; reads N/P/vr/l2_ch from each model's bundled .meta.json (single source of truth), float[][]<->float[], builds BatchResult
  - CuasDnnRemote : now implements CuasDnnBackend (signatures unchanged)
  - CuasDetectRT : DNN path gate now fires on (curt_dnn_remote || curt_dnn_local); backend = local? CuasDnnLocal : CuasDnnRemote; ensureServer skipped when local; local-CPU-ORT gate also excludes curt_dnn_local (no double-run). runDnnRemote loop unchanged.
  - IntersceneMatchParameters: curt_dnn_local (flag) + curt_dnn_local_dir (model dir override; empty = bundled /cuas_dnn resource) + GUI labels/persist.
Validated: full Java->JNA->libtpdnn vs the Python-server oracle = EXACT (offset5=0.0, roi=0.0, nch=6). mvn -DskipTests package OK.
Runtime: -Djna.library.path=<dir with libtpdnn.so>; libtpdnn.so finds libtorch via its rpath. Model resolution mirrors CuasDnnRemote's bundled-vs-override scheme.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
-
- 26 Jun, 2026 14 commits
-
-
Andrey Filippov authored
Migration validated (JNA CUAS targets match JCuda). Cleanup:
  - Removed all TEMP debug probes (-Dtp.dbg.corrpair, probeClt, saveTDRender, the one-shot DBG/PROBE blocks in GpuQuadJna + CuasMotion). Real fixes kept (rectilinear port, num_pairs=3, setCorrIndicesTdData, imclt ref_scene, num_corr_tiles propagation f6dcc90f).
  - Proactive sweep for the f6dcc90f bug-class (JNA override drops a base side-effect field write): getCorrComboIndices/getCorr2DCombo propagate num_corr_combo_tiles, setCorrIndicesTdData propagates num_corr_tiles, getTextureIndices propagates num_texture_tiles; those fields made protected. These four are LATENT (no live consumer on the validated CUAS path) and are marked NOT-YET-TESTED inline.
Java-only. mvn compile clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
-
Andrey Filippov authored
Root cause of the CORR2D-all-NaN / 0-targets: the inter-correlation actually works (probe showed num_corr_tiles=8850 = 4425 tiles x (1 sensor + 1 sum)), but the TD readback dropped it. Base GpuQuad.getCorrIndices() sets the num_corr_tiles field ("also sets num_corr_tiles"); GpuQuadJna.getCorrIndices() read the native count locally and returned the array WITHOUT setting the field. So TDCorrTile.getFromGpu (num_tiles = getNumCorrTiles()/num_pairs) and base getCorrTilesTd (uses the field directly) saw a stale 0 -> built 0 tiles -> empty target sequence -> null ROUND_ONE image -> saveImagePlusInModelDirectory NPE (the misplaced-null-guard latent bug is just the messenger). Fix: GpuQuadJna.getCorrIndices() sets num_corr_tiles = n (native count); the field is made protected so the subclass can set it. Java-only. Co-Authored-By:Claude Opus 4.8 (1M context) <noreply@anthropic.com>
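The bug-class described here (an override returns the right data but drops a field write that the base method performed as a side effect) is easy to reproduce in isolation. The sketch below uses hypothetical stand-in classes, not GpuQuad or GpuQuadJna, and hard-coded indices in place of a native readback.

```java
// Hypothetical classes illustrating the bug-class; not the GpuQuad/GpuQuadJna code.
class BaseReader {
    protected int numCorrTiles; // cached count that other methods read later

    int[] getCorrIndices() {
        int[] indices = {10, 11, 12, 13};
        numCorrTiles = indices.length; // side effect that callers rely on
        return indices;
    }

    int getNumCorrTiles() {
        return numCorrTiles;
    }
}

class DroppingOverride extends BaseReader {
    @Override
    int[] getCorrIndices() {
        return new int[] {10, 11, 12, 13}; // returns data but leaves the field stale
    }
}

class FixedOverride extends BaseReader {
    @Override
    int[] getCorrIndices() {
        int[] indices = {10, 11, 12, 13};
        numCorrTiles = indices.length; // restore the base-class side effect
        return indices;
    }
}

class SideEffectDemo {
    public static void main(String[] args) {
        BaseReader broken = new DroppingOverride();
        broken.getCorrIndices();
        BaseReader fixed = new FixedOverride();
        fixed.getCorrIndices();
        System.out.println(broken.getNumCorrTiles() + " vs " + fixed.getNumCorrTiles());
    }
}
```

Making the field protected, as the commit does, is what allows the subclass to write it; an alternative design would have the base class derive the count from the returned array so that overrides cannot forget it.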
-
Andrey Filippov authored
Post-mortem showed both CLT buffers loaded but inter-correlation -> 0 tiles. index_inter_correlate selects by __popc(sel_sensors); static reading says sel_sensors should be 1 (single-cam rectilinear), so a runtime value differs. - GpuQuadJna.execCorr2D_inter_TD: one-shot print sel_sensors/popc/num_cams/ num_colors/scales + the returned num_corr_tiles. - saveTDRender: makeArrays NPE'd on null titles (derefs titles[i]); pass a non-null titles[] so the render saves instead of crashing the run. TEMP — remove with the rest of the -Dtp.dbg.corrpair probe. Co-Authored-By:Claude Opus 4.8 (1M context) <noreply@anthropic.com>
-
Andrey Filippov authored
Single sacrificial run -> generous logging without spam: - GpuQuadJna: probe BOTH first ref convert (gpu_clt_ref) AND first scene convert (gpu_clt) — NaN%/nonzero/range each (probeClt helper). - CuasMotion.correlatePair one-shot: log targets_mv / tp_ref,tp_img counts / erase_cltr,erase_clt / fpixels null-ness, plus TD-correlation read-back stats (tile count + NaN% of TD values) alongside the DBG-REF/DBG-IMG renders. All gated/one-shot; no native change (reads via existing tp_proc_get_clt). TEMP — remove with the rest of the -Dtp.dbg.corrpair probe. Co-Authored-By:Claude Opus 4.8 (1M context) <noreply@anthropic.com>
-
Andrey Filippov authored
(1) GpuQuadJna.execImcltRbgAll now passes ref_scene -> tp_proc_exec_imclt(use_ref) so renderFromTD(true) renders gpu_clt_ref, not gpu_clt (TpJna binding updated). (2) TEMP post-mortem in CuasMotion.correlatePair (gate -Dtp.dbg.corrpair=1, one-shot): after the inter-correlation, SAVE ref (gpu_clt_ref) + scene (gpu_clt) CLT renders to the model dir via saveImagePlusInModelDirectory (persist past the later crash; no window flood). DBG-REF blank/NaN => reference not loaded => explains the all-NaN CORR2D (inter corr needs both images). REMOVE after fix. Needs the tile_processor_gpu imclt use_ref commit + libtileproc.so rebuild. Co-Authored-By:Claude Opus 4.8 (1M context) <noreply@anthropic.com>
-
Andrey Filippov authored
Diagnostic for the CORR2D-all-NaN numeric divergence: after the first ref_scene=true convert, read back gpu_clt_ref (tp_proc_get_clt use_ref=1) and print NaN%/nonzero/min/max once. Confirms whether the reference convert populates gpu_clt_ref at all (vs scene gpu_clt which is correct -> SOURCE). Prints one "PROBE gpu_clt_ref[cam0]: ..." line to System.out (captured in the per-scene -SYSTEM_OUT.log). TEMP — remove after the divergence is fixed. Co-Authored-By:Claude Opus 4.8 (1M context) <noreply@anthropic.com>
-
Andrey Filippov authored
Third JNA-mode gap on the CUAS oracle path: TDCorrTile.convertTDtoPD re-uploads host-selected TD correlation tiles via GpuQuad.setCorrIndicesTdData, which was not overridden -> base JCuda cuMemcpyHtoD on a null gpu_corr_indices -> NPE. Adds the GpuQuadJna override (ensureRbgCorr() then delegate to the new native tp_proc_set_corr_indices_td) + the TpJna binding. Gap-finder over the full CUAS TD path (CuasMotion + TDCorrTile) confirms this was the LAST GPU-touching un-overridden method; the rest are pure config getters. Needs the matching tile_processor_gpu commit (native fn) + libtileproc.so rebuild. mvn compile clean. Co-Authored-By:Claude Opus 4.8 (1M context) <noreply@anthropic.com>
-
Andrey Filippov authored
Second JNA-mode failure after the Phase-1 NPE fix: cudaErrorIllegalAddress at tp_utils.cu:142 (image upload), actually a DEFERRED fault from the inter-scene correlate2D_inter kernel writing out of bounds. Root cause: GpuQuadJna.ensureRbgCorr sized the native correlation buffers via Correlation2d.getNumPairs(num_cams). For the rectilinear single-camera config num_cams=1 -> getNumPairs(1)=0 -> tp_proc_setup_rbg_corr allocates zero-size gpu_corrs_td / gpu_corrs / gpu_corr_indices, so the inter-scene correlation wrote past them -> illegal address, surfacing (sticky) at the next CUDA call. Fix: mirror the JCuda oracle, whose rectilinear ctor hardcodes num_pairs=3 (GpuQuad.java:732) for exactly the inter-scene case -> int num_pairs = rectilinear ? 3 : Correlation2d.getNumPairs(num_cams); Java-only; libtileproc.so untouched. mvn compile clean. Co-Authored-By:Claude Opus 4.8 (1M context) <noreply@anthropic.com>
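The sizing logic behind this fix can be sketched in a few lines. The pair-count formula n*(n-1)/2 is an assumption about Correlation2d.getNumPairs (it is consistent with getNumPairs(1)=0 stated in the commit); the class and method names below are hypothetical.

```java
// Hypothetical sketch of the buffer-sizing decision; not the GpuQuadJna code.
class NumPairsSketch {
    // Assumed behaviour of Correlation2d.getNumPairs: all unordered sensor pairs.
    static int allPairs(int numCams) {
        return numCams * (numCams - 1) / 2;
    }

    // Rectilinear single-camera instances still need room for inter-scene
    // correlation, so the pair count is fixed at 3 instead of being derived from numCams.
    static int bufferPairs(boolean rectilinear, int numCams) {
        return rectilinear ? 3 : allPairs(numCams);
    }

    public static void main(String[] args) {
        System.out.println("derived for 1 camera: " + allPairs(1));
        System.out.println("rectilinear override: " + bufferPairs(true, 1));
    }
}
```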
-
Andrey Filippov authored
Fixes the whole bug-class behind the -Dtp.backend=jna NPE in GpuQuad.setLpfRbg (CuasRanging.detectTargets -> CuasMotion -> setRectilinearReferenceTD): the rectilinear single-camera GpuQuad was built via the raw JCuda ctor, bypassing the backend factory, so in JNA mode it got a null gpuTileProcessor.
  - GpuQuad.createRectilinear(): backend-aware factory parallel to create(). JCUDA branch is byte-for-byte the legacy ctor (oracle path untouched); JNA branch builds a clean rectilinear GpuQuadJna. New no-alloc rectilinear ctor (num_cams=1, no kernels/geometry).
  - GpuQuadJna: rectilinear ctor + shared initNative(); the two overrides the gap-finder predicted -- reAllocateClt (no-op; native CLT pre-sized in setup) and singular setBayerImage (-> tp_proc_set_image). execConvertDirect already guarded on the rectilinear flag.
  - CuasMotion:452 routed through createRectilinear (CUAS rectilinear now JNA-capable).
  - ComboMatch:899 fail-loud UnsupportedOperationException in JNA mode (orthomosaic, wider unported surface, off the current path -- stays JCuda).
Java-only; libtileproc.so untouched. mvn compile clean. JCuda legacy frozen as oracle; core convert_direct flag-soup cleanup deferred to Phase 2.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
-
Andrey Filippov authored
Root-level, Andrey-facing, single predictable "what to do right now" file the agent overwrites before session exit, so lost tmux scrollback never costs the restart plan. Local-only; gitignored alongside CLAUDE.md/AGENTS.md/MEMORY.md. Co-Authored-By:Claude Opus 4.8 (1M context) <noreply@anthropic.com>
-
Andrey Filippov authored
just debug
-
Andrey Filippov authored
Completes the oracle GPU surface. The reliable gap finder is comm -23 <(ImageDtt gpuQuad.* calls) <(GpuQuadJna overrides) not the gpuTrace dump (only ~14 methods are instrumented, so e.g. getFlatTextures was invisible in the trace though it is on the path). Overrides (delegating to the new tp_proc_* texture API): - execTextures: builds weights[3]/params[5], forwards calc_textures/calc_extra/ linescan/dust/keep flags. Implements the production (USE_DS_DP) behavior. - getTextureIndices: reads kernel-built count + packed indices. - getExtra: reshapes diff_rgb_combo (texture_indices order) into [num_cams*(num_colors+1)][tilesX*tilesY] keyed by ntile -- identical to base. - getFlatTextures: de-pitches gpu_textures -- identical to base. TpJna.java: bindings for tp_proc_exec_textures/get_texture_indices/ get_diff_rgb_combo/get_textures. Edits only -- not mvn-compiled (Eyesis run was live). Signatures match base @Override; referenced fields are public final / public static. Co-Authored-By:Claude Opus 4.8 (1M context) <noreply@anthropic.com>
-
Andrey Filippov authored
gpuTrace now prints each Class.method ONCE (was per-call -> spammy). Oracle JCuda trace showed it uses the TD-correlation readback path: getCorrIndices / getCorrTdData / getCorrComboIndices / eraseGpuCorrs (un-overridden -> would NPE in JNA). Override them via the new native tp_proc_get_corr_indices / get_corr_combo_indices / get_corr_td (DtoH) + tp_proc_erase_corrs. mvn -DskipTests compile clean. Co-Authored-By:Claude Opus 4.8 (1M context) <noreply@anthropic.com>
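A print-once trace of the kind described here can be built on a concurrent set of already-seen labels. The sketch below is a hypothetical standalone helper: the "[GPUTRACE]" prefix and the tp.trace property come from the neighbouring commits, while the class layout and the boolean return value are assumptions made for illustration.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper; the real gpuTrace lives in GpuQuad and may differ.
class TraceOnce {
    private final boolean enabled;
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    TraceOnce(boolean enabled) {
        this.enabled = enabled;
    }

    // Prints "[GPUTRACE] Class.method" the first time a given label is seen.
    // Returns true if a line was printed.
    boolean trace(Object self, String method) {
        if (!enabled) return false;
        String label = self.getClass().getSimpleName() + "." + method;
        if (!seen.add(label)) return false; // already reported once
        System.out.println("[GPUTRACE] " + label);
        return true;
    }

    public static void main(String[] args) {
        TraceOnce t = new TraceOnce("1".equals(System.getProperty("tp.trace")));
        Object caller = new StringBuilder(); // stand-in for a GPU wrapper instance
        t.trace(caller, "execTextures");
        t.trace(caller, "execTextures"); // suppressed on the second call
    }
}
```

Because the label uses the runtime class, a subclass that falls through to an inherited method reports under its own name, which is what makes un-overridden methods visible.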
-
Andrey Filippov authored
Add GpuQuad.gpuTrace(m) printing "[GPUTRACE] "+getClass().getSimpleName()+"."+m (off unless -Dtp.trace=1). Instrument the un-overridden GPU methods (potential oracle gaps): getCltData, presentCltData, eraseGpuCorrs, execCorr2D (bundled), readbackTasks, setFullFrameImages, getCorrTdData, getCorrIndices, getCorrComboIndices, getExtra, getTextureIndices, getRBGA, execRBGA, execTextures. Since GpuQuadJna extends GpuQuad, the trace prints "GpuQuad.X" under JCuda and "GpuQuadJna.X" if a JNA run falls through to one (= coverage gap) -> reveals oracle's real GPU usage before any NPE. Co-Authored-By:Claude Opus 4.8 (1M context) <noreply@anthropic.com>
-
- 25 Jun, 2026 17 commits
-
-
Andrey Filippov authored
updateTasks is called all over ImageDtt right after execSetTilesOffsets (reads gpu_ftasks back to rebuild TpTask[] with computed centerXY/disp_dist) -> tp_proc_get_tasks (DtoH). getWH returns full frame (base returns null gpu_clt_wh). Proactive (locating same-cause base-method derefs of null JCuda fields). Co-Authored-By:Claude Opus 4.8 (1M context) <noreply@anthropic.com>
-
Andrey Filippov authored
convertCenterClt -> setComboToTD -> setCltData pushes the restored center CLT to GPU; base getCltSize/ getNumTiles deref null gpu_clt_wh. Override to full-frame dims; setCltData -> new tp_proc_set_clt (HtoD per-cam slice, inverse of tp_proc_get_clt). Fixes the third JNA NPE (getCltSize:1211). Co-Authored-By:Claude Opus 4.8 (1M context) <noreply@anthropic.com>
-
Andrey Filippov authored
updateQuadCLT sets quadCLT + flag-only resetGeometryCorrection*(); skips the base's gpuTileProcessor.bayer_set clear (N/A natively - bayer re-uploaded each convert). resetBayer no-op. Fixes the second JNA NPE (updateQuadCLT:263). Co-Authored-By:Claude Opus 4.8 (1M context) <noreply@anthropic.com>
-
Andrey Filippov authored
Co-Authored-By:Claude Opus 4.8 (1M context) <noreply@anthropic.com>
-
Andrey Filippov authored
setLpfRbg flattens the 4x64 r/b/g/m arrays -> "lpf_data"; setLpfCorr -> const_name (lpf_corr / lpf_rb_corr). Uploads to the native module's constant memory, matching JCUDA. mvn clean. Co-Authored-By:Claude Opus 4.8 (1M context) <noreply@anthropic.com>
-
Andrey Filippov authored
GpuQuad.create(gpuTileProcessor, quadCLT, debug) returns the JCuda GpuQuad by default, or the native GpuQuadJna when -Dtp.backend=jna (srcdir/devrt overridable via -Dtp.jna.srcdir / -Dtp.jna.devrt). Routed all 32 main+aux `new GpuQuad(...)` 3-arg sites in Eyesis_Correction.java through the factory. JCUDA remains the default (behavior identical when the property is unset). mvn -DskipTests compile clean. Migration now fully implemented + compiling end-to-end (Step 1 native TpProc API, Step 2 GpuQuadJna full CUAS surface, Step 3 selector). Ready for the JCUDA-vs-JNA comparison + incremental troubleshooting. Co-Authored-By:Claude Opus 4.8 (1M context) <noreply@anthropic.com>
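The selector pattern described here (one factory, default backend unless a system property asks for the other) can be sketched as below. Only the property name tp.backend and the value jna come from the commit; the interface, the two stand-in classes, and the exact string comparison are hypothetical.

```java
// Hypothetical stand-ins for the two backends; not the GpuQuad class hierarchy.
interface GpuBackend {
    String name();
}

class JcudaBackendStub implements GpuBackend {
    public String name() { return "jcuda"; }
}

class JnaBackendStub implements GpuBackend {
    public String name() { return "jna"; }
}

class BackendFactory {
    // Selection from an explicit value, so the rule is testable without global state.
    static GpuBackend create(String backendProperty) {
        return "jna".equals(backendProperty) ? new JnaBackendStub() : new JcudaBackendStub();
    }

    // Production-style entry point: reads -Dtp.backend=..., unset means the default.
    static GpuBackend create() {
        return create(System.getProperty("tp.backend"));
    }

    public static void main(String[] args) {
        System.out.println("selected backend: " + create().name());
    }
}
```

Routing every construction site through one factory is what keeps the default behaviour identical when the property is unset, since call sites no longer choose a constructor themselves.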
-
Andrey Filippov authored
Override execCorr2D_TD / execCorr2D_inter_TD / execCorr2D_combine / execCorr2D_normalize / getCorr2D / getCorr2DCombo delegating to the granular TpProc functions (setCorrMask kept for getNumUsedPairs; mono scale triplet 1/0/0; init|no_transpose<<1). handleWH -> full-frame no-op (TpProc fixed-size). GpuQuadJna now covers the full CUAS GPU surface (geometry/kernels/bayer/tasks/convert/imclt/getRBG/ correlations). mvn compile clean. fcorr_weights (per-tile) + setLpf* not yet plumbed — to surface in troubleshooting. Next: backend selector. Co-Authored-By:Claude Opus 4.8 (1M context) <noreply@anthropic.com>
-
Andrey Filippov authored
Override GpuQuad's GPU-touching methods for the image path, delegating to the native TpProc (with own caching, since base uses the null gpuTileProcessor): - setGeometryCorrection / setExtrinsicsVector -> tp_proc_set_geometry / set_correction_vector (gc.expandSensors(16).toFloatArray, cv.toFullRollArray). - setConvolutionKernels -> per-cam transpose-flatten (i=((i0&7)<<3)+((i0>>3)&7), CltExtra offsets) -> tp_proc_set_kernels / set_kernel_offsets. - setBayerImages -> channel-combine -> tp_proc_set_image (center -> set_center_image broadcast). - setTasks -> TpTask.asFloatArray -> tp_proc_set_tasks. - execSetTilesOffsets -> set gc+cv -> tp_proc_exec_geometry. - execConvertDirect(ref_scene,wh,erase_clt,no_kernels,use_center_image) -> tp_proc_exec_convert_direct (honors no_kernels skip-deconvolution + use_center_image, the fragile paths). - execImcltRbgAll -> tp_proc_exec_imclt; getRBG -> tp_proc_get_rbg + same inner-region extraction. mvn -DskipTests compile clean; all @Override signatures match base. Correlations (execCorr2D_*) and the backend selector are next. JCUDA remains the untouched default. Co-Authored-By:Claude Opus 4.8 (1M context) <noreply@anthropic.com>
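The index expression quoted for the kernel flattening, i=((i0&7)<<3)+((i0>>3)&7), swaps the row and column of a flat index into an 8x8 block. A small sketch showing that it is a transpose follows; the class name is hypothetical and the surrounding per-camera loop and CltExtra offsets are not reproduced.

```java
// Hypothetical sketch of the 8x8 index transpose; not the setConvolutionKernels code.
class KernelIndex {
    // Maps flat index i0 = row*8 + col to col*8 + row.
    static int transposeIndex(int i0) {
        return ((i0 & 7) << 3) + ((i0 >> 3) & 7);
    }

    // Transposes one 64-element block stored row-major.
    static float[] transpose8x8(float[] src) {
        float[] dst = new float[64];
        for (int i0 = 0; i0 < 64; i0++) {
            dst[transposeIndex(i0)] = src[i0];
        }
        return dst;
    }

    public static void main(String[] args) {
        float[] block = new float[64];
        for (int i = 0; i < 64; i++) block[i] = i;
        float[] t = transpose8x8(block);
        // Element at row 1, col 0 of the result is the source element at row 0, col 1.
        System.out.println(t[8]);
    }
}
```

Since the mapping is its own inverse, applying it twice restores the original order.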
-
Andrey Filippov authored
Architecture B (chosen after finding GpuQuad's surface is ~70 methods, too large for a clean interface): - GpuQuad: add a protected no-alloc constructor (QuadCLT, debug_level, native_backend marker) that sets only the final config fields (gpuTileProcessor=null) and allocates NO JCuda memory / context. The working JCuda constructors are untouched. - New GpuQuadJna extends GpuQuad: uses the no-alloc ctor, then stands up the native libtileproc.so via TpJna (tp_create_module + tp_proc_create + tp_proc_setup). Inherits all methods (so it compiles); GPU-touching methods will be overridden incrementally to delegate to TpProc, the rest throw to fail loudly off the validated path. close() frees native memory deterministically. mvn -DskipTests compile: clean. JCUDA remains the default/working path. Next: per-method override marshalling (kernels/bayer/geometry/tasks + convert/imclt/getRBG/corr), then the backend selector (QuadCLT ctor) and the live JCUDA-vs-JNA file comparison. Co-Authored-By:Claude Opus 4.8 (1M context) <noreply@anthropic.com>
-
Andrey Filippov authored
TpJna: tp_proc_setup_rbg_corr/exec_imclt/get_rbg/exec_corr2d/get_corr2d_combo + extended tp_proc_convert_selftest. StageProc validates convert+imclt+corr through the persistent API (all match goldens) + no_kernels smoke. PASS on 5060 Ti. Co-Authored-By:Claude Opus 4.8 (1M context) <noreply@anthropic.com>
-
Andrey Filippov authored
TpJna: tp_proc_create/setup/set_*/exec_*/get_clt/destroy + tp_proc_convert_selftest. StageProc: validates the persistent convert path == Stage-2 CLT golden + no_kernels smoke test. PASS on 5060 Ti. This is the production-facing surface GpuQuadJna (integration step 2) delegates to. Co-Authored-By:Claude Opus 4.8 (1M context) <noreply@anthropic.com>
-
Andrey Filippov authored
CLAUDE: Stage 5 — JNA textures_nonoverlap binding + Stage5 driver (executes; golden mismatch documented) TpJna: tp_tex_selftest. Stage5: reports EXECUTED (Blackwell OK) + golden-match separately. textures_nonoverlap executes correctly on 5060 Ti; diff_rgb_combo golden mismatch is a documented known issue (not in the LWIR16 CUAS workflow). All kernels the CUAS workflow uses are validated. Co-Authored-By:Claude Opus 4.8 (1M context) <noreply@anthropic.com>
-
Andrey Filippov authored
TpJna: add tp_corr_selftest. Stage4: run convert_direct + correlate2D/combine/normalize, report CLT error and the order-independent (sorted-distribution) correlation value error. PASS on 5060 Ti: sorted value error 2.06e-05 vs aux_corr-quad.corr (pointwise 0.66 is the stale golden's differing tile order, not a value error). Co-Authored-By:Claude Opus 4.8 (1M context) <noreply@anthropic.com>
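The distinction between the pointwise error and the order-independent (sorted-distribution) error can be illustrated as follows. This is a hypothetical sketch of the comparison idea only; it is not the Stage4 driver, and the max-absolute-difference metric is an assumption.

```java
import java.util.Arrays;

// Hypothetical sketch of an order-independent comparison; not the Stage4 driver.
class SortedError {
    // Largest absolute difference between elements at the same position.
    static double maxAbsPointwise(float[] a, float[] b) {
        double max = 0.0;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs((double) a[i] - b[i]));
        }
        return max;
    }

    // Same metric after sorting copies of both arrays, so element order is ignored.
    static double maxAbsSorted(float[] a, float[] b) {
        float[] sa = a.clone();
        float[] sb = b.clone();
        Arrays.sort(sa);
        Arrays.sort(sb);
        return maxAbsPointwise(sa, sb);
    }

    public static void main(String[] args) {
        float[] result = {1f, 2f, 3f};
        float[] golden = {3f, 1f, 2f}; // same values, different order
        System.out.println(maxAbsPointwise(result, golden) + " vs " + maxAbsSorted(result, golden));
    }
}
```

A sorted comparison confirms that the two outputs contain the same values but cannot confirm that each value sits in the right tile, so it complements a pointwise check against a golden file with matching tile order.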
-
Andrey Filippov authored
TpJna: add tp_imclt_selftest binding. Stage3: run convert_direct + imclt_rbg_all, report CLT and RBG max error. PASS on 5060 Ti: RBG relative ~1.31e-5 vs golden. Co-Authored-By:Claude Opus 4.8 (1M context) <noreply@anthropic.com>
-
Andrey Filippov authored
TpJna: add tp_convert_direct_selftest binding. Stage2: invoke it against the tile_processor_gpu/clt golden, report num_active_tiles + max|CLT-golden|. PASS on 5060 Ti: 5120 active tiles, relative error ~8.85e-6 vs golden (first real kernel execution + CDP via the native shim, no JCuda). Co-Authored-By:Claude Opus 4.8 (1M context) <noreply@anthropic.com>
-
Andrey Filippov authored
TpJna: add the instance + geometry surface (tp_create_instance, tp_set_geometry_correction/correction_vector, tp_exec_calc_reverse_distortions, tp_exec_rot_derivs, tp_get_rbyrdist/rot_deriv, tp_destroy_instance). Stage1: drive the geometry path entirely across JNA (no JCuda) from the tile_processor_gpu/clt reference data (little-endian float32), then validate: rByRDist == clt/*.rbyrdist to ~1e-7 (GpuQuad.maxRbyRDistErr tolerance), rot_deriv rows orthogonal to ~1e-10. PASS for aux (16-cam) and main on 5060 Ti. Co-Authored-By:Claude Opus 4.8 (1M context) <noreply@anthropic.com>
-
Andrey Filippov authored
Add JNA 5.14.0 dependency + com.elphel.imagej.gpu.jna (TpJna interface, Stage0 driver): load libtileproc.so, NVRTC-compile+CDP-link+load the kernels, 19/19 functions on the 5060 Ti from Java via JNA (no JCuda). First step of the GPU-layer migration; existing JCuda path untouched. Co-Authored-By:Claude Opus 4.8 (1M context) <noreply@anthropic.com>
-