Commits · 7a351e9e3441ec023b2c7af5ba22b2f02cb6b9e8 · Elphel / imagej_elphel_dnn

27 Jun, 2026 1 commit

CLAUDE: export L2 (Layer2Net step) to TorchScript for native LibTorch · 7a351e9e

Andrey Filippov authored Jun 27, 2026

Piece 1 of the native-JNA DNN path (no Python server). Adds export_l2_torchscript.py:
wraps a trained Layer2Net's cell+head into a single-step L2Step module
forward(x,h)->(h_new,det,vel) — exactly infer_server's per-scene recurrence
(h=cell(x,h); det,vel=decode(h)) — so the C++ side just carries h and calls it
per scene. Size-agnostic (circular pad + 1x1 head), runs on the full field.

Validated: scripted==eager exact (0.0); C++ LibTorch (libtorch_probe/l2_probe)
loads it on Blackwell CUDA and replays the recurrence with hidden-state match
9.5e-7. Required disabling the TorchScript JIT fuser (nvrtc element-wise fusion
fails on Blackwell -arch; production wants no runtime nvrtc) — folds into the
native lib startup in piece 2.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
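
The single-step wrapper described above can be sketched roughly as follows. This is a minimal illustration, not the contents of export_l2_torchscript.py: `ToyCell`, the channel counts, the zero initial hidden state, and the split of the head output into one `det` channel and two `vel` channels are assumptions made for the example. Only the `forward(x,h)->(h_new,det,vel)` step shape, the circular padding, and the 1x1 head come from the commit message.

```python
import io
from typing import List, Tuple

import torch
import torch.nn as nn


class ToyCell(nn.Module):
    """Hypothetical stand-in for the trained recurrent cell (not the repo's ConvGRU)."""

    def __init__(self, in_ch: int, hid_ch: int) -> None:
        super().__init__()
        # Circular padding wraps the field edges, so any H x W input is accepted.
        self.conv = nn.Conv2d(in_ch + hid_ch, hid_ch, kernel_size=3,
                              padding=1, padding_mode="circular")

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.conv(torch.cat([x, h], dim=1)))


class L2Step(nn.Module):
    """One recurrence step: h_new = cell(x, h); det, vel = decode(h_new)."""

    def __init__(self, cell: nn.Module, head: nn.Module) -> None:
        super().__init__()
        self.cell = cell
        self.head = head

    def forward(self, x: torch.Tensor, h: torch.Tensor
                ) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
        h_new = self.cell(x, h)
        out = self.head(h_new)      # 1x1 conv, so spatial size is preserved
        det = out[:, :1]            # assumed layout: channel 0 = detection
        vel = out[:, 1:]            # assumed layout: channels 1..2 = velocity
        return h_new, det, vel


def build_step(in_ch: int = 2, hid_ch: int = 8):
    """Return (eager, scripted) versions of the same randomly initialized step."""
    torch.manual_seed(0)
    head = nn.Conv2d(hid_ch, 3, kernel_size=1)
    eager = L2Step(ToyCell(in_ch, hid_ch), head).eval()
    return eager, torch.jit.script(eager)


def replay(step: nn.Module, frames: List[torch.Tensor], hid_ch: int) -> torch.Tensor:
    """Carry h across scenes, as the native side would; h starts at zero (assumption)."""
    n, _, hgt, wid = frames[0].shape
    h = torch.zeros(n, hid_ch, hgt, wid)
    with torch.no_grad():
        for x in frames:
            h, det, vel = step(x, h)
    return h
```

The scripted module can then be written with `torch.jit.save` and loaded from C++ with `torch::jit::load`; the caller only has to keep `h` between calls.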


26 Jun, 2026 6 commits

CLAUDE: GPU_CHUNK env-tunable + GPU_CHUNK=4/expandable_segments for 16GB local · cd7b0801

Andrey Filippov authored Jun 26, 2026

infer_server decode batched GPU_CHUNK scenes (p=[chunk,121,H,W]); 16 -> ~2.5GB -> OOM on the 16GB 5060 Ti
(shared with Eyesis JCuda). Make GPU_CHUNK env-tunable (default 16 for big GPUs); run_infer_local.sh sets
GPU_CHUNK=4 (+ expandable_segments) so local L1+L2 fits. By Claude 06/27/2026.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
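
An env-tunable chunk size of the kind described above might look like the sketch below. The helper names `gpu_chunk_from_env` and `chunks`, and the choice to fall back to the default on a missing, non-numeric, or non-positive value, are assumptions for illustration; only the `GPU_CHUNK` variable name and the default of 16 come from the commit message.

```python
import os
from typing import Iterator, Mapping, Optional, Sequence, TypeVar

T = TypeVar("T")
DEFAULT_GPU_CHUNK = 16  # default named in the commit message


def gpu_chunk_from_env(env: Optional[Mapping[str, str]] = None) -> int:
    """Read GPU_CHUNK from the environment, falling back to the default.

    The fallback on invalid input is an assumption, not confirmed behavior.
    """
    env = os.environ if env is None else env
    raw = env.get("GPU_CHUNK", "")
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_GPU_CHUNK
    return value if value > 0 else DEFAULT_GPU_CHUNK


def chunks(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items (the last may be shorter)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

With `GPU_CHUNK=4` in the environment, a decode loop would iterate `chunks(scenes, gpu_chunk_from_env())` and hold at most four scenes on the GPU at a time. The `expandable_segments` part is presumably PyTorch's allocator option, set through `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`.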


CLAUDE: default L2 = runs/mexhat_gaps_boost40 (current production L2, was l2_v1) · 609ba690

Andrey Filippov authored Jun 26, 2026

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

CLAUDE: add run_infer_local.sh — local GPU infer_server (no DGX/docker/transfer) · 49975486

Andrey Filippov authored Jun 26, 2026

Run the PyTorch L1+L2 server on the workstation 5060 Ti; Java pipeline points at 127.0.0.1:5577.
Verified: server loads L1(weighted9_pm_s)+L2(l2_v1) on CUDA, warm-up + pyramid build OK.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


CLAUDE: broaden scope — general PyTorch DNN-companions repo (not CUAS-only) · 9262d943

Andrey Filippov authored Jun 26, 2026

Per Andrey: future DNN companions to imagej-elphel (all PyTorch, not Java) live here too. README + per-file
header tagline generalized accordingly.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


CLAUDE: add export_torchscript.py — TorchScript export for native LibTorch inference · 76dbfda2

Andrey Filippov authored Jun 26, 2026

Validated in PoC: L1 (weighted9_pm_s) -> TorchScript -> C++/CUDA on Blackwell matches PyTorch (7.6e-4).
Writes raw-f32 reference vectors for the native probe.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
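
A native probe could compare its output against such reference vectors along the lines of the sketch below. The file layout (headerless, flat, little-endian float32), the helper names, and the use of maximum absolute difference as the match metric are all assumptions; the commit message reports 7.6e-4 without naming the metric or the format details.

```python
import struct
from typing import List, Sequence


def write_raw_f32(path: str, values: Sequence[float]) -> None:
    """Write values as headerless little-endian float32 (assumed layout)."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}f", *values))


def read_raw_f32(path: str) -> List[float]:
    """Read a headerless little-endian float32 file back into Python floats."""
    with open(path, "rb") as f:
        data = f.read()
    if len(data) % 4 != 0:
        raise ValueError("file size is not a multiple of 4 bytes")
    return list(struct.unpack(f"<{len(data) // 4}f", data))


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)
```

A check would then read the reference file, read the native output, and require `max_abs_diff(...)` to stay below a chosen tolerance.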


CLAUDE: initial import — DNN training/eval/export (migrated from imagej-elphel-internal/c5p_dnn) · 782ef529

Andrey Filippov authored Jun 26, 2026

L1 RawFCN + L2 ConvGRU(torus), synthetic data gen, training/eval, infer_server,
and export_torchscript.py (self-contained TorchScript for native LibTorch inference).
GPLv3 (Elphel norm); headers on all .py/.sh; LICENSE = GPLv3. runs/ checkpoints untracked.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
