CLAUDE: native LibTorch L1+L2 inference shim (libtpdnn.so) for JNA

Piece 2 of the native-JNA DNN path. tp_dnn.cpp is a C-ABI port of infer_server.py's hot path so the Java client can run L1+L2 in-process instead of over TCP: tpdnn_init/upload/infer/free (+ num_levels/level_frames) faithfully reproducing build_pyramid, the 16x shift-and-stitch full-res recovery, decode (ghostbuster + velocity centroid), and the L2 ConvGRU recurrence + track-age.

Loads the TorchScript models from imagej_elphel_dnn (export_torchscript / export_l2_torchscript). Disables the TorchScript JIT fuser at init (nvrtc element-wise fusion fails on Blackwell; production wants no runtime nvrtc).

Validated: native vs the running Python server (same CUDA) max|diff| offset5=0, roi=0 (bit-for-bit). (Oracle dump_ref.py + driver tpdnn_test.cpp, scratch.)

Built standalone via build_dnn.sh (g++ + libtorch 2.7.1+cu128, ABI=1), separate from the nvcc-built libtileproc.so; fetch_libtorch.sh pulls the pinned libtorch. Context unification + zero-copy kernel<->tensor sharing is a later step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
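The validation figure quoted in the commit message (max|diff| = 0 between the native output and the Python reference) can be illustrated with a minimal sketch. The helper name `max_abs_diff` and the use of `std::vector<float>` are assumptions made for illustration only; this is not the code in tpdnn_test.cpp or dump_ref.py, whose contents are not shown here.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not taken from tp_dnn.cpp or tpdnn_test.cpp):
// largest absolute element-wise difference between two float buffers.
// Returns +infinity when the sizes differ or when any difference is NaN,
// so a mismatch can never be mistaken for a perfect match.
float max_abs_diff(const std::vector<float>& native_out,
                   const std::vector<float>& reference_out) {
    const float inf = std::numeric_limits<float>::infinity();
    if (native_out.size() != reference_out.size()) {
        return inf;
    }
    float worst = 0.0f;
    for (std::size_t i = 0; i < native_out.size(); ++i) {
        const float d = std::fabs(native_out[i] - reference_out[i]);
        if (std::isnan(d)) {
            return inf;  // NaN on either side counts as a failure
        }
        worst = std::max(worst, d);
    }
    return worst;
}
```

A result of exactly 0 means every compared pair was equal as floats. Note that this metric treats +0.0 and -0.0 as equal, so a strictly bitwise check would need a byte comparison (for example `std::memcmp`) instead.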

Commit 9202dd30 authored Jun 27, 2026 by Andrey Filippov

Showing 4 changed files with 254 additions and 0 deletions

jna/.gitignore +1 -0
jna/build_dnn.sh +20 -0
jna/fetch_libtorch.sh +21 -0
jna/tp_dnn.cpp +212 -0

--- a/jna/.gitignore
+++ b/jna/.gitignore
 libtileproc.so
+libtpdnn.so
 tp_nvrtc_probe
 *.o
--- a/jna/build_dnn.sh
+++ b/jna/build_dnn.sh
+#!/usr/bin/env bash
+# Build libtpdnn.so — native LibTorch L1+L2 inference shim for JNA (no Python server).
+# Separate from libtileproc.so (nvcc kernels): this links libtorch (g++). The two .so's unify
+# their CUDA context later (zero-copy kernel<->tensor). By Claude on 2026-06-26.
+#
+# Requires libtorch 2.7.1+cu128 (matches the TorchScript export torch version). Set LIBTORCH to
+# its root (default /home/elphel/git/libtorch). Run jna/fetch_libtorch.sh to obtain it.
+set -e
+cd "$(dirname "$0")"
+LIBTORCH="${LIBTORCH:-/home/elphel/git/libtorch}"
+[ -d "$LIBTORCH/include/torch" ] || { echo "libtorch not found at $LIBTORCH (set LIBTORCH= or run fetch_libtorch.sh)"; exit 1; }
+
+g++ -std=gnu++17 -O3 -DNDEBUG -fPIC --shared \
+    -D_GLIBCXX_USE_CXX11_ABI=1 \
+    -I"$LIBTORCH/include" -I"$LIBTORCH/include/torch/csrc/api/include" \
+    tp_dnn.cpp \
+    -o libtpdnn.so \
+    -L"$LIBTORCH/lib" -Wl,-rpath,"$LIBTORCH/lib" \
+    -ltorch -ltorch_cpu -ltorch_cuda -lc10 -lc10_cuda
+echo "built ./libtpdnn.so (LIBTORCH=$LIBTORCH)"
--- a/jna/fetch_libtorch.sh
+++ b/jna/fetch_libtorch.sh
+#!/bin/bash
+# Fetch + extract the pinned libtorch (cu128 / CUDA 12.8, Blackwell sm_120) from mirror.elphel.com.
+# Runtime dependency for native DNN inference (L1/L2 via TorchScript). NOT in git (~3.8 GB zip / ~GB extracted).
+# Default extract location: /home/elphel/git/libtorch  (native build uses -DCMAKE_PREFIX_PATH=<that>).
+# By Claude on 06/27/2026.
+set -euo pipefail
+LT_ZIP="libtorch-cxx11-abi-shared-with-deps-2.7.1-cu128.zip"
+LT_URL="https://mirror.elphel.com/libtorch/${LT_ZIP}"
+PARENT="${1:-/home/elphel/git}"          # libtorch extracts to $PARENT/libtorch
+DEST="$PARENT/libtorch"
+if [ -f "$DEST/build-version" ]; then
+  echo "libtorch already present: $DEST ($(cat "$DEST/build-version"))"; exit 0
+fi
+mkdir -p "$PARENT"; cd "$PARENT"
+echo "Downloading $LT_URL ..."
+# NOTE: mirror.elphel.com WAF returns 406 to curl's default UA -> use a browser UA.
+curl -fSL -A "Mozilla/5.0 (X11; Linux x86_64)" "$LT_URL" -o "$LT_ZIP"
+echo "Extracting -> $DEST ..."
+unzip -q -o "$LT_ZIP"                     # extracts top-level ./libtorch/
+rm -f "$LT_ZIP"
+echo "libtorch ready: $DEST ($(cat "$DEST/build-version"))"
--- a/jna/tp_dnn.cpp
+++ b/jna/tp_dnn.cpp