CLAUDE: native LibTorch L1+L2 inference shim (libtpdnn.so) for JNA
Piece 2 of the native-JNA DNN path. tp_dnn.cpp is a C-ABI port of infer_server.py's
hot path, so the Java client can run L1+L2 in-process instead of over TCP. It exports
tpdnn_init/upload/infer/free (+ num_levels/level_frames) and faithfully reproduces
build_pyramid, the 16x shift-and-stitch full-res recovery, decode (ghostbuster +
velocity centroid), and the L2 ConvGRU recurrence + track-age.
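
For readers unfamiliar with the pattern, a C-ABI shim of this kind usually hides its C++ state behind an opaque handle so JNA only ever sees plain pointers and ints. The sketch below is illustrative only: the demo_* names, their signatures, and the doubling "inference" are hypothetical and are not the real tpdnn_* API, which lives in the collapsed tp_dnn.cpp.

```cpp
// Hypothetical opaque-handle C ABI. The demo_* functions are NOT the real
// tpdnn_* entry points; they only show the shape such a shim tends to have.
#include <cstddef>
#include <cstring>
#include <vector>

namespace {
struct DemoCtx {
    int width = 0;
    int height = 0;
    std::vector<float> input;  // last uploaded frame, owned by the context
};
}  // namespace

extern "C" {

// Returns an opaque handle, or nullptr on failure.
// No C++ exception is allowed to cross the C boundary.
void* demo_init(int width, int height) {
    if (width <= 0 || height <= 0) return nullptr;
    try {
        DemoCtx* c = new DemoCtx;
        c->width = width;
        c->height = height;
        c->input.assign(static_cast<std::size_t>(width) * height, 0.0f);
        return c;
    } catch (...) {
        return nullptr;
    }
}

// Copies n floats from the caller's buffer. Returns 0 on success, -1 on error.
int demo_upload(void* handle, const float* data, int n) {
    DemoCtx* c = static_cast<DemoCtx*>(handle);
    if (!c || !data || n != c->width * c->height) return -1;
    std::memcpy(c->input.data(), data, sizeof(float) * static_cast<std::size_t>(n));
    return 0;
}

// Placeholder for inference: writes 2*x for every uploaded sample.
// Returns 0 on success, -1 on error.
int demo_infer(void* handle, float* out, int n) {
    DemoCtx* c = static_cast<DemoCtx*>(handle);
    if (!c || !out || n != c->width * c->height) return -1;
    for (int i = 0; i < n; ++i) out[i] = 2.0f * c->input[static_cast<std::size_t>(i)];
    return 0;
}

// Releases the context. Passing nullptr is a no-op.
void demo_free(void* handle) {
    delete static_cast<DemoCtx*>(handle);
}

}  // extern "C"
```

Returning error codes instead of throwing keeps the boundary safe for a Java caller, and the void* handle maps naturally onto a JNA Pointer.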
Loads the TorchScript models from imagej_elphel_dnn (export_torchscript /
export_l2_torchscript). Disables the TorchScript JIT fuser at init (nvrtc element-wise
fusion fails on Blackwell; production wants no runtime nvrtc).
Validated: native vs the running Python server (same CUDA) max|diff| offset5=0,
roi=0 — bit-for-bit. (Oracle dump_ref.py + driver tpdnn_test.cpp, scratch.)
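
As a rough illustration of the max|diff| check, the sketch below compares two float buffers. It is not the code in tpdnn_test.cpp or dump_ref.py; the function names are hypothetical. Note that max|diff| = 0 still permits +0.0 vs -0.0, so a byte-wise compare is the stricter test if "bit-for-bit" is meant literally.

```cpp
// Hypothetical comparison helpers; not taken from tpdnn_test.cpp.
#include <cmath>
#include <cstddef>
#include <cstring>
#include <limits>

// Largest absolute element-wise difference between a and b.
// A NaN in either buffer is reported as an infinite mismatch.
double max_abs_diff(const float* a, const float* b, std::size_t n) {
    double worst = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double d = std::fabs(static_cast<double>(a[i]) - static_cast<double>(b[i]));
        if (std::isnan(d)) return std::numeric_limits<double>::infinity();
        if (d > worst) worst = d;
    }
    return worst;
}

// True only if the two buffers have identical bytes (distinguishes +0.0 from -0.0).
bool bitwise_equal(const float* a, const float* b, std::size_t n) {
    return std::memcmp(a, b, n * sizeof(float)) == 0;
}
```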
Built standalone via build_dnn.sh (g++ + libtorch 2.7.1+cu128, ABI=1), separate
from the nvcc-built libtileproc.so; fetch_libtorch.sh pulls the pinned libtorch.
Context unification + zero-copy kernel<->tensor sharing is a later step.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Files changed:
jna/build_dnn.sh (new file, mode 100755)
jna/fetch_libtorch.sh (new file, mode 100755)
jna/tp_dnn.cpp (new file, mode 100644)