jna/build_dnn.sh · 9202dd308d02f03817c136e262ddc08d45acc904 · Elphel / tile_processor_gpu

CLAUDE: native LibTorch L1+L2 inference shim (libtpdnn.so) for JNA · 9202dd30

Andrey Filippov authored Jun 27, 2026

Piece 2 of the native-JNA DNN path. tp_dnn.cpp is a C-ABI port of infer_server.py's
hot path, so the Java client can run L1+L2 in-process instead of over TCP. It exports
tpdnn_init/upload/infer/free (+ num_levels/level_frames) and faithfully reproduces
build_pyramid, the 16x shift-and-stitch full-res recovery, decode (ghostbuster +
velocity centroid), and the L2 ConvGRU recurrence + track-age.
Loads the TorchScript models from imagej_elphel_dnn (export_torchscript /
export_l2_torchscript). Disables the TorchScript JIT fuser at init (nvrtc element-wise
fusion fails on Blackwell; production wants no runtime nvrtc).
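
For readers unfamiliar with shift-and-stitch, here is a minimal sketch of the general
technique, not the code in tp_dnn.cpp. It assumes "16x" means 4x4 = 16 shifted passes
through a stride-4 network. Image, CoarseNet, shift, shift_and_stitch and toy_stride4
are hypothetical names for illustration only.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical row-major single-channel image, for illustration only.
struct Image {
    std::size_t h = 0, w = 0;
    std::vector<float> px;
    Image(std::size_t h_, std::size_t w_) : h(h_), w(w_), px(h_ * w_, 0.0f) {}
    float& at(std::size_t y, std::size_t x) { return px[y * w + x]; }
    float at(std::size_t y, std::size_t x) const { return px[y * w + x]; }
};

// Stand-in for a stride-s network: maps an (h, w) input to an (h/s, w/s) output.
using CoarseNet = std::function<Image(const Image&)>;

// Shift so that input pixel (y+dy, x+dx) lands at (y, x); the border is zero-padded.
Image shift(const Image& in, std::size_t dy, std::size_t dx) {
    Image out(in.h, in.w);
    for (std::size_t y = 0; y + dy < in.h; ++y)
        for (std::size_t x = 0; x + dx < in.w; ++x)
            out.at(y, x) = in.at(y + dy, x + dx);
    return out;
}

// Run the network once per (dy, dx) shift, s*s passes in total, and interleave
// the coarse outputs into a full-resolution map.
Image shift_and_stitch(const Image& in, std::size_t s, const CoarseNet& net) {
    Image full(in.h, in.w);
    for (std::size_t dy = 0; dy < s; ++dy) {
        for (std::size_t dx = 0; dx < s; ++dx) {
            const Image coarse = net(shift(in, dy, dx));
            for (std::size_t y = 0; y < coarse.h; ++y) {
                for (std::size_t x = 0; x < coarse.w; ++x) {
                    const std::size_t Y = y * s + dy, X = x * s + dx;
                    if (Y < full.h && X < full.w) full.at(Y, X) = coarse.at(y, x);
                }
            }
        }
    }
    return full;
}

// Toy stride-4 "network" that keeps every 4th pixel (an assumption for the demo).
Image toy_stride4(const Image& in) {
    const std::size_t s = 4;
    Image out(in.h / s, in.w / s);
    for (std::size_t y = 0; y < out.h; ++y)
        for (std::size_t x = 0; x < out.w; ++x)
            out.at(y, x) = in.at(y * s, x * s);
    return out;
}
```

With the toy network, shift_and_stitch(in, 4, toy_stride4) returns the input unchanged
when both dimensions are multiples of 4. Otherwise the leftover border pixels stay zero.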

Validated: native vs the running Python server (same CUDA) max|diff| offset5=0,
roi=0 — bit-for-bit. (Oracle dump_ref.py + driver tpdnn_test.cpp, scratch.)
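
The max|diff| metric above can be sketched as below. This is an illustrative helper,
not the scratch driver tpdnn_test.cpp. max_abs_diff and bit_identical are hypothetical
names. The bytewise check is included because max|diff| = 0 alone does not tell +0.0
from -0.0.

```cpp
#include <cmath>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Largest absolute element-wise difference between two equally sized buffers.
// Returns NaN as soon as a compared pair yields NaN, so a NaN cannot pass as 0.
float max_abs_diff(const std::vector<float>& a, const std::vector<float>& b) {
    if (a.size() != b.size()) throw std::invalid_argument("size mismatch");
    float worst = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = std::fabs(a[i] - b[i]);
        if (std::isnan(d)) return d;
        if (d > worst) worst = d;
    }
    return worst;
}

// Strict bit-for-bit comparison of the underlying bytes.
bool bit_identical(const std::vector<float>& a, const std::vector<float>& b) {
    if (a.size() != b.size()) return false;
    if (a.empty()) return true;
    return std::memcmp(a.data(), b.data(), a.size() * sizeof(float)) == 0;
}
```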

Built standalone via build_dnn.sh (g++ + libtorch 2.7.1+cu128, ABI=1), separate
from the nvcc-built libtileproc.so; fetch_libtorch.sh pulls the pinned libtorch.
Context unification + zero-copy kernel<->tensor sharing is a later step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

build_dnn.sh 1022 Bytes