Andrey Filippov authored
Piece 2 of the native-JNA DNN path.

tp_dnn.cpp is a C-ABI port of infer_server.py's hot path, so the Java client can run L1+L2 in-process instead of over TCP: tpdnn_init/upload/infer/free (+ num_levels/level_frames), faithfully reproducing build_pyramid, the 16x shift-and-stitch full-res recovery, decode (ghostbuster + velocity centroid), and the L2 ConvGRU recurrence + track-age.

Loads the TorchScript models from imagej_elphel_dnn (export_torchscript / export_l2_torchscript). Disables the TorchScript JIT fuser at init (nvrtc element-wise fusion fails on Blackwell; production wants no runtime nvrtc).

Validated: native vs the running Python server (same CUDA) gives max|diff| offset5=0, roi=0, i.e. bit-for-bit. (Oracle dump_ref.py + driver tpdnn_test.cpp, scratch.)

Built standalone via build_dnn.sh (g++ + libtorch 2.7.1+cu128, ABI=1), separate from the nvcc-built libtileproc.so; fetch_libtorch.sh pulls the pinned libtorch.

Context unification + zero-copy kernel<->tensor sharing is a later step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
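
Note on shift-and-stitch: the idea behind the full-res recovery can be sketched as below. This is a minimal illustration, not the code in tp_dnn.cpp. It assumes the 16x figure means a 4 x 4 grid of input shifts over a stride-4 output, which the message does not state. Image, toy_net and shift_and_stitch are hypothetical names, and toy_net is a plain subsampler standing in for the real model.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major single-channel image (hypothetical helper, not from tp_dnn.cpp).
struct Image {
    std::size_t h, w;
    std::vector<float> px;
    Image(std::size_t h_, std::size_t w_) : h(h_), w(w_), px(h_ * w_, 0.0f) {}
    float& at(std::size_t y, std::size_t x) { return px[y * w + x]; }
    float at(std::size_t y, std::size_t x) const { return px[y * w + x]; }
};

// Assumption: output stride 4 in each axis, so 4 x 4 = 16 shifted passes.
constexpr std::size_t kStride = 4;

// Toy stand-in for a stride-4 network evaluated on the input shifted by
// (dy, dx): it simply samples every 4th pixel starting at that offset.
Image toy_net(const Image& in, std::size_t dy, std::size_t dx) {
    assert(in.h % kStride == 0 && in.w % kStride == 0);
    assert(dy < kStride && dx < kStride);
    Image out(in.h / kStride, in.w / kStride);
    for (std::size_t y = 0; y < out.h; ++y)
        for (std::size_t x = 0; x < out.w; ++x)
            out.at(y, x) = in.at(y * kStride + dy, x * kStride + dx);
    return out;
}

// Run the coarse network once per shift and interleave the 16 coarse
// outputs so that every full-resolution pixel receives exactly one value.
Image shift_and_stitch(const Image& in) {
    Image full(in.h, in.w);
    for (std::size_t dy = 0; dy < kStride; ++dy) {
        for (std::size_t dx = 0; dx < kStride; ++dx) {
            const Image coarse = toy_net(in, dy, dx);
            for (std::size_t y = 0; y < coarse.h; ++y)
                for (std::size_t x = 0; x < coarse.w; ++x)
                    full.at(y * kStride + dy, x * kStride + dx) = coarse.at(y, x);
        }
    }
    return full;
}
```

With the subsampling stand-in, stitching the 16 passes reproduces the input exactly, which makes the interleaving order easy to check in isolation before a real model is substituted.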
9202dd30