Andrey Filippov authored
infer_server decode batches GPU_CHUNK scenes per pass (p=[chunk,121,H,W]); at chunk=16 that is ~2.5GB, which OOMs on the 16GB 5060 Ti (shared with Eyesis JCuda). Make GPU_CHUNK env-tunable (default 16 for big GPUs); run_infer_local.sh sets GPU_CHUNK=4 (+ expandable_segments) so local L1+L2 fits.
By Claude 06/27/2026.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
cd7b0801
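
A minimal sketch of what an env-tunable chunk size could look like, assuming infer_server is a Python process that reads `GPU_CHUNK` at startup. Only `GPU_CHUNK`, its default of 16, and the `[chunk,121,H,W]` shape come from the commit message. The helper names `get_gpu_chunk`, `iter_chunks`, and `batch_bytes`, and the 4-byte element size, are hypothetical and are not taken from the actual change.

```python
import os

DEFAULT_GPU_CHUNK = 16  # default named in the commit message


def get_gpu_chunk(env=None):
    """Return scenes per batch from GPU_CHUNK, or the default if unset/invalid.

    Hypothetical helper: the real infer_server may parse this differently.
    """
    env = os.environ if env is None else env
    raw = env.get("GPU_CHUNK", "")
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_GPU_CHUNK
    return value if value > 0 else DEFAULT_GPU_CHUNK


def iter_chunks(n_scenes, chunk):
    """Yield (start, stop) index pairs covering n_scenes in steps of chunk."""
    for start in range(0, n_scenes, chunk):
        yield start, min(start + chunk, n_scenes)


def batch_bytes(chunk, height, width, channels=121, bytes_per_elem=4):
    """Size in bytes of one [chunk, channels, H, W] tensor.

    bytes_per_elem=4 assumes float32, which the commit does not state.
    """
    return chunk * channels * height * width * bytes_per_elem
```

Under these assumptions, lowering the chunk from 16 to 4 cuts the size of that one batched tensor by a factor of four, since `batch_bytes` is linear in `chunk`. The `expandable_segments` mentioned in the message is presumably the PyTorch allocator option set through `PYTORCH_CUDA_ALLOC_CONF`, but the commit text does not confirm that.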