docs: Update project-details.md with GPU Top-Level Dispatcher strategy

Commit cf747e18 authored May 01, 2026 by Andrey Filippov

Showing with 1 addition and 0 deletions

docs/project-details.md +1 -0

--- a/docs/project-details.md
+++ b/docs/project-details.md
@@ -567,6 +567,7 @@ This section captures the latest validated state before pausing Global LMA work
 3. **FPGA / Hardware Teaming Roadmap (U of U Collaboration):**
   - **MCP for GTKWave:** Develop a Model Context Protocol (MCP) bridge to allow LLMs to natively analyze `.vcd` files. This will enable natural language querying of simulation waveform data (e.g., "Find the memory arbiter hang").
   - **Cocotb Integration:** Revive the Python-based simulation-to-hardware workflow. The goal is to ensure that testbenches used in Icarus Verilog remain perfectly valid through physical hardware testing and eventual C-code kernel driver development.
+   - **GPU Top-Level Dispatcher (Human Latency Reduction):** Investigate moving the GPU pipeline orchestration from Java (JCuda sequential calls) into a single C++ "Master Dispatcher" kernel. By hollowing out the Java loops and decision logic and placing them into a single `.cu` file that calls mathematical modules as `__device__` functions, we eliminate the need to duplicate scheduling code across C++ and Java. This ensures that the production ImageJ environment uses the exact same orchestration logic as the development/Nsight environment, reducing human effort and convergence-translation errors.
   - **Agent-Assisted Onboarding:** Leverage agents to bridge the gap for "occasional" users (like graduate students) by guiding them through the specialized hardware/Verilog knowledge base.
 4. Batch replay of this quarter+global stage on previously processed data; classify failures and choose representative/challenging short test sequences.
 5. Algorithm improvement for occlusion handling:
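
The "GPU Top-Level Dispatcher" item added in this commit could look roughly like the following sketch. It is not the project's code: `refine_step`, `max_residual`, `master_dispatcher`, and `run_dispatcher` are hypothetical names, and the toy refinement loop merely stands in for the real pipeline stages. It also assumes a single thread block so that `__syncthreads()` is enough to separate stages; a multi-block pipeline would need grid-wide synchronization (for example cooperative groups) or dynamic parallelism instead.

```cuda
#include <cuda_runtime.h>

#define DISPATCH_THREADS 256  // assumed block size (power of two)

// Hypothetical math module: one damped update of every element toward `target`.
__device__ void refine_step(float* data, int n, float target, float gain) {
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        data[i] += gain * (target - data[i]);
}

// Hypothetical math module: block-wide maximum of |target - data[i]|.
// Must be called by all threads of the block; every thread gets the same value.
__device__ float max_residual(const float* data, int n, float target,
                              float* scratch) {
    float local = 0.0f;
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        local = fmaxf(local, fabsf(target - data[i]));
    scratch[threadIdx.x] = local;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            scratch[threadIdx.x] = fmaxf(scratch[threadIdx.x],
                                         scratch[threadIdx.x + s]);
        __syncthreads();
    }
    float result = scratch[0];
    __syncthreads();  // scratch may be reused on the next call
    return result;
}

// The dispatcher kernel: the loop and the stop decision that would otherwise
// live in host-side (Java/JCuda) code are kept here, next to the math modules.
__global__ void master_dispatcher(float* data, int n, float target, float gain,
                                  float tol, int max_iter, int* iters_done) {
    __shared__ float scratch[DISPATCH_THREADS];
    int iter = 0;
    while (iter < max_iter) {
        refine_step(data, n, target, gain);
        __syncthreads();  // stage boundary inside the single block
        ++iter;
        if (max_residual(data, n, target, scratch) < tol)
            break;        // uniform across the block: all threads see scratch[0]
    }
    if (threadIdx.x == 0) *iters_done = iter;
}

// Host wrapper: the only thing a caller has to do is launch once.
// Returns the number of iterations performed, or -1 on a CUDA error.
int run_dispatcher(float* host_data, int n, float target, float gain,
                   float tol, int max_iter) {
    float* d_data = nullptr;
    int* d_iters = nullptr;
    int iters = -1;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    if (cudaMalloc(&d_data, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&d_iters, sizeof(int)) != cudaSuccess) {
        cudaFree(d_data);
        return -1;
    }
    cudaMemcpy(d_data, host_data, bytes, cudaMemcpyHostToDevice);
    master_dispatcher<<<1, DISPATCH_THREADS>>>(d_data, n, target, gain, tol,
                                               max_iter, d_iters);
    bool ok = cudaMemcpy(&iters, d_iters, sizeof(int),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    ok = ok && cudaMemcpy(host_data, d_data, bytes,
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_data);
    cudaFree(d_iters);
    return ok ? iters : -1;
}
```

In this shape the host side, whether a JCuda caller in ImageJ or a C++ harness run under Nsight, only copies data in, launches `master_dispatcher` once, and copies results out, so the scheduling logic exists in one place. The trade-off is that every stage shares one launch configuration, which is why the single-block assumption above is stated explicitly.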