Commit cf0298e1 authored by Andrey Filippov

docs: Update roadmap with Rong-Rong meeting and FPGA student timeline

parent cf747e18
@@ -562,13 +562,13 @@ This section captures the latest validated state before pausing Global LMA work
- **Adaptive Integration Windows:** Implement a competitive selection mechanism (e.g., 1x vs 4x integration) where the pipeline automatically selects the motion vector that yields the highest Contrast-to-Noise Ratio (CNR), rather than relying on hardcoded thresholds.
- **Fat Zero Auto-Scaling:** Automatically scale `cuas_rng_fz` (fat zero) down proportionally to the length of the temporal integration (longer integrations lower the stochastic noise floor, allowing closer-to-pure phase correlation).
- **Acceleration Compensation:** Refine the "Virtual Moving Camera" model to handle non-linear motion (like U-turns) that currently "squash" correlation peaks during long integrations.
- **Hybrid Classical/DNN Tracking (U of U FOPEN Collaboration):** Export unnormalized MCLT correlation hyperstacks to train an Attention/SSM-based neural network. The goal is to replace the LMA fitter with a DNN regression head `(X, Y, vX, vY)` that applies learned "soft masks" to 16x16 macroblocks, running inference via TorchScript/DJL inside the Java pipeline. See `03_UU_RongRong_Hybrid_DNN_Architecture.md` for details. **Meeting scheduled with Rong-Rong team for Thursday, May 7th @ 2 PM at Elphel.**
- *Note: These deeper algorithmic optimizations are intentionally deferred. The strategy is to establish a working baseline first, expose the necessary low-bandwidth tile metrics via the MCP server, and then allow AI agents (Codex, Claude, Gemini) to autonomously sweep, analyze, and optimize these specific sub-problems.*
3. **FPGA / Hardware Teaming Roadmap (U of U Collaboration):**
- **MCP for GTKWave:** Develop a Model Context Protocol (MCP) bridge to allow LLMs to natively analyze `.vcd` files. This will enable natural language querying of simulation waveform data (e.g., "Find the memory arbiter hang").
- **Cocotb Integration:** Revive the Python-based simulation-to-hardware workflow. The goal is to ensure that testbenches used in Icarus Verilog remain perfectly valid through physical hardware testing and eventual C-code kernel driver development.
- **GPU Top-Level Dispatcher (Human Latency Reduction):** Investigate moving the GPU pipeline orchestration from Java (JCuda sequential calls) into a single C++ "Master Dispatcher" kernel. By hollowing out the Java loops and decision logic and placing them into a single `.cu` file that calls mathematical modules as `__device__` functions, we eliminate the need to duplicate scheduling code across C++ and Java. This ensures that the production ImageJ environment uses the exact same orchestration logic as the development/Nsight environment, reducing human effort and convergence-translation errors.
- **Agent-Assisted Onboarding:** Leverage agents to bridge the gap for "occasional" users (like graduate students) by guiding them through the specialized hardware/Verilog knowledge base. **Collaboration with U of U FPGA students starts after exams end in May.**
4. Batch replay of this quarter+global stage on previously processed data; classify failures and choose representative/challenging short test sequences.
5. Algorithm improvement for occlusion handling:
- Predict likely-occluded tiles from depth/disparity behavior.
...
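The Adaptive Integration Windows and Fat Zero Auto-Scaling items in the hunk above could be prototyped roughly as follows. This is a minimal sketch, not code from the repository: the `IntegrationSelector` class, the `Candidate` fields, the CNR definition (peak minus background, divided by noise sigma), and the reading of "scale down proportionally" as dividing the fat zero by the integration length are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: choose the integration window with the best CNR. */
final class IntegrationSelector {

    /** One candidate result for a given integration length (illustrative fields). */
    static final class Candidate {
        final int integrationLength;   // e.g. 1 for 1x, 4 for 4x
        final double vx, vy;           // motion vector estimated with this window
        final double peak, background; // correlation peak and local background level
        final double noiseSigma;       // noise standard deviation around the peak

        Candidate(int integrationLength, double vx, double vy,
                  double peak, double background, double noiseSigma) {
            this.integrationLength = integrationLength;
            this.vx = vx;
            this.vy = vy;
            this.peak = peak;
            this.background = background;
            this.noiseSigma = noiseSigma;
        }

        /** Assumed CNR definition: (peak - background) / noise sigma. */
        double cnr() {
            return noiseSigma > 0.0 ? (peak - background) / noiseSigma : 0.0;
        }
    }

    /** Assumed scaling rule: fat zero divided by the integration length. */
    static double scaledFatZero(double baseFatZero, int integrationLength) {
        if (integrationLength < 1) {
            throw new IllegalArgumentException("integrationLength must be >= 1");
        }
        return baseFatZero / integrationLength;
    }

    /** Returns the candidate with the highest CNR, or null for an empty list. */
    static Candidate selectBest(List<Candidate> candidates) {
        Candidate best = null;
        for (Candidate c : candidates) {
            if (best == null || c.cnr() > best.cnr()) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Candidate> candidates = Arrays.asList(
                new Candidate(1, 0.5, 0.0, 10.0, 2.0, 4.0),
                new Candidate(4, 0.4, 0.1, 12.0, 2.0, 2.0));
        Candidate best = selectBest(candidates);
        System.out.println("selected integration length: " + best.integrationLength);
        System.out.println("fat zero for that window: "
                + scaledFatZero(1000.0, best.integrationLength));
    }
}
```

Selecting by a measured score instead of a fixed threshold keeps the decision local to each tile, which is the point of the roadmap item. The actual metric and scaling law would have to come from the pipeline's own measurements.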
@@ -8618,6 +8618,13 @@ public class CuasMotion {
            lma_rslts[CuasMotionLMA.RSLT_FAIL] = CuasMotionLMA.FAIL_FAR;
            break try_failures; // below horizon line
        }
        // see if it is completely outside
        if ((Math.abs(x) >= (GPUTileProcessor.DTT_SIZE-1)) ||
                (Math.abs(y) >= (GPUTileProcessor.DTT_SIZE-1))) {
            lma_rslts[CuasMotionLMA.RSLT_FAIL] = CuasMotionLMA.FAIL_FAR;
            break try_failures; // below horizon line
        }
    }
    failed = false; // all tests passed
}
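The added check rejects a fit whose offset reaches the edge of the tile. As a standalone illustration, the same predicate can be isolated and exercised on its own. The sketch below is hypothetical: `TileBounds` and `isOutsideTile` are names invented here, `x` and `y` are assumed to be offsets from the tile center in pixels, and `DTT_SIZE = 8` is an assumed stand-in for `GPUTileProcessor.DTT_SIZE`.

```java
/** Hypothetical standalone version of the "completely outside" test. */
final class TileBounds {
    // Assumed value; the real constant is GPUTileProcessor.DTT_SIZE.
    static final int DTT_SIZE = 8;

    /** True when either offset magnitude reaches DTT_SIZE - 1 or more. */
    static boolean isOutsideTile(double x, double y) {
        return (Math.abs(x) >= (DTT_SIZE - 1))
                || (Math.abs(y) >= (DTT_SIZE - 1));
    }

    public static void main(String[] args) {
        System.out.println(isOutsideTile(6.9, -6.9)); // both offsets inside the limit
        System.out.println(isOutsideTile(7.0, 0.0));  // x reaches the limit
    }
}
```

Because the comparison uses `>=`, an offset of exactly `DTT_SIZE - 1` counts as outside.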
...