# Clocks and programmable delays in the DDR3 memory interface of the Elphel NC393 camera

The block diagram of the memory interface is available here:
<http://blog.elphel.com/wp-content/uploads/2014/06/eddr3_bdiag.png>
## Used PLL and clock buffer resources
The DDR3 controller uses one MMCME2\_ADV and one PLLE2\_ADV primitive. The PLLE2\_ADV drives only __clk_ref__;
all other clocks described below are driven by the MMCME2\_ADV, except __byte\_lane0\_i/iclk__ and
__byte\_lane1\_i/iclk__, which are generated from the I/O ports.

All referenced parameters are defined in __x393\_parameters.vh__.

The Verilog keywords _posedge_ and _negedge_ refer to the rising and falling clock edges. The controller uses
the following clock buffers (frequencies are shown for the current implementation):
  * 2 BUFG global clock buffers:
    * __mclk__ (200 MHz) is used in the high-level parts of the memory controller: the request arbitration logic,
      the organization of the multi-channel client memory access as a sequence of transactions, and filling/reading
      the channel buffers from the memory side. Most events are synchronized by the rising edge of this clock,
      but the data going from the memory to the buffers is synchronized by its falling edge. This clock has a
      static phase shift (defined by *parameter MCLK\_PHASE=90*) relative to the other clocks, such as __clk\_div__
      (below), used in the ISERDES and OSERDES I/O serializers/deserializers. As a result, when crossing the
      __mclk__ -> __clk\_div__ clock boundary (from posedge __mclk__ to posedge __clk\_div__) 0.75 of the clock
      period (3.75 ns) is available, and the same time is available when crossing back, __clk\_div__ -> __mclk__
      (from posedge __clk\_div__ to negedge __mclk__).
    * __clk_ref__ (currently 200 MHz, likely to become 300 MHz in the future) is used only for the I/O delay
      modules (connected to the IDELAYCTRL).
  * 1 BUFIO clock buffer:
    * __sdclk__ (400 MHz) drives the DDR3 memory differential clock signal through an OBUFTDS. Its source is the
      only MMCME2\_ADV output that is not driven by the dynamic phase shifter, so the effect is as if it were the
      only output controlled by it. This is done to avoid dependence on the DLL in the external DDR3 chip.
      With the current settings of the MMCME2\_ADV (VCO frequency = 800 MHz) the phase adjustment step is 1/56
      of 1/Fvco, about 22 ps, and the dynamic phase shift can be adjusted in the +/-127 step range, which is
      more than one full period of __sdclk__ in each direction (a full __sdclk__ period corresponds to
      112=0x70 phase steps). When the phase shift is increased, the DDR3 clock arrives earlier at the memory
      chip.
  * 4 regional clock buffers (BUFR):
    * __clk__ (400 MHz) drives the I/O serializers and deserializers, with its rising edge aligned to that of
      __clk\_div__. The phase of this clock is statically defined by *parameter CLK\_PHASE=0*.
    * __clk\_div__ (200 MHz) also drives the I/O serializers and deserializers, synchronizing their parallel
      side (interfacing to the rest of the system). Its static phase is specified by *parameter CLK\_DIV\_PHASE=0*.
    * __byte\_lane0\_i/iclk__ and
    * __byte\_lane1\_i/iclk__ are parts of the two byte\_lane modules and provide clocks derived from the
      differential DQSL and DQSU ports. Normally these clocks are used for memory read operations, when DQSL
      and DQSU are generated by the memory device, but in the write levelling mode DQSL and DQSU are generated
      by the FPGA and are fed back through these clocks to drive the input deserializers. These two clocks are
      gated (not free-running): in the read mode the memory provides just enough pulses to push the data through
      the ISERDES modules.
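
The 3.75 ns figures for __mclk__ follow from the clock phases alone. The Python sketch below (hypothetical helper names, not part of the x393 code) assumes, as stated above, that __mclk__ and __clk\_div__ both run at 200 MHz, that __clk\_div__ has phase 0 and that __mclk__ lags it by MCLK\_PHASE degrees; it measures the time from each launching edge to the next capturing edge.

```python
# Sketch (hypothetical helpers, not x393 code): clock-domain crossing margins
# between mclk and clk_div, assuming both run at 200 MHz and mclk lags clk_div
# by MCLK_PHASE degrees. Edge times are in ns, measured from a clk_div posedge.

PERIOD_NS = 1e3 / 200.0  # 5.0 ns at 200 MHz


def edge_time(phase_deg: float, edge: str = "pos") -> float:
    """Time of a clock edge within one period for a clock with the given phase."""
    t = phase_deg / 360.0 * PERIOD_NS
    if edge == "neg":
        t += PERIOD_NS / 2  # falling edge is half a period after the rising one
    return t % PERIOD_NS


def margin(launch_ns: float, capture_ns: float) -> float:
    """Time from a launching edge to the next capturing edge."""
    dt = (capture_ns - launch_ns) % PERIOD_NS
    return dt if dt > 0 else PERIOD_NS  # coincident edges: capture one period later


def crossing_margins(mclk_phase_deg: float = 90.0, clk_div_phase_deg: float = 0.0):
    """Return (mclk posedge -> clk_div posedge, clk_div posedge -> mclk negedge) in ns."""
    to_clk_div = margin(edge_time(mclk_phase_deg, "pos"), edge_time(clk_div_phase_deg, "pos"))
    to_mclk = margin(edge_time(clk_div_phase_deg, "pos"), edge_time(mclk_phase_deg, "neg"))
    return to_clk_div, to_mclk


if __name__ == "__main__":
    fwd, back = crossing_margins()
    print(f"mclk -> clk_div: {fwd:.2f} ns, clk_div -> mclk: {back:.2f} ns")
```

Calling crossing_margins(0.0) returns (5.0, 2.5): without the offset, the path from __clk\_div__ back to the falling edge of __mclk__ would get only half a period, while MCLK\_PHASE=90 splits the time evenly between the two directions.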
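
The __sdclk__ phase step arithmetic can be reproduced with a short Python sketch. It only encodes the numbers quoted above (800 MHz VCO, 1/56 of the VCO period per step, +/-127 steps, 400 MHz __sdclk__); the constant and function names are hypothetical and do not describe how the controller itself programs the MMCME2\_ADV.

```python
# Sketch: MMCME2_ADV fine phase shift arithmetic for sdclk, using the numbers
# quoted in the text (VCO = 800 MHz, sdclk = 400 MHz, step = 1/56 of the VCO
# period, range +/-127 steps). Names are hypothetical.

F_VCO_HZ = 800e6
F_SDCLK_HZ = 400e6
STEPS_PER_VCO_PERIOD = 56
MAX_STEPS = 127

STEP_PS = 1e12 / F_VCO_HZ / STEPS_PER_VCO_PERIOD       # about 22.3 ps
STEPS_PER_SDCLK = round(1e12 / F_SDCLK_HZ / STEP_PS)  # 112 == 0x70


def check_steps(steps: int) -> int:
    """Reject phase shifts outside the +/-127 step range."""
    if not -MAX_STEPS <= steps <= MAX_STEPS:
        raise ValueError(f"phase shift {steps} outside +/-{MAX_STEPS} steps")
    return steps


def steps_to_ps(steps: int) -> float:
    """Relative sdclk shift in ps; a larger value means the DDR3 clock arrives earlier."""
    return check_steps(steps) * STEP_PS


def steps_to_degrees(steps: int) -> float:
    """Relative sdclk shift in degrees of the sdclk period."""
    return 360.0 * check_steps(steps) / STEPS_PER_SDCLK


if __name__ == "__main__":
    print(f"step = {STEP_PS:.2f} ps, sdclk period = {STEPS_PER_SDCLK} (0x{STEPS_PER_SDCLK:x}) steps")
    print(f"max shift = {steps_to_ps(MAX_STEPS):.0f} ps = {steps_to_degrees(MAX_STEPS):.1f} degrees")
```

The maximum shift of 127 steps is about 2835 ps, or about 408 degrees of __sdclk__, which is the "more than one full period in each direction" stated above.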
      
## Available programmable delays
Clocks __mclk__, __clk__ and __clk\_div__ have statically defined relative phases (__clk__ and __clk\_div__ are
posedge-aligned, and __mclk__ is 90 degrees later). The external memory device differential clock can be
adjusted to any phase with a calibrated phase shift step of 360/112 degrees (about 3.2 degrees). All other
available programmable delays are based on the IDELAYE2\_FINEDEL and ODELAYE2\_FINEDEL primitives, with delays
consisting of 2 parts:
  * A 31-tap delay with a calibrated resolution of 78 ps/tap for the 200 MHz reference clock (currently used),
    or shorter 52 ps/tap steps with a 300 MHz reference. With 200 MHz this provides a full range of about 2.4 ns
    (31 x 78 ps), slightly less than the full period of the 400 MHz clock.
  * A 5-tap uncalibrated delay of approximately 10 ps/tap, connected in series with the main 31-tap one. With the
    200 MHz reference this stage covers approximately 1/2 of the 31-tap delay step, effectively adding just one
    extra bit of resolution. A 300 MHz reference (available in faster Zynq devices than the one used in the
    prototype of the NC393 camera) will allow a more uniform subdivision of the delays.

All delays in this controller use an 8-bit delay value, with the 5 MSBs controlling the 31-tap delay and the
3 LSBs controlling the fine delay. Only values 0, 1, 2, 3 and 4 are valid for the 3 LSBs of each delay.
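
A minimal Python sketch of the 8-bit delay format, assuming the bit layout stated above and the approximate tap sizes (78 ps or 52 ps coarse, about 10 ps fine). The names encode_delay, decode_delay and delay_ps are hypothetical helpers, and real delays will differ from these nominal values, especially for the uncalibrated fine stage.

```python
# Sketch of the 8-bit delay value layout (hypothetical helper names):
# bits [7:3] select the 31-tap calibrated delay, bits [2:0] select the fine
# delay, of which only 0..4 are valid. Tap sizes are approximate nominal values.
from typing import Tuple

COARSE_PS = {200: 78.0, 300: 52.0}  # ps per tap, keyed by IDELAYCTRL reference in MHz
FINE_PS = 10.0                      # uncalibrated, approximate
MAX_DELAY = (31 << 3) | 4           # 0xFC, the largest valid value


def encode_delay(coarse: int, fine: int) -> int:
    """Pack coarse (0..31) and fine (0..4) taps into one 8-bit value."""
    if not 0 <= coarse <= 31:
        raise ValueError("coarse tap must be in 0..31")
    if not 0 <= fine <= 4:
        raise ValueError("fine tap must be in 0..4")
    return (coarse << 3) | fine


def decode_delay(value: int) -> Tuple[int, int]:
    """Split an 8-bit value into (coarse, fine), rejecting invalid fine codes 5..7."""
    if not 0 <= value <= 0xFF:
        raise ValueError("delay value must fit in 8 bits")
    coarse, fine = value >> 3, value & 0x7
    if fine > 4:
        raise ValueError(f"invalid fine delay {fine} in 0x{value:02x}")
    return coarse, fine


def delay_ps(value: int, ref_mhz: int = 200) -> float:
    """Approximate total delay in ps for an 8-bit delay value."""
    coarse, fine = decode_delay(value)
    return coarse * COARSE_PS[ref_mhz] + fine * FINE_PS


if __name__ == "__main__":
    for v in (0x00, 0x04, 0x08, 0x40, MAX_DELAY):
        print(f"0x{v:02x} -> {decode_delay(v)} ~ {delay_ps(v):.0f} ps")
```

For example, 0x04 gives about 40 ps (half of a 78 ps coarse step, the "one extra bit" of resolution), and 0x40 gives 8 coarse taps, about 624 ps, or roughly 90 degrees of the 2.5 ns period of the 400 MHz clock.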

There are 18 programmable input delays and 43 output delays, each individually controlled:
  * Two __DQS\_IDELAY__ values control the delay from the I/O ports to the __byte\_lane0\_i/iclk__ and
    __byte\_lane1\_i/iclk__ clocks that drive the input deserializers. The differences between the individual
    __DQ\_IDELAY__ values and __DQS\_IDELAY__ should place the sampling points in the center of the "eyes" for
    reliable data reading, but the absolute value of __DQS\_IDELAY__ is also important for crossing the clock
    boundaries from the __byte\_lane\*\_i/iclk__ clocks to __clk__ and __clk\_div__. The input clocks are
    generated by the memory device with its DLL driven by __sdclk__ (so they depend indirectly on the
    programmable phase shift), and the DQS phase may fluctuate relative to the FPGA clocks. During
  * 16 __DQ\_IDELAY__ values set the input delays of the individual data bit signals. The data acquisition
    windows are determined by the __DQ\_IDELAY__ - __DQS\_IDELAY__ differences, while the __DQS\_IDELAY__
    values have to satisfy the other requirements too.
  * Two __DQS\_ODELAY__ values determine the delays of the DQS signals. These values can be found with the
    "write levelling" procedure: when the memory device sees a DQS signal lagging behind the clock, it outputs
    8'h01 on each of the data bytes, and when DQS is too early it outputs 8'h00.
  * 16 __DQ\_ODELAY__ values set the output delays of the individual data bit signals to the memory. Data
    sent to the memory should be centered around the DQS transitions, so the __DQ\_ODELAY__ values should be
    approximately 90 degrees (~0x40, i.e. 8 coarse taps or about 624 ps, when the other signal delays in the
    FPGA are equal) higher than __DQS\_ODELAY__. If the available ranges do not allow that, __DQS\_ODELAY__
    can be modified together with the clock phase (verifying the __DQS\_IDELAY__ requirements above).
  * 2 __DM\_ODELAY__ values control the data mask signals; they have the same timing requirements as the
    __DQ\_ODELAY__ ones.
  * 23 __DLY\_CMDA\_ODELAY__ values control the command, address, bank address and ODT outputs to DDR3. Their
    timing requirements are more relaxed, as they operate in SDR (not DDR) mode.
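
The write levelling search and the resulting __DQ\_ODELAY__ estimate can be sketched in Python. The sample() callback that programs __DQS\_ODELAY__ and reads back the DQ byte is hypothetical, as is the simulated_lane() timing model used in place of hardware (78 ps coarse and 10 ps fine taps, 2.5 ns clock). The search looks for the first change from 8'h00 (DQS early) to 8'h01 (DQS lagging) as the delay increases, and the 0x40 DQ offset assumes the other FPGA signal delays are equal, as noted above.

```python
# Sketch of a write levelling sweep for one byte lane. The sample() callback is
# hypothetical: it stands for "set DQS_ODELAY, emit a DQS pulse in write
# levelling mode, read back the lane's DQ byte". A simple timing model replaces
# the hardware here; none of these names come from the x393 code.
from typing import Callable, Iterable, Optional

# All valid 8-bit delay values in increasing delay order (fine bits 0..4 only).
VALID_DELAYS = [(c << 3) | f for c in range(32) for f in range(5)]
MAX_DELAY = VALID_DELAYS[-1]  # 0xFC


def find_write_leveling_edge(sample: Callable[[int], int],
                             delays: Iterable[int] = VALID_DELAYS) -> Optional[int]:
    """First DQS_ODELAY at which the feedback changes from 8'h00 to 8'h01."""
    prev = None
    for d in delays:
        bit = sample(d) & 0x01  # 1: DQS lags the clock, 0: DQS is early
        if prev == 0 and bit == 1:
            return d
        prev = bit
    return None


def dq_odelay_for(dqs_odelay: int, offset: int = 0x40) -> Optional[int]:
    """DQ_ODELAY about 90 degrees later than DQS_ODELAY, or None if out of range."""
    value = dqs_odelay + offset  # adding 0x40 leaves the fine bits [2:0] unchanged
    return value if value <= MAX_DELAY else None


def simulated_lane(dqs_base_ps: float = 1500.0, ck_period_ps: float = 2500.0):
    """Hypothetical stand-in for the memory: 8'h01 if CK is high at the DQS edge."""
    def sample(delay: int) -> int:
        t = dqs_base_ps + (delay >> 3) * 78.0 + (delay & 0x7) * 10.0
        return 0x01 if (t % ck_period_ps) < ck_period_ps / 2 else 0x00
    return sample


if __name__ == "__main__":
    dqs = find_write_leveling_edge(simulated_lane())
    dq = dq_odelay_for(dqs)
    print(f"DQS_ODELAY = 0x{dqs:02x}, DQ_ODELAY = {'out of range' if dq is None else hex(dq)}")
```

If dq_odelay_for() returns None, the 0x40 offset does not fit in the delay range, which is the case where __DQS\_ODELAY__ has to be moved together with the clock phase instead.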
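
Centering a __DQ\_IDELAY__ value in its data eye, with __DQS\_IDELAY__ held fixed, can be sketched in Python as well. The passed list is a hypothetical per-delay pass/fail result of a read test (for example, comparing read data with a known pattern); the helper names and the 200 MHz tap sizes are assumptions, and the sketch picks the value closest in time to the middle of the widest passing window.

```python
# Sketch of centering DQ_IDELAY in the read data "eye" of one bit while
# DQS_IDELAY stays fixed. `passed[i]` is a hypothetical pass/fail result of a
# read test at VALID_DELAYS[i]; tap sizes are approximate 200 MHz values.
from typing import List, Optional, Sequence, Tuple

VALID_DELAYS: List[int] = [(c << 3) | f for c in range(32) for f in range(5)]


def approx_ps(value: int, coarse_ps: float = 78.0, fine_ps: float = 10.0) -> float:
    """Approximate delay in ps of an 8-bit delay value."""
    return (value >> 3) * coarse_ps + (value & 0x7) * fine_ps


def longest_pass_window(passed: Sequence[bool]) -> Optional[Tuple[int, int]]:
    """(first, last) indices of the longest run of passing delays; the first wins ties."""
    best: Optional[Tuple[int, int]] = None
    start: Optional[int] = None
    for i, ok in enumerate(list(passed) + [False]):  # sentinel closes a trailing run
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if best is None or (i - 1 - start) > (best[1] - best[0]):
                best = (start, i - 1)
            start = None
    return best


def center_dq_idelay(passed: Sequence[bool],
                     delays: Sequence[int] = VALID_DELAYS) -> Optional[int]:
    """Delay value closest (in ps) to the middle of the widest passing window."""
    window = longest_pass_window(passed)
    if window is None:
        return None
    first, last = window
    mid = (approx_ps(delays[first]) + approx_ps(delays[last])) / 2
    # Index and picoseconds are not proportional (fine steps are ~10 ps, the
    # step from fine=4 to the next coarse tap is ~38 ps), so pick by time.
    return min(delays[first:last + 1], key=lambda d: abs(approx_ps(d) - mid))


if __name__ == "__main__":
    passed = [10 <= i <= 30 for i in range(len(VALID_DELAYS))]
    print(hex(center_dq_idelay(passed)))
```

Choosing the widest window rather than the first passing delay keeps an isolated passing point at the edge of the eye from being selected.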