# **TOF Aggregation tree**

A new TOF aggregation tree is needed to optimize the data throughput and memory use of the new FPGA. To show why the AFCK firmware has to be adapted, table 1 gives a quick comparison of the two boards.

|                       | AFCK  | CRI       | Factor |
|-----------------------|-------|-----------|--------|
| GBTx links            | 6     | 47 (48-1) | x7,8   |
| GET4                  | 240   | 1880      |        |
| 36-Kbit memory blocks | 445   | 2019      | x4,7   |
| Slices                | 50950 | 165840    | x3,25  |

table 1: AFCK vs CRI

The table shows that the number of GBTx links has increased by a factor of almost eight, while the amount of memory and the number of slices have increased only by a factor of roughly four. At first sight it seemed impossible to implement the design in the new FPGA, because the AFCK design already used all of its memory blocks and more than 80 percent of its slices. This conclusion does not necessarily hold, because the fuller an FPGA is, the more slices are needed to route the design successfully and meet the timing constraints. In a newer FPGA built in a better technology, the delay between slices should be smaller, and the slice utilization may therefore be lower. In any case, this note discusses how to use the memory blocks so that all GBTx links can be served, and whether the PCIe interface can handle the large amount of data generated by the 1880 GET4 ASICs.

#### AFCK firmware overview

In the AFCK design, several FIFO stages were used to merge the GET4 data and check its integrity. The advantages and disadvantages of this architecture are not discussed here; only the number of memory blocks used is considered. A diagram of the AFCK data path is shown in figure 1.



figure 1: AFCK GET4 data path

One 36-Kbit BRAM (block RAM) was used per GET4, plus three more per GBTx. Applied unchanged to the CRI, this scheme would need 2021 BRAMs for the GET4 data path alone, which is more than the 2019 available in the CRI FPGA:

1880 GET4 x 1 BRAM + 47 GBTx x 3 BRAM = 2021 BRAMs
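The count above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the constant and function names are illustrative, and the figures are the ones quoted in table 1 and in the text.

```python
# Figures taken from table 1 and the text; all names are illustrative only.
GET4_PER_GBTX = 40            # GET4 ASICs read out through one GBTx
CRI_GBTX_LINKS = 47           # GBTx links counted for the CRI (48-1)
CRI_BRAM36_AVAILABLE = 2019   # 36-Kbit memory blocks listed for the CRI


def afck_scheme_brams(gbtx_links, get4_per_gbtx=GET4_PER_GBTX,
                      brams_per_get4=1, brams_per_gbtx=3):
    """BRAMs needed if the AFCK FIFO scheme is copied unchanged."""
    get4_count = gbtx_links * get4_per_gbtx
    return get4_count * brams_per_get4 + gbtx_links * brams_per_gbtx


needed = afck_scheme_brams(CRI_GBTX_LINKS)
fits = needed <= CRI_BRAM36_AVAILABLE
print(f"needed: {needed}, available: {CRI_BRAM36_AVAILABLE}, fits: {fits}")
```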

The proposed solution is shown in figure 2. In the first stage, a dual FIFO stores the GET4 packages of one GBTx (40 GET4). The data of three GBTx are merged into FIFO3, and the FLIM module builds the micro slices and sends them to the PCIe module. This scheme is replicated eight times, so that the data of eight FLIM modules per super region are sent through the PCIe interface.



figure 2: CRI data path

### Memory blocks

FIFO1 buffers the GET4 data that follow a valid epoch. When the next epoch arrives, FIFO1 is ready to send its data to FIFO3, and the incoming GET4 data are written to FIFO2 instead. This architecture makes it possible to reserve more BRAMs at the end of the data path, so that the end buffer can be used by those GBTx links that are actually sending data.
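The alternation between the two first-stage FIFOs can be illustrated with a small behavioural model. This is a toy sketch in Python, not the firmware: the class and method names are hypothetical, the hand-over to FIFO3 is modelled as instantaneous, and the default depth is the first-stage size quoted in this section.

```python
from collections import deque


class DualFifo:
    """Toy model of the FIFO1/FIFO2 pair of one GBTx (ping-pong buffering)."""

    def __init__(self, depth=2560):
        self.depth = depth
        self.banks = [deque(), deque()]
        self.write_bank = 0  # bank currently filled with GET4 packages

    def write(self, package):
        """Store one GET4 package; return False if the active bank is full."""
        bank = self.banks[self.write_bank]
        if len(bank) >= self.depth:
            return False
        bank.append(package)
        return True

    def new_epoch(self):
        """Switch banks and return the finished epoch's packages for FIFO3."""
        finished = self.banks[self.write_bank]
        self.write_bank ^= 1          # following packages go to the other bank
        drained = list(finished)
        finished.clear()              # bank is empty again for the next swap
        return drained


fifo = DualFifo(depth=4)
fifo.write("a")
fifo.write("b")
first_epoch = fifo.new_epoch()    # packages collected before the swap
fifo.write("c")
second_epoch = fifo.new_epoch()
```

In this model a full bank simply rejects further packages, which is one possible overflow policy and an assumption of the sketch.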

Each first-stage FIFO (FIFO1/FIFO2) holds 2560 48-bit words, enough to store a maximum of 64 packages between epochs for each of the 40 GET4s. Such a FIFO is implemented with five BRAMs (36-Kbit block RAMs). The 24 GBTx links in one super region therefore need 240 of the 1009 BRAMs of that super region (24 GBTx x 2 FIFOs x 5 BRAMs).
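The sizing of the first stage can be checked with the short Python sketch below. It assumes that each 36-Kbit BRAM is used in a 512 x 72 configuration, which is one way to arrive at five BRAMs per FIFO; the text does not state the configuration, and all names are illustrative.

```python
import math

GET4_PER_GBTX = 40
PACKAGES_PER_EPOCH = 64   # maximum per GET4 between two epochs
WORD_BITS = 48
GBTX_PER_SUPER_REGION = 24

# Assumption: 36-Kbit BRAM configured as 512 words x 72 bits.
BRAM_DEPTH = 512
BRAM_WIDTH = 72


def fifo_brams(depth_words, word_bits):
    """BRAMs needed for a FIFO of the given depth and word width."""
    wide = math.ceil(word_bits / BRAM_WIDTH)
    deep = math.ceil(depth_words / BRAM_DEPTH)
    return wide * deep


fifo_depth = GET4_PER_GBTX * PACKAGES_PER_EPOCH      # words per FIFO
brams_per_fifo = fifo_brams(fifo_depth, WORD_BITS)
# Two FIFOs (FIFO1 and FIFO2) per GBTx link.
first_stage_brams = GBTX_PER_SUPER_REGION * 2 * brams_per_fifo
print(fifo_depth, brams_per_fifo, first_stage_brams)
```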

In this case each of the eight FIFO3s (one per FLIM) can use up to 96 BRAMs, which is 19,2 times the data of one epoch of one GBTx. Since each FIFO3 merges three GBTx, it can buffer 6,4 epochs of data. This corresponds to 160 microseconds if all 120 GET4s run at their maximal hit rate, or almost 480 microseconds at the simulated peak rate.
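The buffer depth quoted above follows from a few divisions, sketched here in Python. The epoch length is rounded to 25 us, as in the "2 epochs (50 us)" figure used elsewhere in this note, and the scaling to the simulated peak rate is an assumption about how the last number was obtained; the names are illustrative.

```python
FIFO3_BRAMS = 96            # BRAMs available to one FIFO3
BRAMS_PER_GBTX_EPOCH = 5    # one first-stage FIFO holds one epoch of one GBTx
GBTX_PER_FLIM = 3
EPOCH_US = 25.0             # assumption: epoch length rounded to 25 us

MAX_RATE_MPPS = 40 * 80 / 34   # 40 GET4s at 80 Mbps with 34-bit packages
SIM_PEAK_MHPS = 32.0           # simulated peak rate of one GBTx

epochs_one_gbtx = FIFO3_BRAMS / BRAMS_PER_GBTX_EPOCH    # about 19.2
epochs_merged = epochs_one_gbtx / GBTX_PER_FLIM         # about 6.4
buffer_us_max_rate = epochs_merged * EPOCH_US           # about 160 us
# Assumption: the buffer time scales inversely with the input rate.
buffer_us_sim_peak = buffer_us_max_rate * MAX_RATE_MPPS / SIM_PEAK_MHPS

print(f"{epochs_one_gbtx:.1f} epochs (one GBTx), "
      f"{epochs_merged:.1f} epochs (three GBTx), "
      f"{buffer_us_max_rate:.0f} us at maximal rate, "
      f"{buffer_us_sim_peak:.0f} us at simulated peak")
```

With these assumptions the simulated-peak figure comes out at roughly 470 us, in line with the "almost 480 microseconds" quoted above.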

The presented solution therefore meets the recommended buffer time of 100 us, which is needed because of the inhomogeneity of the beam.

## **Throughput**

The GET4 e-link runs at 80 Mbps and the GET4 sends 34-bit packages, so one GET4 can send up to 2,35 MPps (mega packages per second). In theory, one GBTx could therefore transmit up to 94,11 MPps, but according to the detector simulations the maximal hit peaks will be 200 kHits per second per channel. With the simulation values and 4 channels per GET4, one GBTx delivers 32 MHps (mega hits per second). This value is approximately the same as the rate in MPps because, if no errors occur, the only additional package is a time information package sent every 25,4 us.
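The rates in this paragraph can be reproduced as follows. This is a minimal Python sketch of the arithmetic with illustrative names; the count of 40 GET4s per GBTx is taken from the data path description.

```python
ELINK_MBPS = 80                 # GET4 e-link speed
PACKAGE_BITS = 34               # size of one GET4 package
GET4_PER_GBTX = 40
CHANNELS_PER_GET4 = 4
PEAK_KHITS_PER_CHANNEL = 200    # simulated peak, kHits per second

# Upper limit set by the e-link speed.
get4_max_mpps = ELINK_MBPS / PACKAGE_BITS            # about 2.35 MPps
gbtx_max_mpps = get4_max_mpps * GET4_PER_GBTX        # about 94.1 MPps

# Peak rate expected from the detector simulations, per GBTx.
gbtx_peak_mhps = (PEAK_KHITS_PER_CHANNEL * CHANNELS_PER_GET4
                  * GET4_PER_GBTX) / 1000

print(f"GET4 max: {get4_max_mpps:.2f} MPps")
print(f"GBTx max: {gbtx_max_mpps:.2f} MPps")
print(f"GBTx simulated peak: {gbtx_peak_mhps:.1f} MHps")
```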

Note that if the three GBTx send data at maximal speed for longer than 2 epochs (50 us), there will be a bottleneck at the input of FIFO3 and data will be lost. If the throughput is calculated with the peak values, the 8 FLIM modules produce 6,144 GBps of data, which is 79% of the theoretical throughput of a PCIe 3.0 interface with 8 lanes.
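The PCIe load can be estimated with the Python sketch below. It assumes 8 bytes per hit, which is the value that reproduces the 6,144 GBps figure but is not stated in the text, and it takes the theoretical PCIe 3.0 rate as 8 GT/s per lane with 128b/130b encoding. All names are illustrative.

```python
FLIMS = 8
GBTX_PER_FLIM = 3
GBTX_PEAK_MHPS = 32.0     # simulated peak rate of one GBTx
BYTES_PER_HIT = 8         # assumption: one 64-bit word per hit

PCIE3_GT_PER_LANE = 8.0   # PCIe 3.0 raw rate per lane, GT/s
PCIE_LANES = 8
ENCODING = 128 / 130      # PCIe 3.0 line encoding efficiency

# Data produced by all FLIM modules, in GB/s.
data_gbps = FLIMS * GBTX_PER_FLIM * GBTX_PEAK_MHPS * BYTES_PER_HIT / 1000
# Theoretical link capacity, in GB/s (8 bits per byte).
pcie_gbps = PCIE3_GT_PER_LANE * PCIE_LANES * ENCODING / 8
utilisation = data_gbps / pcie_gbps

print(f"data: {data_gbps:.3f} GB/s, link: {pcie_gbps:.2f} GB/s, "
      f"utilisation: {utilisation:.0%}")
```

With these assumptions the utilisation comes out at about 78%, within rounding of the 79% quoted above. Protocol overhead on the link is not included.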

# Merger procedure

TODO. Should describe the merger procedure from GET4 to FLIM. Sorting is not possible with the new structure!

## AFCK vs CRI

| AFCK | CRI |
|------|-----|
| Xilinx Kintex 7 325T FFG900 FPGA<br>2 Mezzanine card (HPC)<br>7 Optical links (6x GBTx) | Xilinx Kintex UltraScale XCKU115<br>48 optical links (MiniPODs) (48x GBTx)<br>TTC input ADN2814<br>PCIe Gen3x16 (2x8 with switch)<br>Si5345 jitter cleaner<br>1 Mezzanine card |

|                              |                 | AFCK (XC7K325T)   | CRI (KU115)       | Factor |
|------------------------------|-----------------|-------------------|-------------------|--------|
| Logic Cells                  |                 | 326.080           | 1.451.100         | x4,4   |
| Configurable<br>Logic Blocks | Slices          | 50.950            | 165.840           | x3,25  |
|                              | FFs             | 407.600           | 1.326.720         | x3,25  |
|                              | LUTs            | 203.800           | 663.360           | x3,25  |
|                              | Distributed RAM | 4.000 Kb          | 18.300 Kb         | x4,6   |
| DSP Slices                   |                 | 840               | 5520              | x6,6   |
| Block RAM<br>Blocks          | 18 Kb           | 890               | 4218              | x4,7   |
|                              | 36 Kb           | 445               | 2019              | x4,7   |
|                              | Max Kb          | 16020             | 75924             | x4,7   |
| CMTs                         |                 | 10 (1MMCM, 1 PLL) | 24 (1MMCM, 2 PLL) |        |
| PCIe                         |                 | 1                 | 6 (PCIe Gen3 x8)  | x6     |
| Gigabit Transceivers         |                 | 16 (GTX 12.5Gbps) | 64 (GTH 16.3Gbps) | x4     |
| CLB                          |                 | 8xLUTs + 16xFFs   | 8xLUTs + 16xFFs   |        |

# **GBTx Merger**