# Progress on the online tracking algorithm

<u>Yutie Liang</u>, Hua Ye, Martin Galuska, Jifeng Hu, Wolfgang Kühn, Jens Sören Lange, David Münchow, Björn Spruck

II. Physikalisches Institut, JUSTUS-LIEBIG-UNIVERSITÄT GIESSEN

Dec. 10, 2013

## Outline

- 1. Introduction
- 2. Road finding and momentum calculation
- 3. Performance studies
  - Single/multi-track events
  - Dpm (event-based)
  - Dpm (time-based)
- 4. VHDL implementation
- 5. Summary and outlook

## Straw Tube Tracker (STT)

4 stereo double-layers for 3D reconstruction, with ±2.89° skew angle (blue/red)

From STT: wire position + drift time

## Road finding



1: Sort hits and fill them into per-layer arrays:

- array\_layer\_0 <= (1, 2)
- array\_layer\_1 <= (3, 4)
- array\_layer\_2 <= (5, 6)

2: Combine hits of two adjacent layers and keep the effective combinations (shown in red):

- Layer 0 & 1: 1->3 || 2->3 || 1->4 || 2->4
- Layer 1 & 2: 3->5 || 4->5 || 3->6 || 4->6

This step is easy to parallelize.

3: Connect these combinations to form tracklets:

- 1->4 + 4->5 ... = 1->4->5->8->9
- 2->3 + 3->6 ... = 2->3->6->7 ...

If a tracklet is broken somewhere, a further step is needed to connect the pieces.

4: Calculate the momentum for each tracklet.
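The sorting, pairing and connecting steps can be sketched in a few lines of Python. This is only an illustration, not the VHDL design: the function name `find_tracklets`, the `(hit_id, layer, tube)` hit format and the neighbour test `abs(tube difference) <= max_gap` are assumptions, since the slides do not spell out which combinations count as "effective".

```python
from collections import defaultdict


def find_tracklets(hits, max_gap=1):
    """Build tracklets from (hit_id, layer, tube) tuples.

    The neighbour test |tube difference| <= max_gap is a placeholder for the
    real geometric criterion, which is an assumption of this sketch.
    """
    # Sort hits into one array per layer.
    layers = defaultdict(list)
    for hit_id, layer, tube in hits:
        layers[layer].append((hit_id, tube))
    if not layers:
        return []
    n_layers = max(layers) + 1

    # Keep the "effective" combinations between adjacent layers.
    # Every layer pair is handled on its own, so the pairs could run in parallel.
    links = []
    for layer in range(n_layers - 1):
        pair = {}
        for a_id, a_tube in layers[layer]:
            pair[a_id] = [b_id for b_id, b_tube in layers[layer + 1]
                          if abs(a_tube - b_tube) <= max_gap]
        links.append(pair)

    # Connect combinations that share a hit, layer by layer.
    tracklets = [[hit_id] for hit_id, _ in layers[0]]
    for pair in links:
        tracklets = [t + [nxt] for t in tracklets for nxt in pair.get(t[-1], [])]
    return tracklets


# Toy event with two crossing roads over three layers.
hits = [(1, 0, 10), (2, 0, 20), (3, 1, 20), (4, 1, 10), (5, 2, 10), (6, 2, 20)]
print(find_tracklets(hits))  # [[1, 4, 5], [2, 3, 6]]
```

In this sketch a tracklet with no continuation in the next layer is simply dropped; reconnecting broken roads would need the further step mentioned above.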

#### Calculation of circle parameters



$$=(S_{xxxxdd} + S_{yyyydd} + 2S_{xxyydd} + 2aS_{xxxdd} + 2bS_{yyydd} + 2bS_{xxydd} + 2aS_{xyydd} + a^2S_{xxdd} + b^2S_{yydd} + 2abS_{xydd})/2r$$
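One way such sums can yield circle parameters is a linear least-squares fit of a circle constrained to pass through the origin, $x^2 + y^2 + ax + by = 0$. The sketch below is written under that assumption (the origin constraint is inferred from the absence of a constant term in the expression above, not stated on the slides); it is unweighted, and the function names and the 2 T field used for the momentum conversion are illustrative assumptions.

```python
import math


def fit_circle_through_origin(points):
    """Least-squares fit of x^2 + y^2 + a*x + b*y = 0 to a list of (x, y)."""
    sxx = sum(x * x for x, y in points)
    sxy = sum(x * y for x, y in points)
    syy = sum(y * y for x, y in points)
    sxxx = sum(x ** 3 for x, y in points)
    sxxy = sum(x * x * y for x, y in points)
    sxyy = sum(x * y * y for x, y in points)
    syyy = sum(y ** 3 for x, y in points)

    # Normal equations:  a*Sxx + b*Sxy = -(Sxxx + Sxyy)
    #                    a*Sxy + b*Syy = -(Sxxy + Syyy)
    det = sxx * syy - sxy * sxy
    a = (-(sxxx + sxyy) * syy + (sxxy + syyy) * sxy) / det
    b = (-(sxxy + syyy) * sxx + (sxxx + sxyy) * sxy) / det

    xc, yc = -a / 2.0, -b / 2.0   # circle centre
    r = math.hypot(xc, yc)        # radius, because the circle contains the origin
    return xc, yc, r


def transverse_momentum(r_cm, b_tesla=2.0):
    """pT in GeV/c from the radius in cm; the 2 T field is an assumed value."""
    return 0.3 * b_tesla * r_cm / 100.0


# Points on a circle with centre (3, 4) and radius 5, which contains the origin.
pts = [(3 + 5 * math.cos(t), 4 + 5 * math.sin(t)) for t in (0.5, 1.0, 1.5, 2.0, 2.5)]
print(fit_circle_through_origin(pts))  # close to (3.0, 4.0, 5.0)
```

The two numerators are differences of nearly equal products when the hits lie almost on a straight line, which is where limited multiplier precision is most likely to show up.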

[Figure: event displays in the x-y plane (cm). Left: multi-track event. Right: single track, 0.2 GeV.]

## Performance study -- Dpm events




- 1: Number of hits in the track
- 2:  $\chi^2$  of the track











## VHDL implementation

1: Road finding: a state machine controls the following procedures.

- 1) Hit sorting: fill hits into array\_layer\_id according to layer ID
- 2) Combine hits from two adjacent layers
- 3) Form tracklets by attaching hits layer by layer (tracklet\_inner: layers 0-7; tracklet\_outer: layers 8-15)
- 4) Combine tracklet\_inner and tracklet\_outer

For one event with 100 hits (6 tracks): 1) 100 clock cycles (cc), 2) ~300-600 cc, 3) ~200 cc  $\rightarrow$  several µs (with the FPGA running at 100 MHz)
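Adding up the quoted cycle counts for steps 1) to 3) makes the "several µs" estimate explicit. Step 4) is left out because no cycle count is quoted for it, and one clock cycle is taken as 10 ns at 100 MHz:

$$100 + (300 \text{ to } 600) + 200 = 600 \text{ to } 900\ \text{cc}, \qquad (600 \text{ to } 900) \times 10\ \text{ns} = 6 \text{ to } 9\ \mu\text{s}$$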

## VHDL implementation

2: Momentum calculation

3: $\chi^2$ calculation

24×24-bit multiplication is not precise enough  $\rightarrow$  32×32-bit

## Setup and test

PC as data source and receiver.

- Ethernet
- Optical link (UDP by Grzegorz Korcyl) (not integrated yet)





## Simulation with ISim and test on the FPGA

[Figure: ISim simulation waveform and hex dump of the FPGA test output]

| Track | #1 | #2 | #3 | #4 |
|----------|--------|--------|--------|-------|
| Xr (cm) | -38.8 | 63.9 | 62.4 | 124.2 |
| Yr (cm) | -158.7 | -129.9 | -127.8 | 107.1 |
| R (cm) | 163.3 | 144.7 | 142.2 | 164.0 |
| $\chi^2$ | 0.40 | 0.63 | 0.58 | 0.74 |

Timing expectation (4 tracks per event): 200 cc + 300 cc  $\rightarrow$  300~500 cc  $\rightarrow$  3~5 µs/event, in agreement with the test with 1M events.

### Device utilization Summary

| Logic Utilization | Used | Available | Utilization |
|----------------------------------|--------|-----------|-------------|
| Number of Slice Flip Flops | 18,301 | 50,560 | 36% |
| DCM autocalibration logic | 14 | 18,301 | 1% |
| Number of 4 input LUTs | 22,934 | 50,560 | 45% |
| DCM autocalibration logic | 8 | 22,934 | 1% |
| Number of occupied Slices | 17,997 | 25,280 | 71% |
| ..... | | | |
| Number of DSP48s | 124 | 128 | 96% |
| Number of DCM_ADVs | 2 | 12 | 16% |
| Average Fanout of Non-Clock Nets | 2.98 | | |

#### 31 multiplications take too many resources.

One multiplication (32×32 bit): 4 DSPs or 1088 LUTs

A smarter way to calculate  $\chi^2$  is needed.
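A rough cross-check of these numbers against the utilization table, assuming (this is not stated on the slides) that all 31 multiplications are 32×32 bit and are mapped entirely either to DSP48s or to LUTs:

$$31 \times 4 = 124\ \text{DSP48s (of 128 available)}, \qquad 31 \times 1088 = 33{,}728\ \text{LUTs} \approx 67\%\ \text{of}\ 50{,}560$$

The first figure matches the 124 DSP48s reported as used. Under the same assumption, moving the multipliers into LUTs would not fit next to the 22,934 LUTs already in use.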

### Summary and Outlook

- In the road finding module, the matching of inner and outer layers is done.
- In the momentum calculation module, the  $\chi^2$  is calculated.
- One more module is necessary to assign each reconstructed track to the correct event.

Next to do:

- The road finding module is being optimized.
- The module that calculates  $\chi^2$  needs to be improved.

Thank you

## Device utilization Summary

| Logic Utilization                              | Used                 | Available | Utilization | Note(s) |
|------------------------------------------------|----------------------|-----------|-------------|---------|
| Number of Slice Flip Flops                     | 18,301               | 50,560    | 36%         |         |
| DCM autocalibration logic                      | 14                   | 18,301    | 1%          |         |
| Number of 4 input LUTs                         | 22,934               | 50,560    | 45%         |         |
| DCM autocalibration logic                      | 8                    | 22,934    | 1%          |         |
| Number of occupied Slices                      | 17,997               | 25,280    | 71%         |         |
| Number of Slices containing only related logic | 17,997               | 17,997    | 100%        |         |
| Number of Slices containing unrelated logic    | 0                    | 17,997    | 0%          |         |
| Total Number of 4 input LUTs                   | 23,331               | 50,560    | 46%         |         |
| Number used as logic                           | 19,002               |           |             |         |
| Number used as a route-thru                    | 397                  |           |             |         |
| Number used as 16x1 RAMs                       | 8                    |           |             |         |
| Number used for Dual Port RAMs                 | 2,448                |           |             |         |
| Number used as Shift registers                 | 1,476                |           |             |         |
| Number of bonded IOBs                          | 36                   | 576       | 6%          |         |
| IOB Flip Flops                                 | 3                    |           |             |         |
| IOB Dual-Data Rate Flops                       | 1                    |           |             |         |
| Number of BUFG/BUFGCTRLs                       | 6                    | 32        | 18%         |         |
| Number used as BUFGs                           | 5                    |           |             |         |
| Number used as BUFGCTRLs                       | 1                    |           |             |         |
| Number of FIFO16/RAMB16s                       | 102                  | 232       | 43%         |         |
| Number used as RAMB16s                         | 102                  |           |             |         |
| Number of DSP48s                               | 124                  | 128       | 96%         |         |
| Number of DCM_ADVs                             | 2                    | 12        | 16%         |         |
| Average Fanout of Non-Clock Nets               | 2.98                 |           |             |         |

| 00000400 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 |
|----------|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| *        |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
| 00000490 | 00 | 01 | 02 | 03 | 04 | 05 | 06 | 07 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 |
| 000004a0 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 |
| 000004b0 | 1f | 20 | 21 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 |
| 000004c0 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 |

PANDA at 20 MHz:  $2 \times 10^7$  events/second. One event: ~3 tracks/event × ~16 hits/track × ~2 (overlap factor)  $\rightarrow$  ~100 hits/event

When dividing the STT into 16 layers: ~6 hits/layer. 6 × 6 × 15 = 540 combinations at 1~2 clock cycles/combination  $\rightarrow$  500~1000 clock cycles/event

With the FPGA running at 100 MHz: (500~1000) × 10 ns / 50 ns  $\rightarrow$  100~200 FPGAs  $\rightarrow$  25~50 compute nodes (CN)
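The estimate can be reproduced with a short calculation. This is a sketch of the arithmetic only; the function name `estimate_fpgas` and its parameter names are made up for illustration, and the 4 FPGAs per compute node are inferred from the 100~200 FPGA to 25~50 CN conversion above.

```python
def estimate_fpgas(event_rate_hz=20e6, clock_hz=100e6, n_layers=16,
                   hits_per_layer=6, cycles_per_combination=1,
                   fpgas_per_cn=4):
    """Return (combinations, cycles per event, FPGAs, compute nodes)."""
    # Hit pairs between each of the (n_layers - 1) adjacent layer pairs.
    combinations = hits_per_layer ** 2 * (n_layers - 1)
    cycles_per_event = combinations * cycles_per_combination
    # Processing time of one event on one FPGA.
    time_per_event_s = cycles_per_event / clock_hz
    # FPGAs needed to keep up: events arriving while one event is processed.
    n_fpgas = time_per_event_s * event_rate_hz
    return combinations, cycles_per_event, n_fpgas, n_fpgas / fpgas_per_cn


for cc in (1, 2):
    print(estimate_fpgas(cycles_per_combination=cc))
```

With 1 and 2 clock cycles per combination this gives about 108 and 216 FPGAs (27 and 54 compute nodes), which the slide rounds to 100~200 FPGAs and 25~50 CN.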

#### Performance study – single track



## DPM background – Event # 1



#### To improve the momentum resolution



| Quantity | VHDL | C++ |
|---------------------------|-------------|--------------|
| $S_{xx}$ | 4.26358 | 4.26378 |
| $S_{xy}$ | 0.226081 | (illegible) |
| $S_{yy}$ | (illegible) | (illegible) |
| $S_{xxx}$ | 8.24079 | 8.24134 |
| $S_{xxy}$ | 0.438052 | 0.43809 |
| $S_{xyy}$ | 0.0239513 | 0.0235331 |
| $S_{yyy}$ | 0.00127371 | 0.00127769 |
| $S_{xxxx}$ | 15.953 | 15.9542 |
| $S_{xxyy}$ | 0.0456499 | 0.0458053 |
| $S_{yyyy}$ | 0.000137139 | (illegible) |
| a | -0.00058454 | -0.000776344 |
| b | (illegible) | -0.00475687 |
| det A | 0.000487605 | 0.000530637 |
| $S_{xx}(S_{xxy}+S_{yyy})$ | (illegible) | -1.87337 |
| $S_{xy}(S_{xxx}+S_{xyy})$ | | +1.86861 |

$2^{-16} = 0.00001526$
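To see what a step size of $2^{-16}$ means for small quantities, here is a minimal fixed-point sketch. It assumes a format with 16 fractional bits, which is suggested by the number above but not stated on the slides; the helper names are illustrative and do not describe the VHDL implementation.

```python
def to_fixed(value, frac_bits=16):
    """Integer code of value, rounded to the nearest multiple of 2**-frac_bits."""
    return round(value * (1 << frac_bits))


def from_fixed(code, frac_bits=16):
    """Real value represented by an integer code."""
    return code / (1 << frac_bits)


def fixed_mul(a_code, b_code, frac_bits=16):
    """Multiply two codes; the raw product carries 2*frac_bits fractional bits."""
    return (a_code * b_code) >> frac_bits


print(from_fixed(1))          # 1.52587890625e-05, the smallest step
print(to_fixed(0.000137139))  # 9: a value of order 1e-4 spans only a few steps
print(from_fixed(fixed_mul(to_fixed(1.5), to_fixed(2.0))))  # 3.0
```

A quantity of order $10^{-4}$ is therefore resolved to only a few percent in this format, so differences of nearly equal terms can lose most of their significant digits.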