# A 20 GHz 8-bit All-N-Transistor Logic CLA Using 16-nm FinFET Technology

Tzung-Je Lee, Department of Electrical Engineering National Sun Yat-Sen University, Kaohsiung, Taiwan 80424 Email: tjlee@ee.nsysu.edu.tw

Wen-Shou Yang Department of Electrical Engineering National Sun Yat-Sen University, Kaohsiung, Taiwan 80424 Email: yangbs1023@yahoo.com.tw

Chua-Chin Wang† , Department of Electrical Engineering National Sun Yat-Sen University, Kaohsiung, Taiwan 80424 Email: ccwang@ee.nsysu.edu.tw

*Abstract*—This paper presents a 20 GHz 8-bit carry-lookahead adder (CLA) using all-N-transistor (ANT) logic. The proposed ANT logic provides an auxiliary current path through an NMOS transistor, so that the speed limitation caused by the PMOS devices is avoided. In addition, FinFET devices are used to improve the speed through their enhanced mobility. Moreover, the delay time of the critical path of the 8-bit CLA is analyzed, taking the parasitic R-C of the FinFET devices into account, to improve the PDP (Power-Delay Product). The proposed design is implemented with a typical 16 nm FinFET process. The core area is $206.403 \times 152.506 \ \mu m^2$. Based on post-layout simulations, the delay time is 931 ps at a 20 pF capacitive load. The simulated PDP is only 21.67 pW at a 20 GHz clock rate.

Keywords— dynamic logic, CLA, All-N-transistor, PDP, FinFET CMOS

## I. INTRODUCTION

High speed and low power consumption are important design drivers for digital arithmetic processing circuits. Traditionally, the CMOS process is preferred because of its low power consumption, which results from its low static current [1]-[4]. Recently, the CNT-FET (Carbon Nanotube Field-Effect Transistor) [5]-[6] and the FinFET (Fin Field-Effect Transistor) [7]-[8] have offered further choices for high-speed applications because of their higher carrier mobility and smaller device area. To reduce the power consumption, pass-transistor logic [1] and XOR/XNOR logic [2]-[4] are widely used. However, the operation speed is sacrificed due to the increased equivalent resistance of the series transistors. On the other hand, dynamic logic circuits are considered one of the solutions for GHz applications [9]-[12]. In traditional Domino logic, only non-inverting logic operation is provided [13]. Thus, the NP logic [13] and the complementary all-N-transistor (ANT) logic [10] were reported to provide both inverting and non-inverting logic operations. However, their operating speed is limited by the current path through the slow PMOS devices. Thus, the all-N-logic (ANL) circuit [9] was proposed to provide an additional driving current path through the NMOS transistor to increase the operation speed. However, the speed is reduced when the number of series transistors is increased.

† Prof. C.-C. Wang is the contact author. (e-mail: ccwang@ee.nsysu.edu.tw)

To overcome the problems mentioned above, this paper demonstrates an 8-bit CLA (Carry-Lookahead Adder) using the ANT logic, implemented with a 16-nm FinFET process. Based on the post-layout simulation results, the operating frequency is 20 GHz with a 20 pF load. Moreover, the rise time, fall time, and delay time are 562 ps, 464 ps, and 931 ps, respectively.



Fig. 1. FinFET features: (a) the 3D structure, (b) the top view, (c) the cross-section view, and (d) the parasitic resistors and capacitors.

## II. CLA REALIZED BY ANT LOGIC USING FINFET

To enhance the operation frequency, the FinFET device is used in this design. According to the 3D structure of FinFET, as shown in Fig. 1 (a) and (b), the drain current in the saturation region could be expressed as follows [14]- [15].

$$
I_D = \mu_n C_{ox} \left(\frac{W_{eff}}{L}\right) (V_{GS} - V_{TH})^2 \tag{1}
$$

$$
W_{eff} = N_F \cdot (2H_F + W_F), \tag{2}
$$

where $W_{eff}$ denotes the effective width of the FinFET, and $N_F$ is the number of fins. Referring to Fig. 1 (b) and (c), the parasitic gate-drain and gate-source capacitors are composed of various overlap and fringe capacitors. These can be modeled as five parasitic capacitors, as shown in Fig. 1 (d). The gate-drain capacitor is expressed as a function of the number of fins in Eqn. (3).

$$
C_{gd} = N_F \cdot (C'_{gd,ov} + C'_{gd,fr}) \tag{3}
$$



Fig. 2. Proposed 8-bit CLA, (a) the block diagram; (b) the schematic of the ANT logic; (c) the illustrated waveforms.

where $C'_{gd,ov}$ and $C'_{gd,fr}$ refer to the unit (per-fin) overlap and fringe capacitances, respectively. The equivalent switching resistance of a transistor in digital applications can be approximated by the reciprocal of the slope of the line from $V_{ds} = VDD$ to $V_{ds} = 0$ in the I-V curve [13].

$$
R_n = \frac{VDD}{\frac{1}{2}\mu_n C_{ox}(\frac{W_{eff}}{L})(VDD - V_{thn})^2} = R'_n \cdot \frac{L}{W_{eff}} \tag{4}
$$

where  $R'_n$  refers to the effective resistance.
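As a rough numerical illustration of Eqns. (1) to (4), the sketch below evaluates the effective width, saturation current, gate-drain capacitance, and switching resistance for a FinFET. The class name `FinFET`, its field names, and every numeric value are assumptions chosen for illustration only; they are not parameters of the 16 nm process used in this work. Eqn. (1) is coded as written (without the factor of 1/2 that appears in Eqn. (4)).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinFET:
    """Illustrative FinFET model; all values passed in are placeholders."""
    n_fins: int        # N_F, number of fins
    fin_height: float  # H_F in metres
    fin_width: float   # W_F in metres
    length: float      # L in metres
    mu_cox: float      # mu_n * C_ox in A/V^2
    v_th: float        # threshold voltage in volts
    c_gd_ov: float     # C'_gd,ov, per-fin overlap capacitance in farads
    c_gd_fr: float     # C'_gd,fr, per-fin fringe capacitance in farads

    def w_eff(self) -> float:
        # Eqn. (2): W_eff = N_F * (2 * H_F + W_F)
        return self.n_fins * (2.0 * self.fin_height + self.fin_width)

    def i_d_sat(self, v_gs: float) -> float:
        # Eqn. (1), as written in the text
        return self.mu_cox * (self.w_eff() / self.length) * (v_gs - self.v_th) ** 2

    def c_gd(self) -> float:
        # Eqn. (3): C_gd = N_F * (C'_gd,ov + C'_gd,fr)
        return self.n_fins * (self.c_gd_ov + self.c_gd_fr)

    def r_switch(self, vdd: float) -> float:
        # Eqn. (4): R_n = VDD / (0.5 * mu_n*C_ox * (W_eff/L) * (VDD - V_th)^2)
        current = 0.5 * self.mu_cox * (self.w_eff() / self.length) * (vdd - self.v_th) ** 2
        return vdd / current


# Hypothetical device, used only to exercise the formulas.
example = FinFET(n_fins=2, fin_height=50e-9, fin_width=8e-9, length=16e-9,
                 mu_cox=3e-4, v_th=0.3, c_gd_ov=2e-17, c_gd_fr=3e-17)
print(example.w_eff(), example.c_gd(), example.r_switch(0.8))
```

Under this model, doubling $N_F$ doubles $W_{eff}$ and $C_{gd}$ while halving $R_n$, which is the trade-off the delay analysis has to balance.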

Fig. 2 (a) shows the block diagram of the proposed 8-bit CLA. The input signals, A0  $\sim$  A7, and B0  $\sim$  B7, are received by the two sets of 8-bit TSPC (True Single-Phase Clock) DFFs for synchronization. The synchronized signals are then coupled to the 8-bit CLA G/P generator block.

In order to achieve a low power-delay product (PDP) and avoid the additional 3 times area penalty caused by the PMOS transistors, the ANT logic is utilized. The schematic of the ANT logic is shown in Fig. 2 (b). When clk is logic 0, the circuit is in the pre-charge phase. In this phase, $V_A$ is charged to Vdd, such that P2 is turned off. In addition, N1 and N4 are turned off. Therefore, the output remains in its previous state. When clk is logic 1, the circuit is in the evaluate phase. The logic value of the output is determined by the operation of the N-block, and is expressed as $Y = f(X_1, X_2, \dots, X_n)$. In the evaluate phase, the ANT logic works in 4 different cases according to the operation of the N-block and the previous state of the output, $V_{Y,pre}$.

**Case 1:** When the N-block is on and $V_{Y,pre}$ = Vdd, the ANT logic works in case 1. First, $V_A$ is discharged from Vdd to Vdd - $|V_{thp}|$ through the N-block and the weak operation of N3, because $V_B$ is at 0 V immediately after clk changes to Vdd and $V_C$ is initially clamped at 0 V by the pre-charge phase. $V_Y$ is pulled slightly lower by N4 in this step. When $V_A$ becomes lower than Vdd - $|V_{thp}|$, $V_A$ is pulled down quickly by the N-block and the loop of P3 and N3. Therefore, $V_Y$ is then pulled back to Vdd by P2 and N4. The illustrated waveforms are shown in Fig. 2 (c). The delay in this case can be expressed by the following equations.

$$
\tau_{11} = k_1 \cdot R'_{n4} \cdot \frac{L}{W_{eff}} \cdot C_A \tag{5}
$$

$$
\tau_{12} = k_2 \cdot (R'_{p2}||(R'_{n4} + R'_{p3})) \cdot \frac{L}{W_{eff}} \cdot C_Y \tag{6}
$$

where the parameters,  $k_1 = \frac{|V_{thp}|}{Vdd}$  and  $k_2 = \frac{Vdd-|V_{thp}|}{Vdd}$ , refer to the ratio of the duration for the two-step operation, respectively.  $C_A$  and  $C_Y$  include parasitic capacitance at node A and node Y, respectively.
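The two-step delay of case 1 in Eqns. (5) and (6) can be evaluated numerically as in the following sketch. The function names and all argument values are hypothetical; the primed resistances are passed in as the normalized quantities $R'$ and scaled by $L/W_{eff}$ exactly as in the equations.

```python
def parallel(r1: float, r2: float) -> float:
    """Equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)


def case1_delays(vdd, v_thp_abs, r_n4, r_p2, r_p3, l_over_w, c_a, c_y):
    """Return (tau_11, tau_12) following Eqns. (5) and (6).

    r_n4, r_p2, r_p3 are the normalized resistances R'_n4, R'_p2, R'_p3;
    l_over_w is L / W_eff; c_a and c_y are the capacitances at nodes A and Y.
    """
    k1 = v_thp_abs / vdd          # share of the swing covered in step 1
    k2 = (vdd - v_thp_abs) / vdd  # share of the swing covered in step 2
    tau_11 = k1 * r_n4 * l_over_w * c_a
    tau_12 = k2 * parallel(r_p2, r_n4 + r_p3) * l_over_w * c_y
    return tau_11, tau_12


# Placeholder values, not extracted from the 16 nm process.
tau_11, tau_12 = case1_delays(vdd=0.8, v_thp_abs=0.2, r_n4=10e3, r_p2=30e3,
                              r_p3=20e3, l_over_w=0.1, c_a=1e-15, c_y=2e-15)
print(tau_11, tau_12)
```

Since $k_1 + k_2 = 1$, the two factors simply split the full voltage swing between the slow first step and the regenerative second step.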

**Case 2:** When the N-block is on and $V_{Y,pre} = 0$ V, the ANT logic works in case 2. Similarly, $V_A$ is discharged from Vdd to Vdd - $|V_{thp}|$ in the first step by the N-block. When $V_A$ becomes lower than Vdd - $|V_{thp}|$, $V_A$ is pulled down quickly by the N-block and the loop of P3 and N3. $V_Y$ is then charged to Vdd by P2 and N4. The delay in the first step, $\tau_{21}$, is

$$
\tau_{21} = k_1 \cdot (R'_{N-block}) \cdot \frac{L}{W_{eff}} \cdot C_A \tag{7}
$$

The delay of the second step,  $\tau_{22}$ , is the same as Eqn. (6), thus,  $\tau_{22} = \tau_{12}$ .

**Case 3:** When the N-block is off and $V_{Y,pre}$ = Vdd, $V_Y$ is discharged from Vdd to 0 V by N4, because $V_C$ is 0 V initially. The delay for case 3 is derived as

$$
\tau_3 = R'_{n4} \cdot \frac{L}{W_{eff}} \cdot C_Y. \tag{8}
$$

**Case 4:** When the N-block is off and $V_{Y,pre} = 0$ V, $V_Y$ is kept at 0 V without any transition required.

Fig. 3 shows the schematic of the Generation and Propagation circuit with ANT logic, where i is 0∼7, referring to the 8 stages in parallel.
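The pre-charge and evaluate behavior and the four cases described above can be summarized as a small behavioral model. This is only a logic-level sketch: the function name `ant_step` and the 0/1 encoding of the node voltages are assumptions for illustration, and no timing or analog effects (such as the brief dip of $V_Y$ in case 1) are modeled.

```python
def ant_step(clk: int, n_block_on: bool, y_prev: int):
    """Return (y_next, case) for one clock phase of an ANT gate.

    clk        : 0 for the pre-charge phase, 1 for the evaluate phase
    n_block_on : True when the N-block conducts for the current inputs
    y_prev     : previous output level, 1 for Vdd and 0 for 0 V
    case       : 1..4 in the evaluate phase, None in the pre-charge phase
    """
    if clk == 0:
        # Pre-charge: P2, N1 and N4 are off, so the output holds its state.
        return y_prev, None
    if n_block_on:
        # Cases 1 and 2: the output ends at Vdd.
        return 1, (1 if y_prev == 1 else 2)
    # Cases 3 and 4: the output ends at 0 V.
    return 0, (3 if y_prev == 1 else 4)


# Walk through the four evaluate-phase cases.
for on in (True, False):
    for prev in (1, 0):
        print(on, prev, ant_step(1, on, prev))
```

Only cases 1 to 3 involve a transition or disturbance at the output, which is why no delay expression is given for case 4.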

Fig. 4 shows the schematic of the 8-bit carry generator circuit, which is composed of two cascaded stages of inverters and ANT logic circuits. The ANT blocks with a gray background refer to the ANT logic shown in Fig. 2 (b), while the remaining NMOS transistors form the N-block of each ANT logic circuit. The circuit generates the output signals according to the following Boolean operation.

$$
C_i = G_i + P_i G_{i-1} + P_i P_{i-1} G_{i-2} + \cdots + P_i P_{i-1} \cdots P_0 C_{in} \tag{9}
$$
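Eqn. (9) can be checked at the logic level with the short sketch below, which expands each carry directly from the generate and propagate bits. The definitions $G_i = A_i B_i$ and $P_i = A_i \oplus B_i$ are the conventional ones and are assumed here, since the gate-level content of Fig. 3 is not reproduced in the text; the function name `lookahead_carries` is likewise hypothetical.

```python
def lookahead_carries(a: int, b: int, c_in: int = 0, width: int = 8):
    """Return [C_0, ..., C_{width-1}] computed with the expanded form of Eqn. (9)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # assumed G_i = A_i AND B_i
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # assumed P_i = A_i XOR B_i
    carries = []
    for i in range(width):
        c = g[i]       # first term: G_i
        prod = p[i]    # running product P_i * P_{i-1} * ...
        for j in range(i - 1, -1, -1):
            c |= prod & g[j]   # term P_i ... P_{j+1} G_j
            prod &= p[j]
        c |= prod & c_in       # last term: P_i ... P_0 C_in
        carries.append(c)
    return carries


# Example: 0xFF + 0x01 propagates a carry through every stage.
print(lookahead_carries(0xFF, 0x01))
```

Every carry depends only on the primary inputs and $C_{in}$, so all eight carries can be evaluated in parallel rather than rippling from stage to stage.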



Fig. 3. Schematic of the proposed Generation and Propagation (G/P) circuit.



Fig. 4. Schematic of the 8-bit carry generation circuit.

The 8-bit carry signals are then added with $P_0 \sim P_7$ by the 8-bit sum generator, as shown in Fig. 5. The summation results are coupled to the output buffer, which drives the large capacitive load of 20 pF. Fig. 6 (a) shows the schematic of the output buffer. It is composed of 6 stages of inverters, which introduce a delay of $\tau_4 = 0.69 \cdot N1 \cdot R_{stage} \cdot C_{stage}$, where N1 = 6 is the number of stages, and $R_{stage}$ and $C_{stage}$ refer to the equivalent parasitic resistance and capacitance at each stage.
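A bit-level sketch of the sum generation step is given below. It assumes the usual relation $S_i = P_i \oplus C_{i-1}$ with $C_{-1} = C_{in}$ and $P_i = A_i \oplus B_i$; the text only states that the carries are combined with $P_0 \sim P_7$, so the exact gate structure of Fig. 5 is not implied. For brevity the carries are produced here by the recurrence $C_i = G_i + P_i C_{i-1}$, which is logically equivalent to the expanded lookahead form, and the function name `cla_add` is hypothetical.

```python
def cla_add(a: int, b: int, c_in: int = 0, width: int = 8):
    """Return (sum, carry_out) of two width-bit operands using G/P signals."""
    total = 0
    carry = c_in  # C_{-1} = C_in
    for i in range(width):
        g = (a >> i) & (b >> i) & 1    # assumed G_i = A_i AND B_i
        p = ((a >> i) ^ (b >> i)) & 1  # assumed P_i = A_i XOR B_i
        total |= (p ^ carry) << i      # S_i = P_i XOR C_{i-1}
        carry = g | (p & carry)        # C_i = G_i + P_i C_{i-1}
    return total, carry


# Example: 200 + 100 = 300, which overflows 8 bits.
print(cla_add(200, 100))
```

The returned pair corresponds to the eight sum bits that feed the output buffer and the final carry out of the most significant stage.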

The input TSPC DFF is revealed in Fig. 6 (b), which is a positive-edge triggered DFF. The delay time from the positive edge of the clk to the output is  $\tau_5 = 0.69 \cdot N2 \cdot R_{stage} \cdot C_{stage}$ . N2 is 4 for the number of the stages in the TSPC DFF.

Based on the above analysis, the critical path delay is expressed as  $\tau_{crtl} = \tau_{21} + \tau_{22} + \tau_4 + \tau_5$ .
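The critical-path expression can be turned into a small calculator as sketched below. The function names and the numeric inputs are placeholders for illustration; in particular, the same $R_{stage}$ and $C_{stage}$ are used for the buffer and the DFF only because the text uses the same symbols for both.

```python
def stage_chain_delay(n_stages: int, r_stage: float, c_stage: float) -> float:
    """Delay of a chain of identical stages: 0.69 * N * R_stage * C_stage."""
    return 0.69 * n_stages * r_stage * c_stage


def critical_path_delay(tau_21: float, tau_22: float, r_stage: float,
                        c_stage: float, n_buffer: int = 6, n_dff: int = 4) -> float:
    """Sum of tau_21, tau_22, the buffer delay tau_4, and the DFF delay tau_5."""
    tau_4 = stage_chain_delay(n_buffer, r_stage, c_stage)  # 6-stage output buffer
    tau_5 = stage_chain_delay(n_dff, r_stage, c_stage)     # 4-stage TSPC DFF
    return tau_21 + tau_22 + tau_4 + tau_5


# Placeholder values, not taken from the post-layout extraction.
print(critical_path_delay(tau_21=1e-12, tau_22=2e-12, r_stage=1e3, c_stage=1e-15))
```

With these placeholder inputs the buffer and DFF terms dominate the total, which illustrates why the stage counts N1 and N2 appear explicitly in the analysis.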

Fig. 5. Schematic of 8-bit sum generation circuit.

Fig. 6. Schematic of (a) the output buffer and (b) the TSPC DFF.

## III. IMPLEMENTATION AND SIMULATION

The proposed design is implemented with a typical 16 nm FinFET technology. Fig. 7 shows the layout of the design, where the core area is $206.403 \times 152.506 \ \mu\text{m}^2$, and the overall chip area is $618 \times 618 \ \mu\text{m}^2$. Fig. 8 shows the pre-layout simulation waveforms of the output signals at 5 process corners (TT, SS, FS, SF, and FF) with Vdd = 0.8 V, a load of 20 pF, and a 20 GHz clock frequency. The worst-case rise time, fall time, and delay time are 70 ps, 130 ps, and 176 ps, respectively. Because of the additional parasitic RC, the post-layout simulations show that the rise time, fall time, and delay time are 562 ps, 464 ps, and 931 ps, respectively, under the same conditions, as shown in Fig. 9. Table I compares the performance of the proposed design with several prior works. The power consumption is normalized by the equation given in the table note, from which the normalized PDP is calculated. The proposed design possesses the best normalized power and PDP.
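The normalization used for the comparison, $P_{nor} = P / (f \cdot C_{Load} \cdot V_{dd}^2)$, can be reproduced with the sketch below. The operating point (20 GHz, 20 pF, 0.8 V) and the 931 ps delay are taken from the text; the power value is a placeholder, and forming the normalized PDP as $P_{nor}$ multiplied by the delay is an assumption, since the exact formula is not spelled out.

```python
def normalized_power(power_w: float, freq_hz: float, c_load_f: float, vdd_v: float) -> float:
    """P_nor = P / (f * C_load * Vdd^2), a dimensionless figure."""
    return power_w / (freq_hz * c_load_f * vdd_v ** 2)


def normalized_pdp(power_w: float, delay_s: float, freq_hz: float,
                   c_load_f: float, vdd_v: float) -> float:
    """Assumed definition: normalized power multiplied by the delay, in seconds."""
    return normalized_power(power_w, freq_hz, c_load_f, vdd_v) * delay_s


# Operating point from the text; the power figure is a hypothetical placeholder.
F_CLK = 20e9       # Hz
C_LOAD = 20e-12    # F
VDD = 0.8          # V
DELAY = 931e-12    # s, post-layout delay
power = 1e-3       # W, placeholder only

print(normalized_power(power, F_CLK, C_LOAD, VDD))
print(normalized_pdp(power, DELAY, F_CLK, C_LOAD, VDD))
```

At this operating point the denominator $f \cdot C_{Load} \cdot V_{dd}^2$ evaluates to about 0.256 W, so the normalization rescales each design by the switching power its own load and clock rate would demand.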

## IV. CONCLUSION

This paper proposes a 20 GHz 8-bit CLA for a 20 pF load. By using the ANT logic and the FinFET device, the power consumption is reduced and the operation speed is enhanced. The delay-time analysis of the critical path provides an estimate of the speed performance. The simulated results show that the proposed design possesses the best normalized power and PDP among the compared designs.

## ACKNOWLEDGMENT

This research was partially supported by the Ministry of Science and Technology under grant no. MOST 109-2224-E-110-001-, MOST 109-2218-E-110-007-, MOST 109-2221-E-110-079- and 110-2218-E-110-008-. Moreover, the authors would like to express their deepest appreciation to TSRI (Taiwan Semiconductor Research Institute) of NARL (National Applied Research Laboratories), Taiwan, for the EDA tool assistance.

Fig. 7. Layout of the proposed design.

Note: $P_{nor} = \frac{P}{f \cdot C_{Load} \cdot V_{dd}^2}$

Fig. 8. Pre-layout simulation waveforms of the output signals in various corners.

Fig. 9. Post-layout simulated waveforms of the output signals.

## REFERENCES

- [1] G. K. Reddy, "Low power-area pass transistor logic based ALU design using low power full adder design," in Proc. *2015 IEEE 9th International Conference on Intelligent Systems and Control (ISCO)*, pp. 1-6, 2015.

- [2] P. Bhattacharyya, B. Kundu, S. Ghosh, V. Kumar and A. Dandapat, "Performance analysis of a low-power high-speed hybrid 1-bit full adder circuit," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 23, no. 10, pp. 2001-2008, Oct. 2015.
- [3] H. Naseri and S. Timarchi, "Low-power and fast full adder by exploring new XOR and XNOR gates," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 26, no. 8, pp. 1481-1493, Aug. 2018.
- [4] W. Al-Akel, K. Abugharbieh, A. Hasan and H. W. Marar, "A power efficient 500 MHz adder," in Proc. *2019 Southeast Conf.*, pp. 1-4, 2019.
- [5] Y. Sun and V. Kursun, "A comparison of high-frequency 32-bit dynamic adders with conventional silicon and novel carbon nanotube transistor technologies," in Proc. *2013 International SoC Design Conference (ISOCC)*, pp. 039-042, 2013.
- [6] S. Vidhyadharan and S. S. Dan, "An efficient ultra-low power and superior performance design of Ternary half adder using CNFET and gate-overlap TFET devices," *IEEE Transactions on Nanotechnology*, Early access article, 2021.
- [7] R. Saraswat, S. Akashe, and S. Babu, "Designing and simulation of full adder cell using FinFET technique," in Proc. *2013 7th International Conference on Intelligent Systems and Control (ISCO)*, pp. 261-264, 2013.
- [8] A. Raghunandan and D. R. Shilpa, "Design of high-speed hybrid full adders using FinFET 18nm technology," in Proc. *2019 4th International Conference on Recent Trends on Electronics, Information, Communication & Technology (RTEICT)*, pp. 410-415, 2019.
- [9] M. Afghahi, "A robust single phase clocking for low power, high-speed VLSI applications," *IEEE J. of Solid-State Circuits*, vol. 31, no. 2, pp. 247-253, Feb. 1996.
- [10] C.-H. Hsu, G.-N. Sung, T.-Y. Yao, C.-Y. Juan, Y.-R. Lin and C.-C. Wang, "Low-power 7.2 GHz complementary all-N-transistor logic using 90 nm CMOS technology," in Proc. *2009 IEEE International Symposium on Circuits and Systems*, pp. 389-392, 2009.
- [11] B. E. Veera, P. M. Swadhi, S. Sujitha, R. Susmitha and S. V. Sonia, "Realization of high speed low power MCC adder using dynamic CMOS transistors," in Proc. *2018 International Conference on Current Trends towards Converging Technologies (ICCTCT)*, pp. 1-5, 2018.
- [12] S. Akhter, S. Chaturvedi, S. Khan and A. Bhardwaj, "An efficient CMOS dynamic logic-based full adder," in Proc. *2020 6th International Conference on Signal Processing and Communication (ICSC)*, pp. 226-229, 2020.
- [13] R. J. Baker, H. W. Li, and D. E. Boyce, *CMOS Circuit Design, Layout, and Simulation, 3rd Ed.* New York: Wiley-Interscience, 2013.
- [14] S. Goodnick, A. Korkin, and R. Nemanich, *Semiconductor Nanotechnology: Advances in Information and Energy Processing and Storage*, Springer, 2018.
- [15] B. Parvais, M. Dehan, V. Subramanian, A. Mercha, K. Tamer San, M. Jurczak, G. Groeseneken, W. Sansen, and S. Decoutere, "Analysis of the FinFET parasitics for improved RF performances," in Proc. *2007 IEEE International SOI Conference*, pp. 37-38, 2007.