# FAST HALF-SWING INTER-PLANE CIRCUITS FOR CLOCKED NOR-NOR PLAs<sup>§</sup>

Chua-Chin Wang<sup>†</sup>, Chih-Chiang Chiu, and Yu-Tsung Chien

Department of Electrical Engineering, National Sun Yat-Sen University, Kaohsiung, Taiwan 80424. Email: ccwang@ee.nsysu.edu.tw

## ABSTRACT

We present two fast half-swing CMOS circuits for NOR-NOR PLA implementation. An additional $\frac{1}{2}$ VDD voltage source and buffering transmission gates are inserted between the NOR planes to eliminate the racing problem and to shorten both the rise and fall delays of the output response, thereby increasing speed. In addition, the proposed circuits require no ground switch and exhibit no charge sharing and zero offset.

## 1. INTRODUCTION

PLAs can be implemented in either a static or a dynamic style; the choice depends on the timing and power strategies. Prior works have focused on improving speed and power through various techniques, e.g., eliminating the ground switch, NAND-gate buffering, or reducing static current [3], [2], [6]. A long-ignored fact is that one of the largest state transitions in a PLA is the switching of the load between the first plane and the second plane. The dynamic NOR-NOR PLA [5], [3] suffers from a racing problem because two dynamic logic gates are cascaded in series, so a delayed clock for the second gate is required to prevent racing. In the domino PLA [5], all gates are precharged (or pre-discharged) and connected to the next stage through inverters; the series NMOSs of the front AND plane cause a large pull-down delay. Dhong et al. [2] proposed a PLA design that employs a pre-discharged OR array and a charge-sharing AND array to eliminate the ground switch of the second gate. However, it cannot provide a full-swing output high voltage $V_{oH}$, and it also suffers from a low noise margin. Moreover, a delayed clock is still needed to prevent racing.

## 2. HALF SWING FAST PLA DESIGN

## 2.1. general prior PLA circuits

Fig. 1 shows the general architecture of prior PLAs. The slow response at the output of the first plane is the major reason why the entire PLA either operates slowly or functions incorrectly. Prior works on speed enhancement and power saving [1], [2], [3] all focus on using different gates between the two planes.

## 2.2. half-swing inter-plane circuit

A simple way to reduce the state-transition time at the inter-plane wire load is to precharge the output of the first plane to $\frac{1}{2}$ VDD during the precharge (or pre-discharge) phase, which reduces the subsequent rise or fall delay in the evaluation phase by approximately 50%. As shown in Fig. 2, an extra $\frac{1}{2}$ VDD power source is introduced together with two cascaded inverters (i.e., the delay buffer) and one transmission gate. The operation of the proposed circuit is described as follows.

- **A.** When clk = 0, nodes s, t, p, and q are charged to VDD, GND, VDD, and $\frac{1}{2}$ VDD, respectively. Node q stays at $\frac{1}{2}$ VDD during the precharge phase because the transmission gate formed by N1 and P1, which is controlled by $\overline{clk}$, is OFF; hence, p and q are isolated.
- **B.** When clk goes high, N-block-1 performs its evaluation while the N1-P1 transmission gate turns ON. Depending on the outcome at the output of the first plane, node q is either pulled up from $\frac{1}{2}$ VDD to VDD or pulled down from $\frac{1}{2}$ VDD to GND. Because q swings through only half of the supply in either direction, the delay of the first-plane output is greatly shortened.
- **C.** Although the proposed half-swing circuit improves speed, it introduces a power-dissipation problem that must be resolved at the same time. If the proposed circuit is applied to a design style whose second plane has no clock-controlled transistor, e.g., pseudo-NMOS logic, the precharged voltage at node q may create a DC path through P3 and N-block-2 in the second plane. Hence, we add a clock-controlled NMOS between N-block-2 and GND so that no DC current path is formed during the precharge phase, which saves power.
- **D.** Another important consideration is the balance between the rise delay and the fall delay of the first-plane output. The proposed inter-plane half-swing circuit still functions properly if INV1 and INV2 are ordinary inverters, but the two transitions of q are then unbalanced. If the evaluation result of N-block-1 is "stop," the voltages at s, t, and p remain unchanged and q is simply pulled up to VDD, which is fast. If the result is "pass," s must be pulled down, t pulled up, and p pulled down before q can be pulled down; this chain of transitions makes the pull-down of the first-plane output the slower process. To fix this imbalance, the sizes of INV1 and INV2 should be adjusted. We propose that INV1 have a large pull-up PMOS and a small pull-down NMOS, whereas INV2 has a small pull-up PMOS and a large pull-down NMOS, as shown in Fig. 3. This sizing strengthens exactly the transitions on the slow "pass" path: t rising through the PMOS of INV1 and p falling through the NMOS of INV2.

<sup>†</sup>The contact author.

<sup>§</sup>This research was partially supported by the National Science Council under grants NSC 89-2215-E-110-017 and 89-2215-E-110-014.
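The precharge/evaluate sequence of items A and B, and the asymmetry discussed in item D, can be summarized by a small behavioural model of nodes s, t, p, and q. The sketch below is only an illustration under our own simplifying assumptions (ideal rail-to-rail inverters, an ideal transmission gate, settled levels only, no timing); the class and function names are hypothetical and are not part of the circuit description.

```python
from dataclasses import dataclass

VDD = 3.3  # V, supply used in the simulations


@dataclass(frozen=True)
class Nodes:
    s: float
    t: float
    p: float
    q: float


def precharge(vdd: float = VDD) -> Nodes:
    """clk = 0: s, t, p at VDD, GND, VDD; q held at VDD/2 while the
    N1-P1 transmission gate is OFF (ideal levels, assumed)."""
    return Nodes(s=vdd, t=0.0, p=vdd, q=vdd / 2)


def evaluate(passes: bool, vdd: float = VDD) -> Nodes:
    """clk = 1: ideal settled levels after evaluation.

    passes=True  -> N-block-1 discharges s, INV1 raises t, INV2 lowers p,
                    and the ON transmission gate pulls q from VDD/2 to GND.
    passes=False -> s, t, p keep their precharged values and q is pulled
                    from VDD/2 up to VDD through the transmission gate.
    """
    s = 0.0 if passes else vdd
    t = vdd - s  # INV1 output (ideal inverter)
    p = vdd - t  # INV2 output, so p follows s
    return Nodes(s=s, t=t, p=p, q=p)  # TG ON: q settles to the level of p


def switching_nodes(passes: bool, vdd: float = VDD) -> list[str]:
    """Nodes whose level changes between precharge and evaluation."""
    before, after = precharge(vdd), evaluate(passes, vdd)
    return [n for n in ("s", "t", "p", "q")
            if getattr(before, n) != getattr(after, n)]


if __name__ == "__main__":
    for passes in (False, True):
        label = "pass" if passes else "stop"
        print(label, evaluate(passes), switching_nodes(passes))
```

On "stop" only q changes, while on "pass" all four nodes must switch in sequence, which is the longer pull-down path that the ratioed INV1/INV2 sizing is meant to speed up.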

## 2.3. modified circuitry for NOR-NOR PLAs

Fig. 4 shows a modified half-swing inter-plane circuit intended only for NOR-NOR PLAs, in which the inter-plane block is placed between the inverters. In the precharge phase (clk = 0), the precharged node p almost fully turns on the wide NMOS of INV2, so q nearly turns off N-block-2. As a result, the switching speed is higher than that of the previous inter-plane half-swing circuit, which makes the modified circuit well suited to high-speed, high-clock-rate PLAs. The drawback is higher power consumption, caused by the slow turn-off of P3, which is controlled by the wide NMOS of INV2.

## 2.4. analysis of speed and overhead

**Speed**: The speed of a dynamic-style PLA depends on how fast nodes q and r discharge. The inter-plane half-swing circuit sets node q to $\frac{1}{2}$ VDD during the precharge phase, which reduces both the charging time (rise delay) and the discharging time (fall delay). Moreover, since there is no racing problem, the delayed clock can be eliminated, which further improves speed.
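The "approximately 50%" estimate follows from a first-order constant-current model, in which the time to slew a capacitive node is $t = C\,\Delta V / I$; halving the swing $\Delta V$ halves $t$. The sketch below assumes that model, and the load capacitance and drive current are hypothetical values chosen only for illustration. Under an exponential RC model the saving depends on the measurement threshold, so 50% should be read as a first-order figure.

```python
def slew_time(c_load: float, delta_v: float, i_drive: float) -> float:
    """Time (s) to move a capacitive node by delta_v volts with a constant
    drive current: t = C * dV / I (first-order, saturated-driver model)."""
    return c_load * delta_v / i_drive


VDD = 3.3       # V, supply used in the simulations
C_LOAD = 1e-12  # F, hypothetical inter-plane wire load
I_DRIVE = 1e-3  # A, hypothetical average drive current

t_full = slew_time(C_LOAD, VDD, I_DRIVE)      # rail-to-rail transition
t_half = slew_time(C_LOAD, VDD / 2, I_DRIVE)  # from VDD/2 to either rail
ratio = t_half / t_full

if __name__ == "__main__":
    print(f"full swing: {t_full * 1e9:.2f} ns, half swing: {t_half * 1e9:.2f} ns")
    print(f"ratio = {ratio:.2f}")
```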

**Overhead**: The total area overhead is $6m$ transistors minus the transistors of the removed ground switches and delayed-clock circuits, where $m$ is the number of minterms. The $6m$ term comes from one inverter, one transmission gate, one PMOS switch to the $\frac{1}{2}$ VDD source, and one NMOS in the second plane that blocks the DC current, per minterm. The other overhead is the $\frac{1}{2}$ VDD source itself, which is not difficult to provide with state-of-the-art techniques [4].
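The overhead count can be written as a one-line formula. The sketch below encodes the per-minterm breakdown stated in the text (2 + 2 + 1 + 1 = 6 transistors); the numbers of transistors removed with the ground switches and the delayed-clock circuit depend on the prior design being replaced, so the values in the usage line are hypothetical.

```python
# Per-minterm transistors added by the half-swing inter-plane circuit,
# following the breakdown given in the text.
ADDED_PER_MINTERM = {
    "inverter": 2,
    "transmission gate (N1, P1)": 2,
    "PMOS switch to VDD/2": 1,
    "DC-blocking NMOS in plane 2": 1,
}


def area_overhead(m: int, ground_switch_tr: int, delayed_clock_tr: int) -> int:
    """Net transistor overhead: 6m minus the transistors of the removed
    ground switches and the removed delayed-clock circuit."""
    per_minterm = sum(ADDED_PER_MINTERM.values())
    return per_minterm * m - ground_switch_tr - delayed_clock_tr


if __name__ == "__main__":
    # Hypothetical example: 16 minterms, one ground-switch transistor per
    # minterm removed, and a 4-transistor delayed-clock circuit removed.
    print(area_overhead(m=16, ground_switch_tr=16, delayed_clock_tr=4))
```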

## 3. SIMULATION AND ANALYSIS

Speed (Delay) Simulations: To verify the proposed low-power, high-speed PLA configuration, we simulate several PLA designs with and without the proposed circuit. All designs are implemented in TSMC 0.35 $\mu m$ 2P4M technology. Fig. 5 shows a prior dynamic NOR-NOR PLA design integrated with our inter-plane half-swing circuitry, and the transistor sizes are listed in Table 1. Fig. 6 shows the timing responses, simulated with CADENCE EDA tools at $V_{dd} = 3.3$ V. Table 2 lists the worst-case delays of the first-plane output of these PLAs. Because of the half-swing operation, this delay is measured from 90% of the input voltage change to 90% of the output voltage change.
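The 90%-to-90% delay (and the 50%-to-50% delay used for the full-swing second-plane output) can be extracted from exported, sampled waveforms with a simple threshold-crossing search. The sketch below is a generic post-processing helper that assumes the waveforms are available as plain lists of time and voltage samples; it is not the measurement function of the CADENCE tools, and the helper names and the synthetic ramps are hypothetical.

```python
from typing import Sequence


def crossing_time(t: Sequence[float], v: Sequence[float], level: float) -> float:
    """First time the sampled waveform v(t) crosses `level`, found by linear
    interpolation between the two samples that bracket it."""
    for k in range(1, len(t)):
        v0, v1 = v[k - 1], v[k]
        if (v0 - level) * (v1 - level) <= 0 and v0 != v1:
            return t[k - 1] + (level - v0) * (t[k] - t[k - 1]) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def transition_delay(t, v_in, v_out, frac: float = 0.9) -> float:
    """Delay from `frac` of the input's voltage change to `frac` of the
    output's voltage change (0.9 for the half-swing first-plane output,
    0.5 for the full-swing second-plane output)."""
    lvl_in = v_in[0] + frac * (v_in[-1] - v_in[0])
    lvl_out = v_out[0] + frac * (v_out[-1] - v_out[0])
    return crossing_time(t, v_out, lvl_out) - crossing_time(t, v_in, lvl_in)


def ramp(x: float, t0: float, t1: float, v0: float, v1: float) -> float:
    """Piecewise-linear test waveform: v0 before t0, v1 after t1."""
    if x <= t0:
        return v0
    if x >= t1:
        return v1
    return v0 + (v1 - v0) * (x - t0) / (t1 - t0)


if __name__ == "__main__":
    t = [0.1 * k for k in range(101)]                          # ns
    v_in = [ramp(x, 1.0, 2.0, 0.0, 3.3) for x in t]            # full-swing input
    v_out = [ramp(x, 3.0, 5.0, 1.65, 0.0) for x in t]          # half-swing fall
    print(f"{transition_delay(t, v_in, v_out, 0.9):.3f} ns")
```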

The proposed circuit reduces the first-plane delay for all four PLA designs in Table 2. Next, we compare the delay at the output of the second plane. Since the second plane should provide a full-swing output, this delay is measured from 50% of the input voltage to 50% of the output voltage. In addition, the dynamic NOR-NOR PLA requires a delayed clock; after several simulations, the minimal delay of this clock is found to be 4.2 ns. We therefore include such a delayed clock in the following simulations, and the resulting speed performance of the second plane is given in Table 3.

Table 4 gives the simulation results of the dynamic NOR-NOR PLA and Dhong's PLA without a delayed clock. In this case, the original dynamic NOR-NOR and Dhong's PLAs produce incorrect outputs, whereas the versions with our circuit do not. Note that, unlike the other designs, Dhong's design operates in a normally-low manner: its output is low during the precharge period, so its critical delay is the rising-edge delay rather than the falling-edge delay.

Power Dissipation Simulations: For the power comparison, we also run Monte Carlo simulations with 1000 sweeps at a signal frequency of 2.5 MHz (clock period = 400 ns). The results are listed in Table 5. With N2 included (\*\*), the proposed inter-plane half-swing circuit consumes less power than the prior design for both the pseudo-NMOS and the domino PLAs; without N2 (\*), its power consumption is higher. These results agree with our expectations regarding dynamic power consumption. With N2, the proposed circuit also achieves a lower power-delay product.
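The power-delay products in Table 5 can be recomputed directly from the listed power and delay values; since mW multiplied by ns gives pJ, the product is an energy per operation. The sketch below copies the Table 5 numbers and also reports the relative change of the power-delay product when the proposed circuit with N2 is added; the function names are our own.

```python
# Power (mW) and worst-case delay (ns) copied from Table 5.
TABLE5 = {
    "Pseudo-N": (0.5197, 26.3892),
    "Pseudo-N+ours (*)": (0.6680, 23.7739),
    "Pseudo-N+ours (**)": (0.4101, 13.9204),
    "Domino": (0.08145, 18.6663),
    "Domino+ours (*)": (0.35690, 11.3128),
    "Domino+ours (**)": (0.05776, 16.4178),
}


def pdp_pj(power_mw: float, delay_ns: float) -> float:
    """Power-delay product: 1 mW x 1 ns = 1e-3 W x 1e-9 s = 1 pJ."""
    return power_mw * delay_ns


def relative_change(new: float, old: float) -> float:
    """Signed relative change, e.g. -0.5 means a 50% reduction."""
    return (new - old) / old


if __name__ == "__main__":
    for name, (p, d) in TABLE5.items():
        print(f"{name:20s} PDP = {pdp_pj(p, d):6.2f} pJ")
    for base in ("Pseudo-N", "Domino"):
        old = pdp_pj(*TABLE5[base])
        new = pdp_pj(*TABLE5[f"{base}+ours (**)"])
        print(f"{base}: PDP change with N2 = {relative_change(new, old):+.0%}")
```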

Modified Half-Swing Circuitry Simulations: Table 6 gives the performance of the modified half-swing inter-plane circuitry of Fig. 4. Fig. 7 shows the modified circuitry applied to a dynamic NOR-NOR PLA, and Fig. 8 shows the simulation result. The delay is shortened, while the power consumption increases. Notably, without a delayed clock, the original NOR-NOR PLA produces only $V_{oH} = 0.444$ V. These simulation results agree with our expectations.

Implementation of an 8-Bit CLA: Fig. 9 shows an 8-bit carry-lookahead adder (CLA) implemented with the proposed PLA design using the modified inter-plane half-swing circuitry. Measurements of the fabricated design are close to the simulation results.

## 4. CONCLUSION

The proposed inter-plane half-swing circuit configuration, which places one transmission gate and an extra $\frac{1}{2}$ VDD source between the product line and the output line instead of a buffer or an inverter, eliminates the ground switch, increases the response speed, and reduces power consumption. It also holds the inputs of the second plane in the "stop" state before the evaluation phase, which prevents racing and removes the need for delayed clocks.

## 5. REFERENCES

- [1] M. Afghahi, "A robust single phase clocking for low power, high-speed VLSI applications," *IEEE J. of Solid-State Circuits*, vol. 31, no. 2, pp. 247-253, Feb. 1996.
- [2] Y. B. Dhong and C. P. Tsang, "High speed CMOS POS PLA using predischarged OR array and charge sharing AND array," *IEEE Trans. on Circuits & Systems II: Analog and Digital Signal Processing*, vol. 39, no. 8, pp. 557-564, Aug. 1992.
- [3] N. F. Goncalves and H. J. De Man, "NORA: A race-free dynamic CMOS technology for pipelined logic structures," *IEEE J. of Solid-State Circuits*, vol. 18, pp. 261-266, June 1983.
- [4] K. W. Mai, T. Mori, B. S. Amrutur, R. Ho, B. Wilburn, M. A. Horowitz, I. Fukushi, T. Izawa, and S. Mitarai, "Low-power SRAM design using half-swing pulse-mode techniques," *IEEE J. of Solid-State Circuits*, vol. 33, no. 11, pp. 1659-1671, Nov. 1998.
- [5] N. H. E. Weste, and K. Eshraghian, "Principles of CMOS VLSI Design - A Systems Perspective," 2nd edition. Reading, MA: Addison-Wesley, 1993.
- [6] C.-C. Wang, Y.-T. Chien, and Y.-P. Chen, "Power-saving fast half-swing inter-plane circuit for clocked PLAs," 1999 Symposium of Microprocessors Design, pp. 141-148, May 1999.

| PLA name | Plane 1 PMOS | Plane 1 NMOS | Plane 2 PMOS | Plane 2 NMOS |
|----------|--------------|--------------|--------------|--------------|
| Pseudo-N | 0.6:1   | 0.35:2.5 | 0.6:1   | 0.35:2.5 |
| NOR-NOR  | 0.35:3  | 0.35:3   | 0.35:3  | 0.35:1.5 |
| Domino   | 0.35:3  | 0.35:3   | 0.35:5  | 0.35:1   |
| Dhong's  | 0.35:2  | 0.35:3   | 0.35:2  | 0.35:1   |

| N1     | P1     | N2     | P2     |
|--------|--------|--------|--------|
| 0.35:9 | 0.35:9 | 0.35:1 | 0.35:4 |

Table 1: Transistor sizes of the PLAs (L:W, in μm).

| Name  | Pseudo-N | NOR-NOR | Domino | Dhong's |
|-------|----------|---------|--------|---------|
| Prior | 68.84    | 6.89    | 21.74  | 5.31    |
| Ours  | 50.96    | 5.46    | 17.20  | 4.27    |

Table 2: The worst-case delay of the first-plane output of different PLA designs (N2 does not exist in this series of simulations; unit = ns).

| PLA             | Delay (ns) | $V_{oH}$ (V) |
|-----------------|------------|----------|
| Pseudo-N        | 26.3892    | 3.3      |
| Pseudo-N + ours | 23.7739    | 3.3      |
| NOR-NOR         | 12.4808    | 3.3      |
| NOR-NOR + ours  | 10.8066    | 3.124    |
| Domino          | 18.6663    | 3.3      |
| Domino + ours   | 11.3128    | 2.926    |
| Dhong's         | 7.91569    | 1.65     |
| Dhong's + ours  | 4.16526    | 1.617    |

Table 3: The worst delay of the second plane output of different PLA designs.

| PLA              | Delay (ns) | $V_{oH}$ (V) |
|------------------|------------|--------------|
| NOR-NOR          | 8.38657    | 2.858        |
| NOR-NOR + ours   | 10.8066    | 3.124        |
| Dhong's          | 5.16275    | 1.52         |
| Dhong's + ours   | 4.16526    | 1.617        |

Table 4: The worst delay of the second plane output of different PLA designs without a delayed clock.

| PLA                | P (mW)  | D (ns)  | $P \times D$ (pJ) |
|--------------------|---------|---------|--------------|
| Pseudo-N           | 0.5197  | 26.3892 | 13.71        |
| Pseudo-N+ours (*)  | 0.6680  | 23.7739 | 15.88        |
| Pseudo-N+ours (**) | 0.4101  | 13.9204 | 5.71         |
| Domino             | 0.08145 | 18.6663 | 1.52         |
| Domino+ours (*)    | 0.35690 | 11.3128 | 4.04         |
| Domino+ours (**)   | 0.05776 | 16.4178 | 0.95         |

Table 5: The power dissipation of different PLA designs. (\*: without N2, \*\*: with N2)

| PLA              | Delay      | $V_{oH}$ (V) | Power   |
|------------------|------------|--------------|---------|
| NOR-NOR          | 248.698 ps | 3.4          | 2.88 mW |
| NOR-NOR+modified | 234.346 ps | 3.0          | 5.47 mW |

Table 6: Performance comparison of the modified inter-plane circuitry. (The MOS widths are enlarged roughly 10 times, the clock rate is 500 MHz, and a delayed clock is added.)



Figure 1: Prior NOR-NOR PLA



Figure 2: Inter-plane half-swing circuit



Figure 3: Ratioed design of the inverters



Figure 4: Modified half-swing circuitry



Figure 5: Dynamic NOR-NOR + ours circuit



Figure 6: Output waveforms of Dynamic NOR-NOR and Dynamic NOR-NOR + ours



Figure 7: Dynamic NOR-NOR + modified circuit



Figure 8: Output waveforms of modified circuitry for NOR-NOR PLAs (clock = 500 MHz)



Figure 9: Die photo of an 8-bit CLA using the proposed PLA circuitry