- [3] H. Kawakami, "Bifurcation of periodic responses in forced dynamic nonlinear circuits: Computation of bifurcation values of the system parameters," *IEEE Trans. Circuits Syst.*, vol. CAS-31, pp. 248–260, Mar. 1984.
- [4] F. Chapeau-Blondeau and G. Chauvet, "Stable, oscillatory, and chaotic regimes in the dynamics of small neural networks with delay," *Neural Networks*, vol. 5, no. 5, pp. 735–743, 1992.
- [5] F. Pasemann, "Characterization of periodic attractors in neural ring networks," *Neural Networks*, vol. 8, no. 3, pp. 421–429, 1995.
- [6] K. Jin'no and T. Saito, "Analysis and synthesis of a continuous-time hysteresis neural network," in *Proc. 1992 IEEE Int. Symp. Circuits Systems (ISCAS'92)*, San Diego, CA, 1992, pp. 471–474.
- [7] \_\_\_\_\_, "Obtaining an ideal associative memory by means of a simple hysteresis network," *Electron. and Commun. Jpn.*, pt. 3, vol. 78, no. 9, pp. 76–85, 1995.
- [8] T. Saito and M. Oikawa, "Chaos and fractals from a forced artificial neural cell," *IEEE Trans. Neural Networks*, vol. 4, no. 1, pp. 43–52, 1993.
- [9] T. Saito and Y. Matsumoto, "Chaos, torus, and synchronization from three coupled relaxation oscillators," *IEEE Trans. Circuits Syst.*, vol. 41, pp. 754–759, Nov. 1994.
- [10] K. Jin'no, "Chaos and related bifurcation phenomena from a simple hysteresis network," *IEICE Trans. Fundamentals*, vol. E79-A, no. 3, pp. 402–414, 1996.
- [11] K. Jin'no and T. Saito, "Analysis of periodic attractors from a simple hysteresis network," *IEICE Trans. Fundamentals*, vol. E79-A, no. 6, pp. 873–882, 1996.
- [12] O. E. Rössler, "An equation for hyperchaos," *Phys. Lett. A*, vol. 71, pp. 155–157, 1979.
- [13] J. D. Farmer, E. Ott, and J. A. Yorke, "The dimension of chaotic attractors," *Physica D*, vol. 7, pp. 153–180, 1983.
- [14] K. Jin'no and M. Tanaka, "Hysteresis quantizer," in *Proc. 1997 IEEE Int. Symp. Circuits Systems (ISCAS'97)*, Hong Kong, 1997, pp. 661–664.

# A Low-Power and High-Speed Dynamic PLA Circuit Configuration for Single-Clock CMOS

Chua-Chin Wang, Chi-Feng Wu, Rain-Ted Hwang, and Chia-Hsiung Kao

Abstract—Certain logic functions, such as the control units of VLSI processors, are difficult to implement with random logic. Since programmable logic arrays (PLA's) can implement almost any Boolean function, they have become popular devices for realizing both combinational and sequential circuits. We present a low-power, high-speed complementary metal–oxide–semiconductor (CMOS) implementation of a NOR-NOR PLA using a single-phase clock. Static NAND gates inserted as buffers between the NOR planes eliminate the racing problem and shorten the duration of glitches, which reduces dynamic power. The design also offers low static power dissipation, no ground switch, no charge sharing, and zero offset.

Index Terms—High-speed, low-power, NOR-NOR PLA, single clock.

## I. INTRODUCTION

PLA's can be implemented in either a static or a dynamic style, chosen according to the timing and power strategy. Modern CAD tools are required to support the integration of commonly used single-phase edge-triggered basic elements [1], including PLA's. Before discussing the proposed PLA design, we list the shortcomings of several existing PLA design methods.

Pseudo-NMOS [5]: This is the simplest design style for realizing PLA's. Its main disadvantage is the power dissipated through the dc path from $V_{dd}$ to ground. In addition, because the design is ratioed, both the PMOS and NMOS transistors must be enlarged together when the pull-up time is critical, and the ratioed design also reduces speed.

Dynamic NOR-NOR [4], [5]: The major problem of this logic style is the racing problem that arises when two dynamic logic gates are cascaded. The output of the first gate may wrongly turn the second gate on or off, so that the final result is incorrect. Thus, a delayed clock must be generated for the second gate to prevent the racing problem, which reduces the operation speed. In addition, the ground switch introduces a large parasitic capacitance, which further reduces speed.
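The racing problem can be illustrated with a deliberately simplified discrete-time sketch (our illustration, not the paper's analysis). The settling time of the first gate, the clock delay of the second gate, and the single-NMOS second gate are hypothetical modeling choices; the one physical fact the sketch relies on is that a discharged dynamic node cannot recover until the next precharge.

```python
def second_stage_output(p_final: int, p_settle_steps: int,
                        second_clock_delay: int, steps: int = 10) -> int:
    """Final value of the second dynamic gate's output after one evaluation phase.

    Hypothetical model: the first dynamic gate's output p is precharged to 1 and
    only reaches p_final after p_settle_steps time steps. The second gate's output
    s is also precharged to 1 and is pulled down by one NMOS driven by p, but only
    once its clock has arrived (second_clock_delay steps into evaluation).
    The correct result is therefore s = NOT p_final.
    """
    s = 1  # precharged
    for t in range(steps):
        p = 1 if t < p_settle_steps else p_final
        clock_arrived = t >= second_clock_delay
        if clock_arrived and p == 1:
            s = 0  # a discharged dynamic node stays low until the next precharge
    return s


if __name__ == "__main__":
    print(second_stage_output(p_final=0, p_settle_steps=3, second_clock_delay=0))  # 0: wrong (race)
    print(second_stage_output(p_final=0, p_settle_steps=3, second_clock_delay=3))  # 1: correct
```

Delaying the second clock by at least the first gate's settling time restores the correct result, which is the speed penalty described above.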

Domino [5]: In domino logic, all gates are precharged and connected to the next stage through inverters. Although SOP domino circuits are excellent in terms of power saving, the series NMOS transistors of the front AND plane cause a large pull-down delay. In addition, these series NMOS transistors can cause charge-sharing problems.
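A first-order way to see why long series stacks slow the domino AND plane is the Elmore delay of an n-transistor pull-down chain. The sketch below is our illustration, not the paper's analysis; it assumes every transistor has the same on-resistance `r_on` and every internal node the same capacitance `c_node`, with `c_load` at the output, all hypothetical values.

```python
def elmore_stack_delay(n: int, r_on: float, c_node: float, c_load: float) -> float:
    """First-order Elmore delay of n series NMOS transistors discharging c_load.

    Internal node i (i = 1 .. n-1, counted up from ground) sees i on-resistances
    to ground; the output node sees all n of them.
    """
    if n < 1:
        raise ValueError("need at least one transistor")
    internal = sum(i * r_on * c_node for i in range(1, n))
    return internal + n * r_on * c_load


if __name__ == "__main__":
    # Normalized, hypothetical values: r_on = c_node = c_load = 1.
    for n in (1, 2, 4, 8):
        print(n, elmore_stack_delay(n, 1.0, 1.0, 1.0))
```

With these normalized values the delays for n = 1, 2, 4, 8 are 1, 3, 10, and 36: the internal-node term grows as n(n - 1)/2, so the delay rises faster than linearly with the stack height.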

Dhong's Design [3]: Dhong et al. proposed a PLA design that employs a predischarged OR array and a charge-sharing AND array to eliminate the ground switch of the second gate. Because charge sharing is used, the output voltage $V_{oH}$ can only reach approximately 3.0 V when $V_{dd}$ is 5.0 V. The design therefore cannot provide a full output swing, and it also suffers from a low noise margin. In addition, a delayed clock is needed to prevent the racing problem.

Manuscript received September 16, 1997; revised July 31, 1998. This work was supported in part by the National Science Council under Grants NSC 87-2215-E-110-010 and 86-2622-E-009-009. This paper was recommended by Associate Editor G. Martinelli.

The authors are with the Department of Electrical Engineering, National Sun Yat-Sen University, Taiwan.

Publisher Item Identifier S 1057-7122(99)05583-X.



Fig. 1. Low-power and high-speed PLA circuit.

TABLE I
SWITCHING ACTIVITY COMPARISON OF OUR PLA AND NOR-NOR PLA (SW = SWITCHING, NS = NO SWITCHING)

| I/Ps  | Probability     | Node p: NOR-NOR | Node p: ourpla | Node q: NOR-NOR | Node q: ourpla | Node r: NOR-NOR | Node r: ourpla |
|-------|-----------------|-----------------|----------------|-----------------|----------------|-----------------|----------------|
| all 0 | $1/2^n$         | NS              | NS             | NS              | SW             | NS              | SW             |
| any 1 | $(2^n - 1)/2^n$ | SW              | SW             | SW              | NS             | SW              | NS             |

Dhong's design also requires capacitors, in addition to the shortcomings mentioned above, which implies a large area.

Blair's Design [2]: Blair replaced the usual AND plane with a predischarging pseudo-NMOS NOR plane to shorten the chain of series NMOS transistors in the evaluation block. Because the PMOS load transistor is constrained by the sizing ratio, the plane has difficulty driving a large capacitive load, and, as in any ratioed design, speed is reduced. In addition, static power is dissipated during the evaluation period of the clock.

We combine the dynamic, pseudo-NMOS, and domino-logic design styles to develop a low-power, high-speed PLA design that uses only one clock. The basic idea is to insert a buffering NAND gate between the two NOR planes to eliminate the ground switch, shorten the dynamic power spikes, and avoid the racing problem.

## II. LOW-POWER AND HIGH-SPEED SINGLE-CLOCK PLA DESIGN

### A. Low-Power and High-Speed (LP-HS) PLA Circuit

Referring to Fig. 1, all inputs of the first NOR plane are ANDed with the clock signal by the triggered one-bit decoder [3] ($\overline{clk}$) before they are fed into the gates of the evaluation block. Thus, during the precharge phase of the clock (clk = 0), node p is charged high. A NAND gate is used as the buffer. As shown in Fig. 1, this NAND gate precharges node q high while predischarging node r to ground, which prevents the racing problem.

When the clock turns high (clk = 1), the inputs pass through the triggered one-bit decoder to the NMOS transistors in the evaluation block. At the same time, the buffering NAND gate acts as an inverter. If the pull-down NMOS network conducts (some input is high), node p is discharged, so q stays high and r stays low, and output s remains unchanged. If the pull-down NMOS network does not conduct, node p remains high, which flips q to low and r to high, and output s is then pulled to ground.
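The precharge and evaluation sequence just described can be captured in a zero-delay logic sketch. This is our illustration under stated assumptions, not the authors' netlist: the function and argument names are hypothetical, the buffering NAND is modeled as NAND(p, clk), and r is modeled as the complement of q, which matches the description that q and r always take opposite values. Gate delays and glitches are ignored.

```python
from typing import List, Tuple

# A literal taps either the true rail (True) or the complement rail (False)
# of one input.
Literal = Tuple[int, bool]


def lp_hs_pla(inputs: List[int], product_lines: List[List[Literal]],
              output_taps: List[int]) -> Tuple[List[int], List[int], List[int], int]:
    """Settled logic values (p, q, r, s) at the end of the evaluation phase.

    During precharge (clk = 0) the decoder outputs are 0, so p = 1,
    q = NAND(p, 0) = 1, r = 0, and s = 1; below, clk has risen to 1.
    """
    clk = 1  # evaluation phase
    p = []
    for literals in product_lines:
        # The triggered one-bit decoder ANDs each literal with clk.
        drives = [(inputs[i] if true_rail else 1 - inputs[i]) & clk
                  for i, true_rail in literals]
        p.append(0 if any(drives) else 1)  # dynamic NOR: any 1 discharges p
    q = [1 - (pj & clk) for pj in p]       # buffering NAND acts as an inverter
    r = [1 - qj for qj in q]               # assumption: r is the complement of q
    s = 0 if any(r[j] for j in output_taps) else 1  # second dynamic NOR plane
    return p, q, r, s


if __name__ == "__main__":
    line = [[(0, True), (1, True)]]      # one product line tapping x0 and x1
    print(lp_hs_pla([0, 0], line, [0]))  # p stays high, q/r flip, s grounded
    print(lp_hs_pla([1, 0], line, [0]))  # p discharges, q/r and s unchanged
```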

### B. Analysis of Speed and Power

Speed: The speed of the dynamic-style PLA depends on how fast nodes p and s discharge. The buffering NAND gate charges node q high during the precharge phase, and node r is held low, which turns off the pull-down NMOS network of the second plane before the evaluation phase. Thus, there is no racing problem, and the delayed clock can be eliminated to improve speed.

Power: Because of the triggered one-bit decoder and the buffering NAND gate, there is no dc path from $V_{dd}$ to ground. The most important factor in power dissipation is that the buffering NAND gate statistically reduces the switching probability of the internal nodes of the PLA. Table I lists the switching activities of our design and of the NOR-NOR PLA.
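The our-PLA columns of Table I can be checked by brute force. The sketch below is an illustration rather than the paper's method: it enumerates all $2^n$ input vectors for a single product line tapping all n inputs, assumes every vector is equally likely, and uses the node behavior described in Section II-A, with r modeled as the complement of q (our assumption).

```python
from fractions import Fraction
from itertools import product


def our_pla_switching(n: int) -> dict:
    """Probability that nodes p, q, r switch in one clock cycle (our PLA)."""
    precharged = {"p": 1, "q": 1, "r": 0}
    switches = {"p": 0, "q": 0, "r": 0}
    for x in product((0, 1), repeat=n):
        p = 0 if any(x) else 1  # dynamic NOR product line
        q = 1 - p               # NAND(p, clk) with clk = 1
        r = 1 - q               # assumption: r is the complement of q
        for node, value in (("p", p), ("q", q), ("r", r)):
            switches[node] += int(value != precharged[node])
    return {node: Fraction(count, 2 ** n) for node, count in switches.items()}


if __name__ == "__main__":
    print(our_pla_switching(3))
```

For the NOR-NOR PLA, Table I instead has all three nodes switching whenever any input is 1, that is, with probability $(2^n - 1)/2^n$.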

Note that, apart from the domino PLA, the other PLA's have the same switching activity as the NOR-NOR PLA. Based on Table I, we compare our PLA with the NOR-NOR PLA using the power cost, defined as Power Cost $= \sum_i P_i \cdot C_i$, where $P_i$ is the transition probability at node $i$ and $C_i$ is the capacitance of node $i$:

$$\text{Power Cost}_{\text{NOR-NOR}} = \frac{1}{2^n} \cdot 0 + \frac{2^n - 1}{2^n} \cdot (C_p + C_q + C_r) \tag{1}$$

$$\text{Power Cost}_{\text{ourpla}} = \frac{1}{2^n} \cdot (C_q + C_r) + \frac{2^n - 1}{2^n} \cdot C_p \tag{2}$$

$$\lim_{n \to \infty} \frac{\text{Power Cost}_{\text{ourpla}}}{\text{Power Cost}_{\text{NOR-NOR}}} = \frac{C_p}{C_p + C_q + C_r} < 1 \tag{3}$$

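The above result predicts that the power cost of our PLA decreases as the number of inputs increases: rewriting (2) as $C_p + (C_q + C_r - C_p)/2^n$ shows that it decreases toward $C_p$ whenever $C_q + C_r > C_p$. In contrast, the power cost of the other NOR-NOR-style PLA's, given by (1), increases toward $C_p + C_q + C_r$ as the number of inputs increases.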
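Equations (1)–(3) can be evaluated numerically. In the sketch below the capacitances are normalized, hypothetical values ($C_p = C_q = C_r = 1$), chosen only to show the trend; exact fractions keep the $1/2^n$ terms readable.

```python
from fractions import Fraction


def power_cost_nor_nor(n, cp, cq, cr):
    # Eq. (1): p, q, and r all switch unless every input is 0.
    return Fraction(2 ** n - 1, 2 ** n) * (cp + cq + cr)


def power_cost_ourpla(n, cp, cq, cr):
    # Eq. (2): q and r switch only for the all-zero input; p switches otherwise.
    return Fraction(1, 2 ** n) * (cq + cr) + Fraction(2 ** n - 1, 2 ** n) * cp


if __name__ == "__main__":
    cp = cq = cr = 1  # hypothetical, normalized capacitances
    for n in range(1, 6):
        ours = power_cost_ourpla(n, cp, cq, cr)
        ref = power_cost_nor_nor(n, cp, cq, cr)
        print(n, ours, ref, float(ours / ref))
```

With these values both costs equal 3/2 at n = 1; as n grows, the ratio approaches $C_p/(C_p + C_q + C_r) = 1/3$, as (3) predicts.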



Fig. 2. Schematic diagrams of PLA design alternatives. (a) Pseudo-NMOS. (b) NOR-NOR. (c) Domino. (d) Dhong's. (e) Blair's. (f) Our PLA.

### C. Area Overhead

The total area overhead is $3n + 2m$ transistors, minus the transistors saved by removing the ground switches and delayed-clock circuits, where $n$ is the number of inputs and $m$ is the number of minterms. The $3n$ term comes from the three-transistor triggered one-bit decoder on each input, and the $2m$ term from replacing each conventional buffering inverter with a buffering NAND gate (a static CMOS two-input NAND gate has four transistors versus two for an inverter).
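The area bookkeeping can be written out directly. The NAND and inverter transistor counts are standard static CMOS values; the baseline is assumed to be a conventional dynamic NOR-NOR PLA, and the numbers of ground-switch and delayed-clock transistors it removes are left as hypothetical parameters because they depend on that baseline.

```python
DECODER_TRANSISTORS = 3  # triggered one-bit decoder per input (from the text)
NAND2_TRANSISTORS = 4    # static CMOS two-input NAND
INV_TRANSISTORS = 2      # static CMOS inverter


def area_overhead(n_inputs: int, m_minterms: int,
                  ground_switch_transistors: int = 0,
                  delayed_clock_transistors: int = 0) -> int:
    """Net transistor overhead relative to an assumed dynamic NOR-NOR baseline."""
    return (DECODER_TRANSISTORS * n_inputs
            + (NAND2_TRANSISTORS - INV_TRANSISTORS) * m_minterms
            - ground_switch_transistors
            - delayed_clock_transistors)


if __name__ == "__main__":
    print(area_overhead(4, 6))  # before subtracting any removed circuitry
```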

## III. SIMULATION AND ANALYSIS

Speed (Delay) Simulations: To verify the proposed low-power high-speed PLA configuration, we simulate it together with the other PLA designs shown in Fig. 2. All designs are implemented in TSMC 0.6-$\mu$m SPTM technology with PMOS (w/l = 2.25/0.6) and NMOS (w/l = 0.9/0.6) transistors, except that the PMOS load in the pseudo-NMOS PLA and Blair's PLA is ratioed to w/l = 0.9/1.2. Fig. 3 shows the timing responses of these PLA configurations. For a fair comparison, the output load of the first plane of each PLA is assumed to be 0.5 pF, the load of the ground switch 1.0 pF, the load of the buffers 1.0 pF, and the output load of each PLA 1.0 pF. The waveforms in Fig. 3 are simulated with CADENCE and HSPICE at $V_{dd} = 5.0$ V. The average delays of these PLA's are tabulated in Table II. Each delay is measured from the 2.5-V point of the input to the 50% point of the output.
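The delay definition above can be applied to sampled waveforms with linear interpolation between samples. The sketch below is a generic post-processing illustration, not the CADENCE or HSPICE measurement setup; the function names and the synthetic ramps in the example are hypothetical, and the output is assumed to swing between 0 V and `out_swing`.

```python
from typing import Optional, Sequence


def crossing_time(t: Sequence[float], v: Sequence[float],
                  level: float, rising: bool) -> float:
    """First time v crosses level in the given direction, linearly interpolated."""
    for i in range(1, len(t)):
        v0, v1 = v[i - 1], v[i]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            return t[i - 1] + (level - v0) * (t[i] - t[i - 1]) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def propagation_delay(t_in, v_in, t_out, v_out, vdd: float = 5.0,
                      out_swing: Optional[float] = None,
                      in_rising: bool = True, out_rising: bool = False) -> float:
    """Delay from the input at vdd/2 (2.5 V at 5 V) to the output at 50% of its swing."""
    out_swing = vdd if out_swing is None else out_swing  # e.g. 3.3 for a reduced V_oH
    return (crossing_time(t_out, v_out, 0.5 * out_swing, out_rising)
            - crossing_time(t_in, v_in, 0.5 * vdd, in_rising))


if __name__ == "__main__":
    # Synthetic ramps, times in ns: input rises over 0-1 ns, output falls over 10-12 ns.
    print(propagation_delay([0, 1, 2], [0, 5, 5], [0, 10, 12], [5, 5, 0]))
```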

Our proposed PLA is the fastest of all the PLA design approaches. Notably, Dhong's design has a normally low output, unlike the other designs: during the precharge period the output of Dhong's PLA is low. Thus, the critical delay of Dhong's design is the rising-edge delay, 22.9 ns, rather than the falling-edge delay.



Fig. 3. Waveforms of PLA design alternatives.

TABLE II
THE AVERAGE DELAY OF DIFFERENT PLA DESIGNS (\*: IN THE SECOND NOR PLANE, THE CLOCK IS DELAYED BY 22 ns TO PREVENT THE RACING PROBLEM)

| Name        | Rise Delay (ns) | Fall Delay (ns) | $V_{oH}$ (V) at $V_{dd} = 5$ V |
|-------------|-----------------|-----------------|--------------------------------|
| Pseudo-NMOS | 30.0            | 42.40           | 5.0                            |
| NOR-NOR     | 24.9            | 36.84           | 5.0*                           |
| Domino      | 5.2             | 39.64           | 5.0                            |
| Dhong       | 22.9            | 16.58           | 3.3*                           |
| Blair       | 4.5             | 25.55           | 5.0                            |
| ourpla      | 2.6             | 15.56           | 5.0                            |

TABLE III
THE POWER DISSIPATION OF DIFFERENT PLA DESIGNS

| Name        | n = 1, average (mW) | n = 2, average (mW) | n = 3, average (mW) |
|-------------|---------------------|---------------------|---------------------|
| Pseudo-NMOS | 0.60390             | 0.59780             | 0.59200             |
| NOR-NOR     | 0.07768             | 0.09425             | 0.10460             |
| Domino      | 0.06691             | 0.03228             | 0.02354             |
| Dhong       | 0.07452             | 0.08265             | 0.09097             |
| Blair       | 0.20150             | 0.24120             | 0.27160             |
| ourpla      | 0.06335             | 0.04833             | 0.03788             |

TABLE IV
THE POWER-DELAY PRODUCT COMPARISON OF THE DOMINO PLA AND OUR PLA

| Name   | n = 1 (mW·ns) | n = 2 (mW·ns) | n = 3 (mW·ns) |
|--------|---------------|---------------|---------------|
| Domino | 2.65231       | 1.27958       | 0.93313       |
| ourpla | 0.98573       | 0.75201       | 0.58941       |

*Power Dissipation Simulations:* For the power comparison, we also run a series of simulations using the Monte Carlo analysis of HSPICE, with 30 sweeps and a signal frequency of 1.67 MHz (clock period = 600 ns). The power dissipation results are tabulated in Table III.

The proposed PLA consumes the least power of all the PLA design approaches except the domino PLA. The results also follow the trend expected from (1) and (2): as n increases, the power of our PLA decreases while that of the NOR-NOR PLA increases. As for the comparison with the domino PLA, although the domino PLA consumes less power for n = 2 and 3, its pull-down delay grows as n increases because the number of series NMOS transistors in its evaluation block increases. If the power-delay product is taken as the measure, Table IV shows the superiority of our PLA design.
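Table IV can be cross-checked against Tables II and III. Multiplying each average power in Table III by the design's fall delay in Table II (39.64 ns for the domino PLA, 15.56 ns for our PLA) reproduces every Table IV entry to the printed precision, which suggests that a single delay per design was used for all n. The sketch below performs only that arithmetic; it does not model any growth of the domino delay with n.

```python
# Average power (mW) from Table III and fall delay (ns) from Table II.
POWER_MW = {
    "Domino": [0.06691, 0.03228, 0.02354],  # n = 1, 2, 3
    "ourpla": [0.06335, 0.04833, 0.03788],
}
FALL_DELAY_NS = {"Domino": 39.64, "ourpla": 15.56}


def power_delay_products(name: str) -> list:
    """Power-delay product (mW x ns = pJ) for n = 1, 2, 3, one delay per design."""
    return [round(p * FALL_DELAY_NS[name], 5) for p in POWER_MW[name]]


if __name__ == "__main__":
    for name in POWER_MW:
        print(name, power_delay_products(name))
```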

## IV. CONCLUSION

In short, the pseudo-NMOS and Blair's PLA's are ratioed designs and dissipate dc power; the NOR-NOR and Dhong's PLA's need a delayed clock; and the domino PLA has series NMOS transistors in its AND plane. Each of these designs has its own problems. The proposed PLA configuration, which uses a NAND gate instead of an inverter between the product line and the output line, eliminates the ground switch. It also keeps the inputs of the second plane low before the evaluation phase, which prevents the racing problem and removes the need for delayed clocks. Thus, speed is enhanced. The buffering NAND gate also reduces the switching probability, so the dynamic power consumption becomes much smaller. This approach makes a low-power, high-speed PLA possible, and its performance is verified by simulation.

## REFERENCES

- [1] M. Afghahi, "A robust single phase clocking for low power, high-speed VLSI applications," *IEEE J. Solid-State Circuits*, vol. 31, pp. 247–253, Feb. 1996.
- [2] G. M. Blair, "PLA design for single-clock CMOS," *IEEE J. Solid-State Circuits*, vol. 27, pp. 1211–1213, Aug. 1992.
- [3] Y. B. Dhong and C. P. Tsang, "High speed CMOS POS PLA using predischarged OR array and charge sharing AND array," *IEEE Trans. Circuits Syst. II*, vol. 39, pp. 557–564, Aug. 1992.
- [4] N. F. Goncalves and H. J. De Man, "NORA: A race-free dynamic CMOS technology for pipelined logic structures," *IEEE J. Solid-State Circuits*, vol. 18, pp. 261–266, June 1983.
- [5] N. H. E. Weste and K. Eshraghian, *Principles of CMOS VLSI Design: A Systems Perspective*, 2nd ed. Reading, MA: Addison-Wesley, 1993.

# Low-Voltage BiCMOS Four-Quadrant Multiplier Using Triode-Region Transistors

Shen-Iuan Liu, Jiin-Long Lee, and Cheng-Chieh Chang

Abstract—A low-voltage BiCMOS four-quadrant multiplier using triode-region transistors is presented. The circuit has been fabricated in a 1.0-$\mu$m BiCMOS process. Experimental results show that, with a $\pm 1.5$-V power supply, the linear range exceeds $\pm 0.6$ V with a linearity error of less than 2%. The total harmonic distortion is less than 2% for inputs up to $\pm 0.6$ V. The measured −3-dB bandwidth of the multiplier is about 10 MHz. The circuit is expected to be useful in low-voltage analog signal-processing applications.

Index Terms—BiCMOS, multiplier.

## I. INTRODUCTION

BiCMOS technologies are emerging as the next-generation technology for digital VLSI circuits [1], [2]. They can also be a viable approach for analog circuits, improving system performance by combining bipolar and CMOS technologies [3], [4]. Moreover, the trend toward higher device densities per unit chip area requires short-channel devices and, consequently, lower supply voltages in VLSI chips. Thus, it is desirable to develop analog integrated circuits suitable for low supply voltages. Multipliers [5], [6] are important building blocks in many applications, such as adaptive filters, frequency doublers, and modulators. Several BiCMOS multipliers [7]–[9] have been presented, but few of them are suitable for low supply voltages. Triode-based multipliers can provide higher linearity and operate at a lower supply voltage [10]. In this paper, a new low-voltage BiCMOS four-quadrant multiplier using triode-region transistors is presented. It has an advantage over the circuits in [7] and [9], which require additional control circuitry to achieve the same goal. Experimental results are given to verify the theoretical analysis.

Manuscript received April 6, 1996; revised November 18, 1998. This work was supported in part by the National Science Council under Grant NSC-85-2215-E-002-021. This paper was recommended by Associate Editor D. Zhou.

The authors are with the Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan 10617, R.O.C.

Publisher Item Identifier S 1057-7122(99)05581-6.



Fig. 1. The proposed BiCMOS four-quadrant multiplier.

TABLE I
THE ASPECT RATIOS OF THE DEVICES IN FIG. 1

| Devices     | M1–M2 | M3–M6 | M7–M10 |
|-------------|-------|-------|--------|
| W(μm)/L(μm) | 5/25  | 5/5   | 25/5   |

## II. CIRCUIT DESCRIPTION

The proposed BiCMOS four-quadrant multiplier is shown in Fig. 1. The drain current for an NMOS transistor biased in the triode region can be expressed as

$$I_D = K[(V_{GS} - V_T)V_{DS} - V_{DS}^2/2]$$
 (1)

where $K$ and $V_T$ are the transconductance parameter and threshold voltage of the NMOS transistor, respectively. Assume that $M_1$ and $M_2$ in Fig. 1 are biased in the triode region and that the remaining transistors operate in their normal regions (i.e., BJT's in the active region and MOSFET's in saturation). Assuming further that $V_{BEi} = V_{BE}$ for $i = 1, \ldots, 4$, the drain currents $I_{M1}$ and $I_{M2}$ of $M_1$ and $M_2$ can be expressed as

$$I_{M1} = K\left\{[V_M - (V_1 - V_{BE}) - V_T] \cdot (V_2 - V_1) - (V_2 - V_1)^2/2\right\} \tag{2a}$$

$$I_{M2} = K\left\{[V_N - (V_1 - V_{BE}) - V_T] \cdot (V_2 - V_1) - (V_2 - V_1)^2/2\right\}. \tag{2b}$$

Subtracting (2b) from (2a), the quadratic terms cancel, and the difference becomes

$$I_{M1} - I_{M2} = K(V_2 - V_1)(V_M - V_N).$$
(3)
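The cancellation leading to (3) can be checked numerically. The sketch below evaluates (2a) and (2b) through the triode-region model (1) and compares the difference with $K(V_2 - V_1)(V_M - V_N)$. All parameter values are hypothetical, chosen only so that both transistors stay in the triode region; as (2a) and (2b) imply, $V_{GS} = V_M - (V_1 - V_{BE})$ for $M_1$ (with $V_N$ for $M_2$) and $V_{DS} = V_2 - V_1$.

```python
import math


def triode_drain_current(k: float, vgs: float, vds: float, vt: float) -> float:
    """Eq. (1): NMOS drain current in the triode region."""
    if not (0.0 <= vds <= vgs - vt):
        raise ValueError("transistor is not in the triode region")
    return k * ((vgs - vt) * vds - vds ** 2 / 2)


def multiplier_core(k, vt, vbe, vm, vn, v1, v2):
    """Drain currents of M1 and M2 per (2a) and (2b)."""
    vs = v1 - vbe   # common source voltage implied by (2a), (2b)
    vds = v2 - v1
    return (triode_drain_current(k, vm - vs, vds, vt),
            triode_drain_current(k, vn - vs, vds, vt))


if __name__ == "__main__":
    k, vt, vbe = 1e-4, 0.7, 0.7       # hypothetical device values (A/V^2, V, V)
    vm, vn, v1, v2 = 1.2, 1.0, 0.5, 0.6
    i1, i2 = multiplier_core(k, vt, vbe, vm, vn, v1, v2)
    expected = k * (v2 - v1) * (vm - vn)  # Eq. (3)
    print(i1 - i2, expected, math.isclose(i1 - i2, expected, rel_tol=1e-9))
```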