# A 4-KB 500-MHZ 4-T CMOS SRAM USING LOW-$V_{THN}$ BITLINE DRIVERS AND HIGH-$V_{THP}$ LATCHES <sup>§</sup>

Chua-Chin Wang <sup>†</sup>, Hon-Yuan Leo, and Ron Hu <sup>‡</sup>

Department of Electrical Engineering, National Sun Yat-Sen University, Kaohsiung, Taiwan 80424

email: ccwang@ee.nsysu.edu.tw

#### ABSTRACT

The design of a prototype 500-MHz CMOS 4-T SRAM is presented. Data are stored by a pair of cross-coupled PMOS transistors, while the wordline controls a pair of NMOS transistors. Consequently, neither a wordline voltage compensation circuit nor a bitline boosting circuit is needed to enhance the data retention of the memory cells. Built-in self-refreshing paths make data retention possible without any external refreshing mechanism. Most importantly, low-threshold-voltage transistors are used to drive the bit lines, while high-threshold-voltage transistors are used to latch the data voltages. The dual-threshold-voltage transistors thus reduce the access time and maintain data retention at the same time. In addition, a cascaded noise-immune ATD (address transition detector) is included to filter out unwanted CS (chip select) glitches when the SRAM is operated asynchronously.

## 1. INTRODUCTION

Although conventional 6-T SRAMs [5] are easily embedded in logic LSIs owing to their compatibility with the CMOS logic process, they are not economical to include in practical systems because of the large cell area [6]. Takeda et al. proposed a smaller 4-T SRAM cell and macro [7]. As pointed out in [3], the loadless design of the 4-T SRAMs [6], [7] requires special processes to reduce the threshold voltage of the PMOS transistors in order to increase the supply current. Otherwise, the self-refreshing operation fails because of the leakage current. In short, the wordline-controlled PMOS transistors provide poor driving capability.
The design proposed in [3] demands a second power supply voltage, $VDD + \Delta V$, to precharge the bitlines in order to refresh the weak "1" of the storage node. Moreover, the access time of the cell is not as fast as that of a conventional 6-T SRAM cell. A wordline-voltage-level compensation (WLC) circuit is also required for data retention.

MoSys Inc. announced an astonishing 1-T SRAM [2], which was claimed to be portable to SOC applications. The 1-T SRAM cell is basically a planar DRAM cell [4] with a special linearization biasing scheme and a sophisticated refreshing controller. Although the area is dramatically reduced, the tradeoffs include multiple voltage sources, alignment-precision requirements, and a high soft-error rate. On top of these problems, the access speed is also reduced: it is estimated at around 300 MHz at 1.8 V in a 0.18 $\mu$m CMOS process.

In this paper, we propose a novel P-latch N-drive 4-T SRAM cell that eliminates the WLC and improves the access speed of the SRAM, while a built-in self-refreshing path makes data retention feasible without any externally added refreshing control circuit. In addition, low-$V_{th}$ transistors are used as the bit line drivers and high-$V_{th}$ transistors as the data latch components. Not only is the access time shortened, but the data retention is also enhanced.

## 2. HIGH-SPEED 4-T SRAM DESIGN

Although a 4-T SRAM cell was proposed to realize an embedded macro for SOC applications [7], the intrinsically poor driving capability of the wordline-controlled PMOS transistors degrades the access time [1]. Hence, extra sophisticated circuitry is required to counteract this problem.

#### 2.1. Proposed SRAM cell

Referring to Fig. 1, a P-latch N-drive 4-T SRAM cell is proposed to resolve the difficulties of the prior works mentioned above. The data are kept in the back-to-back PMOS pair, which is also called the "core". N1 and N2 are the drivers of the bit lines BL and $\overline{\rm BL}$, respectively, and are controlled by the word line (WL).
If the threshold voltage of N1 and N2 is low, their switching time is reduced, which in turn shortens the access time of the SRAM cell. Hence, we use the native-$V_{th}$ devices provided by the TSMC 0.18 $\mu$m 1P6M process to implement the low-threshold-voltage driving transistors. They produce more driving current than normal or high-$V_{th}$ driving transistors. By contrast, transistors with a high $V_{th}$ have low leakage and subthreshold currents, so they are well suited to be cross-coupled as a data latch, as shown in Fig. 1. We therefore use nominal-$V_{th}$ transistors, i.e., P1 and P2, to keep valid data. It is also well known that such a pseudo-latch still suffers from leakage, which leads to the loss of the stored data. The leakage can be neutralized by the following methods.

<sup>§</sup> This research was partially supported by National Science Council under grant NSC 89-2215-E-110-014.

<sup>†</sup> Prof. Wang is the contact author.

<sup>‡</sup> Dr. Hu is General Manager of Asuka Semiconductor Inc., Hsin-Chu, Taiwan.

Hidden self-recharging path: Referring to Fig. 2, assume that N1 is off given that WL = 0. A total of four currents affect the voltage level of node Q when N2 is turned off and the data node Q is floating:

- subthreshold currents: $I_{P1}$ and $I_{N2}$
- reverse-bias currents: $I_{D1}$ and $I_{D2}$

The requirement for retaining a possibly weak "0" at node Q is $(I_{P1}+I_{D1})<(I_{N2}+I_{D2})$. Notably, the magnitudes of these currents are adjustable according to the following equations:

$$I_{sub} = \frac{W}{L}\, e^{\frac{V_{gs}-V_t}{nV_T}} \left(1-e^{\frac{-V_{ds}}{V_T}}\right)$$

$$I_D = WL' \cdot I_S \left(e^{\frac{V}{V_T}}-1\right) = I_{leakage}$$

where $W$ is the transistor width, and $L$ and $L'$ denote the lengths of the gate and of the parasitic diode, respectively. Thus, the data retention problem can be resolved by choosing the W/L ratios to meet the requirement $(I_{P1} + I_{D1}) < (I_{N2} + I_{D2})$. Fig. 3 compares the waveforms of the NEC SRAM cell [7] and ours. Table 1 tabulates the delay figures of the two cells.

| Delay                  | NEC cell [7] | Ours   |
|------------------------|--------------|--------|
| rise propagation delay | 270 ps       | -56 ps |
| fall propagation delay | 269 ps       | -77 ps |
| rise delay             | 131 ps       | 215 ps |
| fall delay             | 289 ps       | 146 ps |

Table 1: Performance comparison of SRAM cells (MOS model = TT, load = 0.8 pF, Temp = $25^{\circ}$C)

Pre-discharging the bitlines: One serious problem in the prior SRAM designs is the weak "1" supplied by the wordline-controlled NMOS transistors. Both bitlines drop to $VDD-V_{thn}$ as soon as the NMOS pass transistors are turned on. The SA (sense amplifier) following the bitlines has to wait until one of the bitlines drops below this voltage level before it starts to function. The access time of the read operation is therefore increased. Hence, we propose a pre-discharging scheme to replace the precharging scheme of the prior designs. This method also eliminates the second power supply, which provides $VDD+V_{thn}$ or, in [3], $VDD+\Delta V$ to refresh the cells. As shown in Fig. 4, PL is the discharging signal that grounds BL and BLB whenever the cell is not being accessed. Table 2 shows that the pre-discharging scheme is roughly twice as fast as the NEC SRAM cell: about 2.1 times faster in propagation delay and about 2.3 times faster in fall/rise delay.

| Delay             | NEC cell [7] | Ours     |
|-------------------|--------------|----------|
| propagation delay | 106.9 ps     | 51.61 ps |
| fall/rise delay   | 198.0 ps     | 86.65 ps |

Table 2: Comparison of precharging and pre-discharging (MOS model = TT, load = 0.8 pF, Temp = $25^{\circ}$C)

#### 2.2. Cascaded noise-resistant ATD

An address transition detector (ATD) is required to initiate a memory read/write operation asynchronously. As soon as a significant transition is detected on the address lines of the bus, a chip select (CS) signal is asserted to enable the memory.
The advantages of such a method are obvious: power saving, pin saving, and high speed. Drawbacks of prior ATD designs were addressed in [8]. The ATD in [8], however, has its own problem, namely the large area taken by its delay elements. We thus propose a cascaded ATD composed of Stage 1 and Stage 2, described below, to overcome all of the mentioned problems.

Stage 1: As shown in Fig. 5, a typical address transition detector is used for each address line. The $AT_i$ signal (address transition), where $i \in \{0, \ldots, n-1\}$, is very vulnerable to glitches and noise on the address line, since the XOR gate is transparent.

Stage 2: Fig. 6 shows the ATD stage that filters out any unwanted glitches or noise coupled into ATG, which is the granted $AT_i$ signal. The design comprises an SR flip-flop (SRFF) and a delay buffer, BUFF2. The feedback loop stabilizes the meaningful CS signal. BUFF2 is used to predetermine the width of the CS strobe such that any perturbation occurring during the strobe is ignored.

- (1). After initialization, A and E are set high, while C, CS, and B are low.
- (2). As soon as one $ADDR_i$ switches, $AT_i$ goes high. If $AT_i$ is granted to actually affect the final CS strobe, it becomes the ATG signal in Fig. 6. C thus goes high, which in turn sets Q of the SRFF, i.e., CS and D. Meanwhile, E goes low and turns C low through the feedback loop.
- (3). The W/L of the inverters in BUFF2 is adjustable, which determines the transient voltage at B. The rise time at B is the duration for which CS remains high, while the fall time at B equals the time after which CS can be turned high again once it has been switched low.

An ideal operation waveform is given in Fig. 7. Fig. 8 shows the simulated waveform of Stage 2 when $AT_i$ is coupled with unwanted glitches. These glitches, or false-alarm switches, are all rejected. The entire ATD is shown in Fig. 9.
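
The two-stage behaviour described above can be illustrated with a minimal discrete-time sketch. It is a behavioural model only, not the transistor-level circuit of Figs. 5, 6, and 9. The function names (`stage1_at`, `or_lines`, `stage2_cs`) and the parameters `delay`, `strobe`, and `recovery` are hypothetical: `strobe` stands in for the rise time at B, `recovery` for the fall time at B, and OR-ing the per-line $AT_i$ signals into ATG is an assumption, since the grant logic is not detailed in the text. The model captures only the property stated for BUFF2, namely that activity during the strobe (and the following recovery interval) is ignored.

```python
from typing import List, Sequence


def stage1_at(addr: Sequence[int], delay: int) -> List[int]:
    """Stage 1 (assumed model): AT_i = ADDR_i XOR ADDR_i delayed by `delay` steps."""
    out = []
    for t, a in enumerate(addr):
        # Before `delay` steps have elapsed, the delayed copy holds the initial value.
        delayed = addr[t - delay] if t >= delay else addr[0]
        out.append(a ^ delayed)
    return out


def or_lines(at_lines: Sequence[Sequence[int]]) -> List[int]:
    """Assumed grant logic: ATG is high whenever any AT_i is high."""
    return [int(any(column)) for column in zip(*at_lines)]


def stage2_cs(atg: Sequence[int], strobe: int, recovery: int) -> List[int]:
    """Stage 2 (assumed model): fixed-width CS strobe.

    CS stays high for `strobe` steps after a trigger, then low for `recovery`
    steps; ATG activity during either interval is ignored.
    """
    cs = []
    high_left = 0  # remaining steps for which CS must stay high
    low_left = 0   # remaining steps before CS may be triggered again
    for a in atg:
        if high_left > 0:
            cs.append(1)
            high_left -= 1
            if high_left == 0:
                low_left = recovery
        elif low_left > 0:
            cs.append(0)
            low_left -= 1
        elif a:
            cs.append(1)
            high_left = strobe - 1
            if high_left == 0:
                low_left = recovery
        else:
            cs.append(0)
    return cs


if __name__ == "__main__":
    addr = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]  # one address line switching at t = 2
    at = stage1_at(addr, delay=2)
    noisy = list(at)
    noisy[5] = 1  # glitch injected while the strobe is still active
    print(stage2_cs(or_lines([noisy]), strobe=4, recovery=2))
    # prints [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
```

In this sketch the glitch at step 5 falls inside the 4-step strobe and therefore leaves the CS waveform unchanged. A glitch arriving while the stage is idle would still start a strobe in this simplified model.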
Since the address lines will not all switch simultaneously, Stage 2 is required to keep the SRAM operating correctly in the asynchronous mode. Fig. 10 shows the simulated waveforms for a false-alarm address transition.

## 3. CHIP IMPLEMENTATION & POST-LAYOUT SIMULATIONS

The proposed design is implemented in the TSMC 0.18 $\mu$m 1P6M CMOS technology, which is a digital process. Besides the usual synchronous mode and the asynchronous mode, in which the ATD is enabled, the proposed memory also provides a BIST mode. Fig. 11 shows the chip layout in the CADENCE EDA tool. The area is 930 × 823 $\mu m^2$. The specifications of the proposed 4 Kb SRAM are summarized in Table 3.

|              | Synch.        | Asynch.       | BIST          |
|--------------|---------------|---------------|---------------|
| clock        | 500 MHz       | 250 MHz       | 100 MHz       |
| access delay | 2.49 ns       | 4.1 ns        | 10.0 ns       |
| avg. power   | 152 mW        | 71.1 mW       | 15.8 mW       |
| max. power   | 818 mW        | 825 mW        | 346 mW        |
| VDD          | 1.8 ± 0.2 V   | 1.8 ± 0.2 V   | 1.8 ± 0.2 V   |

Table 3: 4 Kb 4-T SRAM specs in different modes

Fig. 12 shows the post-layout simulation results of the proposed 4-T SRAM, generated with the TimeMill and CADENCE tools. The proposed design has been approved by CIC (Chip Implementation Center) for fabrication by TSMC under chip no. R18-91A-02b.

## 4. CONCLUSION

We have presented a novel P-latch N-drive 4-T SRAM design for embedded systems. It can be fully implemented in a digital CMOS process. In addition, a 2-stage ATD design is included to carry out the asynchronous access operation such that the power consumption is reduced.

## 5. REFERENCES

- [1] R. J. Baker, H. W. Li, and D. E. Boyce, "CMOS circuit design, layout, and simulation," Reading: IEEE Press, 1998.
- [2] P. N. Glaskowsky, "MoSys explains 1T-SRAM technology," *Microprocessor Report*, vol. 13, no. 12, pp. 1-2, Sep. 1999.
- [3] H.-Y. Huang and X.-Y. Su, "Low-power 2P2N SRAM with column hidden refresh," 2001 The 12th VLSI Design/CAD Symposium, C3-8, pp. 64, Aug. 2001.
- [4] Y. Idei, K. Shimohigashi, M. Aoki, H. Noda, H. Iwai, K. Sato, and T. Tachibana, "Dual-period self-refresh scheme for low-power DRAM's with on-chip PROM mode registers," *IEEE J. Solid-State Circuits*, vol. 33, no. 2, pp. 253-259, Feb. 1998.
- [5] B. Prince, "Semiconductor memories," Reading: John Wiley & Sons Ltd., 1991.
- [6] K. Sato, K. Kenmizaki, S. Kubono, T. Mochizuki, H. Aoyagi, M. Kanamitsu, S. Kunito, H. Uchida, Y. Yasu, A. Ogishima, S. Sano, and H. Kawamoto, "A 4-Mb SRAM operating at 2.6±1 V with 3-μA data retention current," *IEEE J. Solid-State Circuits*, vol. 26, no. 11, pp. 1556-1562, Nov. 1991.
- [7] K. Takeda, Y. Aimoto, N. Nakamura, H. Toyoshima, T. Iwasaki, K. Noda, K. Matsui, S. Itoh, S. Masuoka, T. Horiushi, A. Nakagawa, K. Shimogawa, and H. Takahashi, "A 16-Mb 400-MHz loadless CMOS four-transistor SRAM macro," *IEEE J. Solid-State Circuits*, vol. 35, no. 11, pp. 1631-1640, Nov. 2000.
- [8] C.-C. Wang and J.-J. Wang, "Address transition detector with high noise immunity," 2001 The 12th VLSI Design/CAD Symposium, C3-3, pp. 62, Aug. 2001.

Figure 1: proposed 4-T SRAM cell

Figure 2: side view of the proposed cell

Figure 3: waveforms of the proposed cell and the NEC cell

Figure 4: pre-discharging circuitry, SA, and the cell

Figure 5: Stage 1 (the usual ATD design)

Figure 6: Stage 2 of the proposed ATD

Figure 7: ideal ATD waveforms for asynchronous operations

Figure 8: simulation results of the proposed ATD

Figure 9: overall cascaded ATD design

Figure 10: the ATD response in the presence of a false-alarm signal

Figure 11: layout of the proposed SRAM

Figure 12: post-layout simulation results