# A Self-disable Sense Technique with Differential NAND Cell for Content-Addressable Memories

Chi-Chun Huang, Jun-Han Wu, and Chua-Chin Wang, Senior Member, IEEE

Department of Electrical Engineering National Sun Yat-Sen University Kaohsiung, Taiwan 80424 email: ccwang@ee.nsysu.edu.tw

Abstract—A self-disable sensing technique for content-addressable memories (CAMs) is presented in this work. The proposed differential match-line sense amplifier disables itself to choke the charging current fed into the match line right after the comparison result is generated. Instead of typical NOR/NAND CAM cells with a single-ended match line, a novel NAND CAM cell with a differential match line is used to increase the comparison speed without increasing the power consumption. In addition, the 13-T CAM cell provides complete write, read, and comparison functions, so that the database can be refreshed and its correctness verified before searching. A CAM using the proposed technique is implemented in a 0.13 $\mu$m CMOS process to verify the performance. The average energy consumption of the searching process is 1.584 fJ/bit/search.

Keywords—content-addressable memories, match-line, sense amplifier, modified NAND CAM cell.

# I. INTRODUCTION

Data searching is an important issue in database and networking systems. Both look-up table applications and identification matching systems dissipate a lot of energy comparing the input data with the data in the database. Many software-based and hardware-based methods for this comparison have been proposed. The content-addressable memory (CAM) is considered a better solution than the others owing to its price and speed. A CAM is a storage device composed of an addressed standalone memory with a matching access device in each cell. Thus, the input data can be compared with all of the stored data in the memory simultaneously. The output of the CAM is the address of the stored data that matches the input data. The address searching process is expected to generate the result in a single clock cycle, regardless of the size of the database.

The fully parallel comparison architecture of a CAM inevitably causes large power consumption. Most of the power is dissipated by charging and discharging the match line (ML). Many techniques have been proposed to reduce this power consumption. The simplest way is to adopt the NAND-type CAM cell, owing to its serial ML switch structure [1]. However, its searching speed is extremely poor. A detailed analysis is given later in the text. On the other hand, several high-speed low-power CAM techniques have been proposed based on the NOR-type cell. For example, [2] restricts the charging voltage of the MLs to limit the matching power consumption. [3] employs a pre-computation-based CAM to reduce the number of activated MLs. [4] and other mismatch-dependent techniques allocate less power to the mismatched MLs to reduce the static power. These techniques efficiently reduce the power consumption during the searching process while maintaining the searching speed. However, they have a performance limitation that arises from the match or mismatch scenario of the dummy word circuit: power is unavoidably consumed during the searching process after the comparison result has been generated. Therefore, the power can be further reduced by choking the charging current as soon as the comparison is finished. Moreover, most of the prior CAM cell structures lack a read-out circuit to verify the correctness of the database before searching.

# II. CAM USING SELF-DISABLE SENSING

The simplified CAM architecture is shown in Fig. 1. The I/O of the CAM consists of the *n*-bit data word to search for (Search Data) and the address of the word that matches the Search Data. The database is stored in a distributed manner in the CAM cells (C in Fig. 1). The Search Word Register loads the Search Data and feeds it forward to every CAM cell. Each match line sense amplifier (MLSA) then charges its ML and senses the voltage variation to tell whether the word is the same as the Search Data. Usually, only one word matches the Search Data, which triggers the Address Encoder to output an address code after the searching process.
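The data flow of Fig. 1 can be illustrated with a purely behavioral sketch. It is not the circuit itself: the function names `match_lines`, `address_encoder`, and `cam_search` are hypothetical, and returning the lowest matching address when several words match is an assumption, since the text only states that usually one word matches.

```python
from typing import List, Optional


def match_lines(stored_words: List[int], search_data: int) -> List[bool]:
    """One flag per word, as each MLSA would report: True means 'match'.

    In hardware all words are compared at the same time; the list
    comprehension only models the result, not the parallelism.
    """
    return [word == search_data for word in stored_words]


def address_encoder(flags: List[bool]) -> Optional[int]:
    """Return the address of a matching word, or None if no word matches.

    Assumption: the lowest matching address wins when several words match.
    """
    for address, matched in enumerate(flags):
        if matched:
            return address
    return None


def cam_search(stored_words: List[int], search_data: int) -> Optional[int]:
    """Search Word Register -> CAM cells and MLSAs -> Address Encoder."""
    return address_encoder(match_lines(stored_words, search_data))
```

For example, `cam_search([0x1A, 0x2B, 0x3C], 0x2B)` returns address `1`, while a search word that is not stored returns `None`.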

## A. 13-T CAM cell with write/read function

The proposed 13-T CAM cell is shown in Fig. 2. It is composed of a typical 6-T SRAM cell, a matching access device, and a decoupled read-out circuit. The matching access device is a switch in the current path between ML and SML. The details are given in the following subsection. The read-out circuit is necessary to verify the correctness of the database stored in the CAM cells before the searching process. The reason is that if the system supply voltage is lowered to reduce power consumption, several problems are introduced in a 6-T SRAM, including a reduced static noise margin (SNM), poor writability, a limited number of cells per bit-line, and a reduced bit-line sensing margin. Therefore, the decoupled read-out circuit [5] is needed to mitigate these problems. Before the read process, RWL is logic 0 to precharge the node RL through MRP. During the read process (RWL=1), the voltage of RL depends on the logic state of QB. If RL is not discharged through MR2 and MR3, the logic 1 on RL can be sensed by RBL via MR1. Because the cell node is decoupled from the RBL, the SNM is retained during the read process. Thanks to the retained SNM, the bit-line sensing margin of the read-out function, and the differential match line sense amplifier, the supply voltage of the proposed CAM can be reduced by more than 20% to save power.

## B. Differential NAND CAM cell

Traditionally, there are two typical types of CAM cell, the NAND type and the NOR type, shown in Fig. 3 and Fig. 4, respectively. The block M is the memory cell that stores the data bit, acting as a RAM cell. The M block could be any type of memory cell. In the proposed design, it is composed of two cross-coupled inverters, as in an SRAM cell. Referring to Fig. 3, the search word supplied by the search lines (SL, $\overline{\text{SL}}$) is compared with the bit in M. In the NAND-type cell, if a "match" occurs (Q=SL), $MA_i$ is turned ON to pass the signal horizontally across the NAND CAM array. Thus, if any bit is a mismatch, the path composed of the serial $MA_i$ devices is cut off. By contrast, in Fig. 4, there is a path from ML\_O to ground in a single NOR CAM cell if there is a "mismatch". Thus, ML\_O is isolated from ground only if all bits in the same word are matched. In either case, the MLSAs sense the voltage change on the MLs to resolve the comparison result of each word. The charge/discharge path of a NAND-type CAM word is a single serial path, so the power consumption is low but the speed is sacrificed. By contrast, the pull-down paths in a NOR-type CAM word are all in parallel, so the speed is high but the power consumption is very large.
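The two match-line conditions described above can be written as a small behavioral sketch. It models logic only, with no timing or power; the function names are hypothetical and bits are represented as 0/1 lists, which is an assumption made for illustration.

```python
from typing import List


def nand_word_matches(stored: List[int], search: List[int]) -> bool:
    """NAND type: the serial chain conducts only if every MA_i is ON."""
    chain_conducts = True
    for q, sl in zip(stored, search):
        ma_on = (q == sl)          # MA_i turns ON when Q equals SL
        chain_conducts = chain_conducts and ma_on
    return chain_conducts


def nor_pulldown_paths(stored: List[int], search: List[int]) -> int:
    """NOR type: number of cells that connect ML_O to ground (mismatches)."""
    return sum(1 for q, sl in zip(stored, search) if q != sl)


def nor_word_matches(stored: List[int], search: List[int]) -> bool:
    """NOR type: ML_O stays isolated from ground only if no bit mismatches."""
    return nor_pulldown_paths(stored, search) == 0
```

Both functions return the same match decision for any word; the difference the text points to is structural: one serial path in the NAND word versus up to one pull-down path per bit in the NOR word.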

Therefore, the proposed CAM cell in Fig. 5 is used to resolve this design dilemma. $MS_i$ is turned ON only if SL and Q are logically different. SML is then charged from ML unless every bit of the word is the same as the corresponding bit of the search data. The voltage difference between ML and SML is sensed by the differential match line sense amplifier. The comparison is sped up by the parallel charging paths, while the DC path to ground is removed to reduce the static power consumption.
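A behavioral sketch of the differential cell follows, under the same assumptions as before (0/1 bit lists, hypothetical function names, no electrical detail). It only captures which $MS_i$ switches close and whether SML is charged from ML.

```python
from typing import List


def ms_switch_states(stored: List[int], search: List[int]) -> List[bool]:
    """MS_i is ON only where SL and Q are logically different."""
    return [q != sl for q, sl in zip(stored, search)]


def sml_is_charged(stored: List[int], search: List[int]) -> bool:
    """SML is charged from ML as soon as any MS_i is ON.

    The closed switches act in parallel, so every mismatching bit adds
    another charging path between ML and SML.
    """
    return any(ms_switch_states(stored, search))


def word_result(stored: List[int], search: List[int]) -> str:
    """ML and SML stay separated only when every bit matches."""
    return "mismatch" if sml_is_charged(stored, search) else "match"
```

Note that no branch of this model refers to ground: in the sketch, as in the text, a mismatch connects ML to SML rather than to a DC grounding path.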

## C. Differential Match Line Sense Amplifier

There are two typical ways to design the match line sense amplifier (MLSA): precharging before the comparison or charging during the comparison. An MLSA with a precharging circuit charges the match line (ML) or a capacitor before the comparison and then senses the voltage drop on the ML to generate the comparison result. This kind of MLSA can easily control the power consumption by limiting the energy stored in the precharging circuit. However, the precharging design suffers from the charge-sharing effect among CAM cells. The other type of MLSA charges the ML during the searching process. This type of MLSA can speed up the searching process by increasing the charging current. However, the penalty is that the charging current consumes additional power until the end of the searching process, even if the match result has already been determined.

Therefore, a novel differential match line sense amplifier (DMLSA) is proposed for our CAM. Referring to Fig. 6, the DMLSA senses the voltages on ML and SML to tell whether the word is a "match" and then automatically disables the charge path to save power. Referring to Fig. 6 and Fig. 7, before the searching process, a reset circuit (not shown) drives the DMLSA into the initial state: ML=SML=0, SP=0. At the beginning of the searching process, SEARCH\_EN=SEARCH is pulled high. Then, MN1 is turned ON to charge the ML, so that the voltage at KP is reduced but not pulled down to 0. If any CAM cell is a "mismatch", ML and SML are shorted through that cell and SML is charged from ML. The voltage of KP is then pulled down, which sets MISMATCH to logic 1 and indicates that the comparison result of the word is "mismatch". The high level (logic 1) of MISMATCH turns MN3 ON and MP1 OFF. The former forms a positive feedback loop from MISMATCH to KP through MN3 and MN2, which pulls KP down more quickly. The latter chokes the charge current supplied to ML, since MISMATCH=1. On the contrary, as shown in Fig. 8, if all of the CAM cells are a "match", there is no current path between ML and SML. The voltage difference between ML and SML creates an output current of the differential pair (MP2 and MP3) that charges KP and SP. As soon as KP is charged to a high level, MISMATCH becomes logic 0, indicating that the comparison result is "match". After SP rises to a high level, SEARCH becomes logic 0 and turns MN1 off to choke the charge current to ML. Therefore, the charge current to ML is choked once the comparison result has been decided, regardless of what the result is.
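The two self-disable sequences described above can be summarized in an event-level sketch. It is an illustration under stated assumptions rather than a circuit model: voltages are reduced to logic levels, the ordering of events simply follows the text, and the names `DmlsaResult` and `dmlsa_search` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DmlsaResult:
    mismatch: int                 # logic level of the MISMATCH output
    charging: bool                # is ML still being charged at the end?
    disabled_by: str              # device the text names as cutting the current
    trace: List[str] = field(default_factory=list)


def dmlsa_search(stored: List[int], search: List[int]) -> DmlsaResult:
    """Walk through the DMLSA sequence for one word (logic levels only)."""
    trace = [
        "reset: ML=SML=0, SP=0",
        "SEARCH_EN=SEARCH high: MN1 ON, ML charging, KP reduced but not 0",
    ]
    any_mismatch = any(q != sl for q, sl in zip(stored, search))
    if any_mismatch:
        trace += [
            "ML shorted to SML through a mismatching cell: SML charged",
            "KP pulled down: MISMATCH=1",
            "MN3 ON: positive feedback pulls KP down faster",
            "MP1 OFF: charge current to ML choked",
        ]
        return DmlsaResult(1, False, "MP1", trace)
    trace += [
        "no path between ML and SML: differential pair charges KP and SP",
        "KP high: MISMATCH=0",
        "SP high: SEARCH=0, MN1 OFF, charge current to ML choked",
    ]
    return DmlsaResult(0, False, "MN1", trace)
```

In both branches `charging` ends up `False`, which is the point of the technique: the charge path is cut once the result is known, whichever result it is.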

# III. SIMULATION AND IMPLEMENTATION

To verify the performance of the proposed techniques, a CAM with 128 words $\times$ 32 bits per word has been designed. The TSMC (Taiwan Semiconductor Manufacturing Company) 0.13 $\mu$m 1P6M CMOS process is adopted to implement the proposed CAM chip. The post-layout simulation result of the DMLSA is shown in Fig. 7 for a word comparison that results in a "mismatch", and Fig. 8 shows the simulation result for a "match". The Search Time is shortened by a buffer for the MISMATCH signal. The comparison of the proposed CAM design with prior designs is shown in Table I. The Energy for Search is the average energy dissipated in comparing 1 bit. The proposed technique has the best performance even when the value is normalized by the square of the supply voltage. In addition, the normalized Average Power is the original value divided by the frequency and the square of the supply voltage. Fig. 9 shows the layout of the proposed DMLSA with the 13-T CAM cell.
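The two normalizations can be written out as a short sketch. The function names are hypothetical, and the units (fJ/bit/search, mW, MHz, V) are assumed to be those of the Table I entries; the supply voltage paired with each power figure is also an assumption where the table lists a range.

```python
def normalized_energy(energy_fj_per_bit: float, vdd: float) -> float:
    """Energy for Search divided by the square of the supply voltage."""
    return energy_fj_per_bit / vdd ** 2


def normalized_power(power_mw: float, freq_mhz: float, vdd: float) -> float:
    """Average Power divided by frequency and supply voltage squared."""
    return power_mw / (freq_mhz * vdd ** 2)


# Table I entries for [6] and for the proposed design.
# Assumption: the power figures are taken at 2.5 V for [6] and 1 V for ours.
efs_ref6 = normalized_energy(13.9, 2.5)
efs_ours = normalized_energy(1.584, 1.0)
pwr_ref6 = normalized_power(17.12, 300, 2.5)
pwr_ours = normalized_power(7.041, 330, 1.0)
```

Under these assumptions the results round to the values listed for [6] and for the proposed design in Table I.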

# IV. CONCLUSION

We have proposed a self-disable sensing technique for CAMs. The self-disable circuit chokes the charging current to the ML as soon as the comparison is done, in order to reduce the additional power dissipation. The differential NAND CAM cell, which uses SML instead of a grounding path, further saves power compared with prior NOR or NAND cells. In addition, the decoupled read-out circuit allows the database to be verified before searching, even if the supply voltage is reduced to save power.

# ACKNOWLEDGMENT

This research was partially supported by the National Science Council under grants NSC 96-2923-E-110-001 and NHRI-EX97-9732EI. The authors would like to thank CIC of the National Science Council (NSC), Taiwan, for their thoughtful help in the chip fabrication of the proposed work. The authors would also like to thank the "Aim for Top University Plan" project of NSYSU and the Ministry of Education, Taiwan, for partially supporting the research.

# REFERENCES

- [1] B.-D. Yang and L.-S. Kim, "A low-power CAM using pulsed NAND-NOR match-line and charge-recycling search-line driver," *IEEE J. Solid-State Circuits*, vol. 40, no. 8, pp. 1736-1744, Aug. 2005.
- [2] H. Miyatake, M. Tanaka, and Y. Mori, "A design for high-speed low-power CMOS fully parallel content-addressable memory macros," *IEEE J. Solid-State Circuits*, vol. 36, no. 6, pp. 956-968, Jun. 2001.
- [3] C.-S. Lin, J.-C. Chang, and B.-D. Liu, "A low-power precomputation-based fully parallel content-addressable memory," *IEEE J. Solid-State Circuits*, vol. 38, no. 4, pp. 654-662, Apr. 2003.
- [4] I. Arsovski and A. Sheikholeslami, "A mismatch-dependent power allocation technique for match-line sensing in content-addressable memories," *IEEE J. Solid-State Circuits*, vol. 38, no. 11, pp. 1958-1966, Nov. 2003.
- [5] T.-H. Kim, J. Liu, J. Keane, and C. H. Kim, "A 0.2 V, 480 kb subthreshold SRAM with 1 k cells per bitline for ultra-low-voltage computing," *IEEE J. Solid-State Circuits*, vol. 43, no. 2, pp. 518-529, Feb. 2008.
- [6] K.-H. Cheng, C.-H. Wei, and S.-Y. Jiang, "Static divided word matching line for low-power content addressable memory design," in *Proc. IEEE Int. Symp. on Circuits and Systems*, vol. 2, pp. 629-632, May 2004.



Fig. 1. Simplified CAM architecture.

TABLE I. Comparison of the CAM techniques.

|                                   | [1]          | [6]             | [3]            | ours            |
|-----------------------------------|--------------|-----------------|----------------|-----------------|
| CMOS Process                      | 0.25 μm      | 0.25 μm         | 0.35 μm        | 0.13 μm         |
| Supply Voltage (V)                | 2.5          | 1.5, 2.5        | 1.5~3.3        | 1~1.2           |
| Frequency (MHz)                   | 260          | 300             | 100            | 330             |
| Search Time (ns)                  | 3.8          | 4.7, 2.1        | 4.5            | 0.9             |
| Energy for Search (fJ/bit/search) | 1.72 @ 2.5 V | 13.9 @ 2.5 V    | 86 @ 3.3 V     | 1.584 @ 1 V     |
| EfS (normalized)                  | 2.752        | 2.224           | 7.987          | 1.584           |
| Average Power (mW)                | N/A          | 17.12 @ 300 MHz | 33 @ 100 MHz   | 7.041 @ 330 MHz |
| Average Power (normalized)        | N/A          | 0.01            | 0.03           | 0.02            |



Fig. 2. The proposed 13-T CAM cell.



Fig. 3. Typical NAND CAM cell.



Fig. 4. Typical NOR CAM cell.



Fig. 7. Post-layout simulation results of the DMLSA for a 1-bit mismatch.



Fig. 5. Differential NAND CAM cell.



Fig. 8. Post-layout simulation results of the DMLSA when all bits match.



Fig. 6. The differential match-line sense amplifier.



Fig. 9. Layout of proposed DMLSA with 13-T CAM cell. A: 6-T SRAM cell, B: matching access device, C: decoupled read-out circuit, and D: proposed DMLSA.