# A 1.0 GHz CLOCK GENERATOR DESIGN WITH A NEGATIVE DELAY USING A SINGLE-SHOT LOCKING METHOD<sup>§</sup>

Chua-Chin Wang<sup>†</sup>, Yih-Long Tseng, and Rong-Sui Kao

Department of Electrical Engineering
National Sun Yat-Sen University
Kaohsiung, Taiwan 80424
email: ccwang@ee.nsysu.edu.tw

## ABSTRACT

A high-speed digital clock generator circuit is presented which provides negative delays in order to avoid the multi-locking hazard. The negative delay also results in low power consumption and a shorter access time when the proposed circuit is used as the clock generator of memory devices. Meanwhile, an accurately locked clock signal is also provided. The locked clock frequency can be as high as 1.0 GHz in the presence of random noise with an amplitude of 10% of the power supply voltage when the design is implemented in a 0.35 $\mu$m 1P4M CMOS technology.

Indexing terms: DLL, negative delay, multi-locking hazard, single-shot locking

## 1. INTRODUCTION

As the speed performance of VLSI devices advances, suppressing the skew and jitter of the on-chip clocks becomes a critical issue in the integration of microprocessor systems, memory interfaces, and communication systems. The DLL (delay-locked loop) [1] is often adopted where frequency tracking is needed, e.g., clock recovery and frequency synthesis, owing to the fact that the phase error of DLLs does not accumulate, in contrast to that of PLLs [4], [9].

Many wide-range DLL designs have been proposed. Analog DLLs are difficult to port across different technologies due to their intrinsic complexity [5], [6]. Digital DLLs, by contrast, suffer from significant skew error and jitter [2], [3]. The dual-loop architecture, which combines the previous two kinds of DLLs, consumes excessive area and power [7], [8]. Most importantly, none of the prior DLLs focuses on resolving the multi-locking phenomenon, which often occurs in DLLs and leads to unwanted power consumption, long lock time, and, even worse, oscillation.
Meanwhile, the internal clock signal of a chip, e.g., a memory, is generally obtained by delaying an external clock signal for a predetermined period of time. Thus, when the data of the memory are accessed using the internal clock signal, the access time is increased. We propose a novel digital clock generator which uses a single-shot locking method to generate an internal clock signal that arrives earlier than the given external clock, in other words, a clock with a negative delay, and which outperforms the prior DLLs. The multi-locking problem and the long access time are thereby prevented.

<sup>†</sup>The contact author.

## 2. CLOCK GENERATOR WITH A NEGATIVE DELAY

In order to provide an accurate internal clock with a negative delay, a possible solution is to utilize a "causal" scheme. That is, a later pulse triggers or enables an earlier signal.

### 2.1. Generation of negative delays

Figure 1 shows the general architecture of the proposed design. $CLK_{ext}$ is the given external clock signal, and $CLK_{in}$ follows it with a buffer delay. Notably, $CLK_{in}$ is also the synchronous clock that triggers the DFFs, i.e., DFF<sub>1</sub> ... DFF<sub>N</sub>. BUF<sub>1</sub> ... BUF<sub>M</sub> constitute a tap line in which all of the BUF<sub>i</sub>'s are identical. The size of the buffer is preferably tuned so that the period of $CLK_{ext}$ is an integer multiple of the buffer delay. An enable signal, EN, is gated by a driving buffer to reset the DFFs. As shown in Figure 1, the $\overline{Q}$ of DFF<sub>i</sub> and the Q of DFF<sub>i+1</sub> are NORed and propagated through a control block, $CON_i$, to control the corresponding tri-state buffer $Tri\_BUF_i$, whose input is coupled to node $B_i$ in the tap line, for all i, i = 1, 2, ..., N. With such a configuration, the pulse generated at the output of the NOR gate, $NOR_i$, from the $\overline{Q}$ of DFF<sub>i</sub> and the Q of DFF<sub>i+1</sub> is fed to the IN pin of the control block $CON_i$.
The pulse generated by the NOR gate is allowed to pass to the output, OUT, of the control block $CON_i$ provided that a high signal is applied to C. The OUT of the control block connects to the enable pin, PASS, of $Tri\_BUF_i$. Then, the signal at $B_i$, which is $CLK_{in}$ delayed by k buffer delays, appears at the output of the corresponding $Tri\_BUF_i$, $\mathbf{T}_i$. Note that k depends on how much delay is pre-determined. For example, the pulse train at node $B_8$ is earlier than those at nodes $B_{11}$ and $B_{12}$. However, it can only be delivered to the output, $CLK_{out}$, when the pulses at $B_{11}$ and $B_{12}$ enable Tri\_BUF<sub>8</sub> through CON<sub>8</sub>.

Figure 2 demonstrates an example of the generated waveforms of $CLK_{ext}$, $CLK_{in}$, $B_1 \dots B_{12}$, $Tri\_BUF_8$ and $CLK_{out}$. The pulse train at $B_8$ evidently leads $CLK_{in}$. According to this illustrative example, an earlier clock (e.g., $B_8$) is enabled and delivered to a latch by a lagging signal (e.g., the signal at node $B_8$ is actually triggered by the combination of the pulse trains at $B_{11}$ and $B_{12}$). The proposed design thus realizes the negative delay by the "causal" scheme.

<sup>§</sup>This research was partially supported by National Science Council under grant NSC 89-2215-E-110-017 and 89-2215-E-110-014.

### 2.2. Single-shot locking

The fastest strategy to lock the external clock is to enable only the locked pulse train and to disable all of the other pulse trains as soon as the locked pulse train is detected. Hence, the single-shot locking scheme is summarized as follows.

- Every two adjacent DFFs are triggered by two pulse trains which have a predefined delay between them. The complementary output of the first DFF is NORed with the output of the second DFF such that a narrow pulse is generated.
- The narrow pulse is sent to the corresponding control block, where C is the control signal that determines whether the narrow pulse is delivered to the PASS pin of a Tri\_BUF to enable it.
- If C=0, then the output of the control block is shorted to the input. Hence, the narrow pulse is propagated to the Tri\_BUF, i.e., PASS = 1. At the same time, C<sub>next</sub> is pulled low, which signals the next control block to disable its output function by setting its C=1. All of the remaining control blocks are disabled sequentially in the same manner.
- If C=1, then at least one pulse train has been locked. The control block is disabled, which grounds its output, OUT = 0.
- The output of a disabled control block is low, which in turn disables the following Tri\_BUF by PASS = OUT = 0.

Notably, the outputs of every two Tri\_BUFs are NORed, and all of the outputs of these NOR gates are NANDed together. Hence, no matter which pulse train is locked, it is faithfully propagated to the node $CLK_{out}$. Since the proposed method locks at the first appearance of a locked pulse train, the lock time is far less than that of any prior method, e.g., the min-max method.

### 2.3. Estimation of the negative delay

Assume the period of the external clock, $CLK_{ext}$, is m ns, while a single BUF buffer creates a unit delay, i.e., 1 ns. Thus, the delayed clock at each successive node $B_i$ has an additional phase lag of 1/m of a period. The $\overline{Q}$ output of DFF<sub>i</sub> and the Q of DFF<sub>i+1</sub> are used to sample the clock signals at nodes $B_k$ and $B_{k+1}$, where k > i, such that when the $\overline{Q}$ of DFF<sub>i</sub> remains low and the Q of DFF<sub>i+1</sub> turns low, the output of the NOR gate, $NOR_i$, is pulled high and enables the corresponding Tri\_BUF<sub>i</sub> through $CON_i$. Then, the clock at node $B_i$ is coupled to a latch. In short, the phase lag of such a design is determined by $\frac{i-k}{m}$. The negative delay is estimated to be (i-k) ns.
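
As a rough numerical sketch of this estimate (an illustration only, not part of the reported design flow), the following Python snippet converts a stage difference i-k into a delay and a phase, assuming an ideal and uniform buffer delay. The function names and the `buffer_delay_ns` parameter are hypothetical and are introduced here only for illustration.

```python
from fractions import Fraction
from typing import Union

Number = Union[int, Fraction]


def negative_delay_ns(i: int, k: int, buffer_delay_ns: Number = 1) -> Fraction:
    """Delay of tap B_i relative to sampling tap B_k: (i - k) unit delays, in ns."""
    return Fraction(i - k) * buffer_delay_ns


def phase_degrees(i: int, k: int, period_ns: Number,
                  buffer_delay_ns: Number = 1) -> Fraction:
    """Phase in degrees, i.e. 360 * (i - k) / m when the unit delay is 1 ns."""
    return 360 * negative_delay_ns(i, k, buffer_delay_ns) / period_ns


def stage_difference(target_phase_deg: Number, period_ns: Number,
                     buffer_delay_ns: Number = 1) -> int:
    """Stage difference i - k that yields the target phase.

    Raises ValueError when the phase does not map onto a whole number of taps
    (assumption: only integer tap offsets are selectable).
    """
    stages = Fraction(target_phase_deg) * period_ns / (360 * Fraction(buffer_delay_ns))
    if stages.denominator != 1:
        raise ValueError("target phase is not reachable with an integer number of taps")
    return int(stages)
```

Under these assumptions, a 10 ns period with i-k = -3 gives `phase_degrees(7, 10, 10)`, which evaluates to -108 degrees. Exact fractions are used instead of floating point so that the integer-tap check in `stage_difference` is not affected by rounding.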
For instance, if the external clock is 125 MHz (period = 8 ns) and we need an internal clock with a $-90^{\circ}$ phase, then i-k is chosen to be -2, provided that the delay of each buffer in the tap line is 1 ns. It is then concluded that we can generate any negative delay under two constraints. First, the delay of the tap line buffer is an integer factor of the period of the external clock. Second, the phase difference can be pre-determined by the stage difference. The proposed design turns out to be a robust design method.

### 2.4. Circuit of the clock generator

The entire proposed design is composed of the following modules according to Figure 1.

**tap line:** It consists of a series of identical buffers. The delay of each buffer is an integer factor of the period of the externally given clock, which ensures the correctness of the output clock.

**sampling DFFs:** The DFFs acquire the individual delayed clock signals from the tap line after EN is pulled high. A NOR gate, in the meantime, monitors the $\overline{Q}$ of the previous stage and the Q of the next stage. When both of them turn low, a high signal is provided to the IN pin of the corresponding control block, i.e., $CON_i$. If the state of C of the control block is 0, then it enables the following $Tri\_BUF$ to pass an earlier clock with a pre-determined delay. At the same time, a signal $C_{next}$ is asserted to disable the rest of the control blocks. By contrast, if the state of C is 1, the control block is disabled and OUT is set to zero.

**output module:** This module consists of NOR gates, a NAND gate and a driving buffer. The locked pulse train alone is propagated to the output through these combinational logic gates.

In addition to the above modules, a testing module is required to test whether the proposed design really locks an external clock as high as 1.0 GHz. Figure 3 shows a test module associated with our proposed clock generator.
The test module is composed of 4 clock generators which are able to generate 400 MHz, 600 MHz, 800 MHz, and 1.0 GHz clock pulses, respectively. One 4-to-1 MUX is used to select one of the four clocks as the testing external clock. Notably, the clock generators are composed of odd-numbered cascaded inverters.

## 3. SIMULATION AND ANALYSIS

The proposed clock generator is implemented in the TSMC 0.35 $\mu$m 1P4M CMOS technology. Figure 4 is the schematic view captured directly in the CADENCE tool, where N=9 and M=12. Figure 5 is the layout of the chip design. Figures 6, 7, and 8, respectively, show the post-layout simulation results by HSPICE when the external clock is 600 MHz, 800 MHz, and 1.0 GHz with a 0.3 V added random noise signal. In the 600 MHz scenario, we still get a nearly perfect locked clock signal with a delay of -90°. By contrast, the negative delay in the 1.0 GHz scenario is -60° with a 6.0 ns lock time. The overall average power dissipation is 0.1569 W @ 1.0 GHz. The negative delay is verified. Figure 9 shows the die photo of the proposed design on silicon.

## 4. CONCLUSION

We present a novel digital clock generator with a negative delay. Such a clock generator can be widely used in memory interfaces to reduce the access time. Besides, the multi-locking syndrome of conventional DLLs is also prevented. The lock time is very short and the maximal locking frequency is 1.0 GHz. The post-layout simulation results confirm locking at 600 MHz, 800 MHz, and 1.0 GHz in the presence of 0.3 V random noise.

## 5. REFERENCES

- [1] R. J. Baker, H. W. Li, and D. E. Boyce, "CMOS circuit design, layout, and simulation," Reading: IEEE Press, 1998.
- [2] A. Efendovich, Y. Afek, C. Sella, and Z. Bikowsky, "Multifrequency zero-jitter delay-locked loop," *IEEE J. Solid-State Circuits*, vol. 29, pp. 67-70, Jan. 1994.
- [3] B. W. Garlepp, K. S. Donnelly, J. Kim, P. S. Chau, J. L. Zerbe, C. Huang, C. V. Tran, C. L. Portmann, D. Stark, Y.-F. Chan, T. H. Lee, and M. A. Horowitz, "A portable digital DLL for high-speed CMOS interface circuits," *IEEE J. Solid-State Circuits*, vol. 34, pp. 632-644, May 1999.
- [4] T. H. Lee and J. F. Bulzacchelli, "A 155-MHz clock recovery delay- and phase-locked loop," *IEEE J. Solid-State Circuits*, vol. SC-27, pp. 1736-1746, Dec. 1992.
- [5] T. H. Lee, K. S. Donnelly, J. T. C. Ho, J. Zerbe, M. G. Johnson, and T. Ishikawa, "A 2.5 V CMOS delay-locked loop for 18 Mbit, 500 Megabyte/s DRAM," *IEEE J. Solid-State Circuits*, vol. 29, pp. 1491-1496, Dec. 1994.
- [6] Y. Moon, J. Choi, K. Lee, D.-K. Jeong, and M.-K. Kim, "An all-analog multiphase delay-locked loop using a replica delay line for wide-range operation and low-jitter performance," *IEEE J. Solid-State Circuits*, vol. 35, no. 3, pp. 377-384, Mar. 2000.
- [7] S. Sidiropoulos and M. A. Horowitz, "A semidigital dual delay-locked loop," *IEEE J. Solid-State Circuits*, vol. 32, pp. 1683-1692, Nov. 1997.
- [8] S. Tanoi, T. Tanabe, K. Takahashi, S. Miyamoto, and M. Uesugi, "A 250-622 MHz deskew and jitter-suppressed clock buffer using two-loop architecture," *IEEE J. Solid-State Circuits*, vol. 31, pp. 487-493, Apr. 1996.
- [9] C.-C. Wang, Y.-T. Chien, and Y.-P. Chen, "A practical load-optimized VCO design for low-jitter 5V 500 MHz digital phase-locked loop," 1999 Inter. Symp. on Circuits & Systems, vol. II, pp. 528-531, June 1999.

Figure 1: General architecture of the proposed clock generator

Figure 2: Pulse waveforms generated at the nodes of the tap line

Figure 3: Proposed design and the test module

Figure 4: Schematic of the proposed clock generator (snapshot in CADENCE)

Figure 5: Layout of the chip

Figure 6: Post-layout HSPICE result ($CLK_{ext}=600$ MHz with 0.3 V noise)

Figure 7: Post-layout HSPICE result ($CLK_{ext}=800$ MHz with 0.3 V noise)

Figure 8: Post-layout HSPICE result ($CLK_{ext}=1000$ MHz with 0.3 V noise)

Figure 9: Die photo of the proposed clock generator