# A PHASE-ADJUSTABLE NEGATIVE PHASE SHIFTER USING A SINGLE-SHOT LOCKING METHOD<sup>§</sup>

Chua-Chin Wang<sup>†</sup>, Ya-Hsin Hsueh, Sen-Fu Hong, & Rong-Sui Kao<sup>‡</sup>

Department of Electrical Engineering, National Sun Yat-Sen University, Kaohsiung, Taiwan 80424; email: ccwang@ee.nsysu.edu.tw

## ABSTRACT

A digital negative phase shifter circuit is presented that provides negative delays (phase shifts) in order to avoid multi-locking hazards. The negative phase can be adjusted by using multiplexers and voltage variable delay cells to select the required phase shift. The design is implemented in 0.35 $\mu$m CMOS 1P4M technology. A single-shot locking method is adopted to reduce the locking time. Most importantly, the negative phase shift is predictable and adjustable. Simulation results show that the accuracy of the proposed design is better than 6%.

Key words: negative delay, multi-locking hazard, single-shot locking, phase shift

## 1. INTRODUCTION

Delay-locked loops (DLLs) [1] are often adopted where frequency tracking is needed, e.g., in clock recovery and frequency synthesis, because, in contrast to PLLs, DLLs do not accumulate phase error [6]. Many wide-range DLL designs have been proposed. Analog DLLs are difficult to port across technologies owing to their intrinsic complexity [3]. Digital DLLs, by contrast, suffer from significant skew error and jitter [2]. The dual-loop architecture, which combines the two kinds of DLLs, consumes excessive area and power [4]. Most importantly, none of the prior DLLs focused on resolving the multi-locking phenomenon, which often occurs in DLLs and leads to unwanted power consumption, long lock times, and, even worse, oscillation. The basic motivation for a negative delay is that the internal clock signal of a chip, e.g., a memory, is generally obtained by delaying an external clock signal by a predetermined period of time. Thus, when the data of the memory are accessed using the internal clock signal, the access time is increased.

We propose a novel digital phase-adjustable negative phase shifter that uses a single-shot locking method to generate an internal clock signal leading the given external clock. It outperforms prior DLLs in eliminating multi-locking hazards and tolerates input-signal jitter. The output phase can be adjusted to meet the needs of a given application.

## 2. A PHASE-ADJUSTABLE NEGATIVE PHASE SHIFTER

In order to provide an accurate internal clock with a negative delay (phase) [8], a possible solution is to use a later pulse to trigger an earlier signal.

### 2.1. Generation of negative delays

Figure 1 shows the general architecture of the proposed design. $CLK_{in}$ is the given external clock signal; notably, $CLK_{in}$ is also the synchronous clock that triggers the DFFs, i.e., DFF<sub>1</sub>, ..., DFF<sub>N</sub>. BUF<sub>1</sub>, ..., BUF<sub>M</sub> form a tap line in which all of the BUF<sub>i</sub>'s are identical. The buffer size is preferably tuned so that the period of $CLK_{in}$ is an integer multiple of the buffer delay. An enable signal, EN, is gated by a driving buffer to reset the DFFs [8].

As shown in Figure 1, the $\overline{Q}$ of DFF<sub>i</sub> and the Q of DFF<sub>i+1</sub> are NORed and propagated through a control block, CON<sub>i</sub>, which controls the corresponding tri-state buffer Tri\_BUF<sub>i</sub>, whose input is coupled to node $B_i$ of the tap line, for i = 1, 2, ..., N. In this configuration, the pulse generated by the NOR gate NOR<sub>*i*</sub> from the $\overline{Q}$ of DFF<sub>*i*</sub> and the Q of DFF<sub>*i*+1</sub> is fed to the IN pin of the control block $CON_i$. The pulse is allowed to pass to the output, OUT, of $CON_i$ provided that the control pin C of $CON_i$ is in the enabling state described in Section 2.2. The OUT of the control block connects to the enable pin, PASS, of Tri\_BUF<sub>i</sub>. The signal at $B_i$, which is $CLK_{in}$ delayed by k buffer delays, then appears at $T_i$, the output of the corresponding Tri\_BUF; k depends on the predetermined delay. For example, the pulse train at node $B_3$ is earlier than those at nodes $B_6$ and $B_7$. However, it can only be delivered to the output, $CLK_{out}$, when the pulses at $B_6$ and $B_7$ enable Tri\_BUF<sub>4</sub> through CON<sub>4</sub>.

Figure 2 shows example waveforms of $CLK_{in}$ and $B_1, \dots, B_7$; the pulse train at $B_3$ clearly leads $CLK_{in}$. In this illustrative example, an earlier clock (e.g., $B_3$) is enabled and delivered to a latch by lagging signals (e.g., the pulse train at $B_3$ is actually enabled by the combination of the pulse trains at $B_6$ and $B_7$).

<sup>§</sup> This research was partially supported by the National Science Council under grants NSC 92-2220-E-110-001 and NSC 92-2220-E-110-004.

<sup>†</sup> The contact author, ccwang@ee.nsysu.edu.tw

<sup>‡</sup> Rong-Sui Kao is currently working as an IC design engineer at VIA Technologies, Taipei, Taiwan.

### 2.2. Single-shot locking

The fastest strategy to lock onto the external clock is to enable only the locked pulse train and to disable all remaining pulse trains as soon as the locked pulse train is detected. The single-shot locking scheme is therefore summarized as follows.

- 1). Every two adjacent DFFs receive two pulse trains separated by a predefined delay. The complementary output of the first DFF is NORed with the output of the second DFF so that a narrow pulse is generated.
- 2). The narrow pulse is sent to the corresponding control block, whose control pin C determines whether the pulse is delivered to the PASS pin of the Tri\_BUF to enable it.
- 3). If C = 0, the output of the control block is shorted to its input, so the narrow pulse propagates to the Tri\_BUF, i.e., PASS = 1. At the same time, $C_{next}$ is asserted, which sets C = 1 at the next control block and disables its output. All of the remaining control blocks are disabled sequentially in the same manner.
- 4). If C = 1, at least one pulse train has already been locked. The control block is disabled and its output is grounded, i.e., OUT = 0.
- 5). The low output of a disabled control block in turn disables the following Tri\_BUF, since PASS = OUT = 0.
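The control-block chain described above can be sketched as a behavioral model. This is a hypothetical, cycle-level simplification rather than the authors' circuit: each CON<sub>i</sub> is reduced to a single boolean state C, the list `nor_pulses` (an illustrative name) records which NOR gates fired, and $C_{next}$ is modeled simply as "asserted" once a block has passed a pulse.

```python
from typing import List


def single_shot_lock(nor_pulses: List[bool]) -> List[bool]:
    """Hypothetical behavioral model of the CON_1..CON_N chain.

    nor_pulses[i] is True when NOR_i emits its narrow pulse.
    Returns the PASS level applied to each Tri_BUF_i.
    """
    passes: List[bool] = []
    c = False  # C = 0: the first control block starts enabled
    for pulse in nor_pulses:
        if not c:
            # C = 0 case: OUT is shorted to IN, so PASS follows the NOR pulse.
            passes.append(pulse)
            # A pulse means lock: C_next sets C = 1 in the next block.
            c = pulse
        else:
            # C = 1 case: block disabled, OUT = PASS = 0, and the
            # disable propagates down the rest of the chain.
            passes.append(False)
    return passes


# Two adjacent pulses (e.g., jitter near the lock window): only the first passes.
print(single_shot_lock([False, False, True, True, False]))
# [False, False, True, False, False]
```

Because the disable only propagates forward, at most one Tri\_BUF is ever enabled, which is the property the single-shot scheme relies on.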

The outputs of every two Tri\_BUFs are NORed, and the outputs of all these NOR gates are NANDed together. Since a NAND of NOR outputs acts as an OR over all of the Tri\_BUF outputs, whichever pulse train is locked is faithfully propagated to the node $CLK_{out}$. Because the proposed method locks at the first appearance of a locked pulse train, its lock time is much shorter than that of prior methods such as the min-max method [5].
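The claim that the NOR/NAND tree forwards whichever pulse train is locked follows from De Morgan's law: NAND over terms NOR(a, b) equals OR over all inputs. The sketch below checks this exhaustively. It assumes, as simplifications not stated in the text, that a disabled Tri\_BUF reads as logic 0 and that the number of Tri\_BUF outputs is even; `combine_outputs` is a hypothetical helper name.

```python
from itertools import product
from typing import Sequence


def nor(a: bool, b: bool) -> bool:
    return not (a or b)


def combine_outputs(t: Sequence[bool]) -> bool:
    """CLK_out level from Tri_BUF outputs T_1..T_N.

    Assumptions (not from the paper): N is even and a disabled Tri_BUF reads as 0.
    """
    if len(t) % 2:
        raise ValueError("expected an even number of Tri_BUF outputs")
    # NOR every two adjacent Tri_BUF outputs ...
    pair_nors = [nor(t[j], t[j + 1]) for j in range(0, len(t), 2)]
    # ... then NAND all of the NOR outputs together.
    return not all(pair_nors)


# Exhaustive check for N = 6: the NOR/NAND tree behaves as a 6-input OR,
# so whichever single Tri_BUF is enabled reaches CLK_out.
for bits in product([False, True], repeat=6):
    assert combine_outputs(bits) == any(bits)
print("NAND-of-NORs equals OR for all 64 input patterns")
```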

### 2.3. Estimation of the negative delays

Assume the period of $CLK_{in}$ is m ns and a single BUF creates a unit delay of d ns. Thus, the delayed clock at node $B_i$ has a phase lag of $\frac{d}{m} \cdot i$ clock periods. DFF<sub>i</sub> and DFF<sub>i+1</sub> sample the clock signals at nodes $B_k$ and $B_{k+1}$, where k > i, such that at the *h*th clock of $CLK_{out}$, where i < h < k + 1, the $\overline{Q}$ of DFF<sub>i</sub> remains low and the Q of DFF<sub>i+1</sub> turns low. The output of the NOR gate, $NOR_i$, is then pulled high and enables the corresponding Tri\_BUF<sub>i</sub> through $CON_i$, so that the clock at node $B_i$ is coupled to a latch. In short, the phase lag of such a design is determined by $\frac{(i-k) \cdot d}{m}$ regardless of where the pulse is locked in the delay chain, and the negative delay is estimated to be $(i - k) \cdot d$ ns. However, the overall delay of the Tri\_BUF, the NOR and NAND gates, and the final output buffer, namely the "pass process" delay, reduces the total negative delay. The output of $NOR_i$ is pulled high when the rising edge of $CLK_{in}$ falls between the rising edge of $B_k$ (which has become "1" while $B_{k+1}$ = "0") and the rising edge of $B_{k+1}$ (which is still "0"). Therefore, lock can occur anywhere within this window, even if $CLK_{in}$ has jitter.
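The lock-window condition can be illustrated with an idealized sampling model. As simplifying assumptions not taken from the paper, $CLK_{in}$ is a 50%-duty square wave, every buffer adds exactly d, and the DFFs are indexed by the tap they sample; `sample_taps` and `detect_lock` are hypothetical names. The NOR of $\overline{Q}_k$ and $Q_{k+1}$ is 1 only where $B_k$ has already risen and $B_{k+1}$ has not.

```python
from typing import List


def clk_level(t: float, period: float) -> int:
    """Idealized 50%-duty CLK_in that rises at t = 0, period, 2*period, ..."""
    return 1 if (t % period) < period / 2 else 0


def sample_taps(period: float, d: float, n_taps: int) -> List[int]:
    """Levels of B_1..B_n latched at a rising edge of CLK_in (t = period).

    Assumes ideal buffers: B_i is CLK_in delayed by exactly i * d.
    """
    t_edge = period
    return [clk_level(t_edge - i * d, period) for i in range(1, n_taps + 1)]


def detect_lock(q: List[int]) -> List[int]:
    """NOR(Qbar_k, Q_{k+1}): 1 only where B_k is already high and B_{k+1} is still low."""
    return [int(not ((1 - q[j]) or q[j + 1])) for j in range(len(q) - 1)]


q = sample_taps(period=2.5, d=0.27, n_taps=12)
hits = detect_lock(q)
print(q)                  # [0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0]
print(hits.index(1) + 1)  # 9 -> the pair (B_9, B_10) fires
```

With the 400 MHz values used in the example later in this section (m = 2.5 ns, d = 0.27 ns), only the pair ($B_9$, $B_{10}$) produces a pulse: $B_9$ is delayed by 2.43 ns, just under one period, and $B_{10}$ by 2.7 ns, just over it.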

The value of $(i-k)$ depends on the position of the rising edge of $CLK_{in}$. In the example of Section 2.1 ($B_3$ enabled by $B_6$ and $B_7$):

- a). When the rising edge of $CLK_{in}$ is close to the rising edge of $B_k$ ($B_k$ has become "1" while $B_{k+1}$ = "0"), $(i-k) = -3$.
- b). When the rising edge of $CLK_{in}$ is close to the rising edge of $B_{k+1}$ ($B_{k+1}$ is still "0"), $(i-k) = -4$.

Hence, the final delay of $CLK_{out}$ with respect to $CLK_{in}$ is

$$delay_{(CLK_{in}\ vs.\ CLK_{out})} = (i-k) \cdot d - \left(-delay_{(pass\ process)}\right) = (i-k) \cdot d + delay_{(pass\ process)},$$

and the corresponding phase shift is

$$phase\ shift = \frac{delay_{(CLK_{in}\ vs.\ CLK_{out})} - j \cdot m}{m} \times 2\pi, \quad j = 0, 1, 2, \ldots$$

For instance, if the external clock is 400 MHz (period m = 2.5 ns), $(i-k) = -4$, $delay_{(pass\ process)}$ = 1.68 ns, and d = 0.27 ns, then, taking j = 1, the phase shift is predicted to be

$$\frac{(-4) \cdot 0.27 + 1.68 - 2.5}{2.5} \times 2\pi = \frac{-1.9}{2.5} \times 360^{\circ} = -273.6^{\circ}.$$

### 2.4. Phase Adjustment

There are two circuit designs to adjust the required negative phase of  $CLK_{out}$ .

- 1). Delay buffer: the delay of each buffer is adjusted using the "voltage variable delay line" technique [7], as shown in Figure 3. The delay of the cell is tuned by changing the control voltage, Vctrl; when Vctrl goes up, the delay increases.
- 2). Barrel shifter: it selects from -6 to 0 buffer delays for $(i-k)$. Figure 4 compares the estimated and simulated phase shifts for a 600 MHz input clock.
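A sweep over the barrel-shifter settings shows how each step of $(i-k)$ moves the predicted phase by $360^{\circ} \cdot d/m$. Because the 600 MHz values of d and the pass-process delay behind Figure 4 are not given in the text, this hypothetical sweep reuses the 400 MHz example parameters from Section 2.3 (m = 2.5 ns, d = 0.27 ns, pass-process delay 1.68 ns); the numbers are illustrative and do not reproduce Figure 4.

```python
import math


def phase_shift_deg(i_minus_k: int, d: float, pass_delay: float, period: float) -> float:
    # Section 2.3 formula, with j chosen so the shift lies in (-360, 0] degrees.
    delay = i_minus_k * d + pass_delay
    j = math.ceil(delay / period)
    return (delay - j * period) / period * 360.0


# Hypothetical parameters: the 400 MHz example values, not the 600 MHz design of Fig. 4.
PERIOD_NS, D_NS, PASS_NS = 2.5, 0.27, 1.68

# Barrel-shifter settings: (i - k) from -6 to 0 buffer delays.
sweep = {s: phase_shift_deg(s, D_NS, PASS_NS, PERIOD_NS) for s in range(-6, 1)}
for s, phi in sweep.items():
    print(f"(i - k) = {s:2d}: {phi:8.2f} deg")
```

Under these assumed parameters the settings span from -351.36° at $(i-k) = -6$ to -118.08° at $(i-k) = 0$ in steps of $360^{\circ} \times 0.27/2.5 = 38.88^{\circ}$; the voltage-controlled delay buffer changes d, and therefore the step size, between these coarse settings.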

## 3. SIMULATIONS AND CHIP TESTING

The proposed phase shifter is implemented in TSMC (Taiwan Semiconductor Manufacturing Company) $0.35 \,\mu m$ 1P4M CMOS technology. Figure 5 shows the post-layout HSPICE simulation result for a 600 MHz external clock at the FF corner, 60°C, and VDD = 3.63 V. Figure 6 shows the die photo of the proposed design. The overall average power dissipation is 261.84 mW at 600 MHz. The specification of the proposed negative phase shifter is summarized in Table 1.

Fig. 1. General architecture of the proposed phase shifter

| parameter            | value                             |
|----------------------|-----------------------------------|
| negative phase range | $[-313.1^{\circ}, -44.7^{\circ}]$ |
| power                | 261.84 mW                         |
| max. clock           | 600 MHz                           |
| core area            | $3.24 \text{ mm}^2$               |
| transistor count     | 1162                              |
| accuracy             | $\leq 5.4\%$                      |

Table 1. The specification of the negative phase shifter

## 4. CONCLUSION

We present a novel phase-adjustable negative phase shifter. Such a phase shifter can be used in memory interfaces to reduce the access time. In addition, it avoids the multi-locking hazard of conventional DLLs. Owing to single-shot locking, the lock time is very short, and the maximum locking frequency is 600 MHz. Post-layout simulations show that the predicted phase shift is accurate to within 5.4%.

## 5. REFERENCES

- [1] R. J. Baker, H. W. Li, and D. E. Boyce, *CMOS Circuit Design, Layout, and Simulation*. Reading: IEEE Press, 1998.
- [2] A. Efendovich, Y. Afek, C. Sella, and Z. Bikowsky, "Multifrequency zero-jitter delay-locked loop," *IEEE J. Solid-State Circuits*, vol. 29, pp. 67-70, Jan. 1994.
- [3] T. H. Lee, K. S. Donnelly, J. T. C. Ho, J. Zerbe, M. G. Johnson, and T. Ishikawa, "A 2.5 V CMOS delay-locked loop for 18 Mbit, 500 Megabyte/s DRAM," *IEEE J. Solid-State Circuits*, vol. 29, pp. 1491-1496, Dec. 1994.
- [4] S. Sidiropoulos and M. A. Horowitz, "A semi-digital dual delay-locked loop," *IEEE J. Solid-State Circuits*, vol. 32, pp. 1683-1692, Nov. 1997.
- [5] J.-G. Lee and S.-M. Park, "Clock signal modeling circuit with negative delay," R.O.C. (Taiwan) Patent no. 430797, Apr. 21, 2001.
- [6] S. Kim and M. Soma, "An all-digital built-in self-test for high-speed phase-locked loops," *IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing*, vol. 48, no. 2, pp. 141-150, Feb. 2001.
- [7] R. J. Baker, H. W. Li, and D. E. Boyce, *CMOS Circuit Design, Layout, and Simulation*. New York: Wiley-Interscience, 1997.
- [8] C.-C. Wang and R.-S. Kao, "A 1.0 GHz clock generator design with a negative delay using a single-shot locking method," in *Proc. ICECS 2001*, pp. 1123-1126, Sep. 2001.



Fig. 4. Phase shift between estimation and simulation (TT mode, 25°C, 3.3V)



Fig. 2. Pulse waveforms generated at the nodes of the tap line



Fig. 5. Post-layout HSPICE result ($CLK_{ext}$ = 600 MHz, 3.63 V, 60°C, FF mode; the negative shift is $-263.6^{\circ}$)



Fig. 3. The voltage variable delay buffer



Fig. 6. Die photo of the proposed phase shifter