Received 4 August 2025, accepted 20 August 2025, date of publication 25 August 2025, date of current version 8 September 2025.

Digital Object Identifier 10.1109/ACCESS.2025.3602093

# **RESEARCH ARTICLE**

# **Slow-Slope Reset Scheme for Highly-Sensitive CMOS Integrate-and-Dump Receiver OEIC**

SIMON MICHAEL LAUBE, CHRISTOPH GASSER, KERSTIN SCHNEIDER-HORNSTEIN, AND HORST ZIMMERMANN, (Senior Member, IEEE)

Institute of Electrodynamics, Microwave, and Circuit Engineering, Technische Universität Wien, 1040 Vienna, Austria

Corresponding author: Simon Michael Laube (simon.laube@tuwien.ac.at)

This research was funded in whole or in part by the Austrian Science Fund (FWF) grant DOI: 10.55776/P34649. For open access purposes, the authors have applied a CC BY public copyright license to any author accepted manuscript version arising from this submission. The authors acknowledge TU Wien Bibliothek for financial support through its Open Access Funding Programme.

**ABSTRACT** This paper presents a slow-slope reset scheme that reduces charge injection for highly-sensitive integrate-and-dump direct detection receivers. The monolithic receiver OEIC utilizes a source-follower front-end and low-capacitance PIN photodiode to achieve high sensitivity at higher data rates (250 Mbit/s) than previous ultra-sensitive PIN receivers. Both the slow-slope and the classical rectangular reset scheme are fabricated with the same front-end, on the same wafer, in 180 nm high-voltage CMOS. The measured transient voltages are in agreement with theory and suggest effective mitigation of charge injection by the slow-slope reset. Using correlated double sampling (CDS), our improved receiver achieves a sensitivity of -47.0 dBm at 250 Mbit/s with 50% return-to-zero (RZ) on-off keying (OOK) modulation and -53.5 dBm at 100 Mbit/s with 80% RZ OOK modulation, both for the reference bit error probability (BER) of 0.002 and wavelength 642 nm.
The difference to the shot noise quantum limit at 250 Mbit/s (100 Mbit/s) is 19.7 dB (17.2 dB). In addition, we show that low charge injection enables single sampling, with sensitivities around 1 dB worse than CDS.

**INDEX TERMS** Integrate-and-dump, charge injection, clock feedthrough, dummy switch, slow-slope reset, p-i-n photodiode, correlated double sampling (CDS), quantum limit, direct detection, CMOS.

The associate editor coordinating the review of this manuscript and approving it for publication was Stanley Cheung.

#### I. INTRODUCTION

Direct detection (DD) optical receivers with PIN photodiodes (PDs) are notorious for inferior sensitivity [1], [2]. Fundamentally limited by the noise of the front-end circuit, their sensitivity is typically far from the shot noise quantum limit (QL). In the past, avalanche photodiodes (APDs) [3], [4], [5], [6] and single-photon avalanche diodes (SPADs) [7], [8], [9], [10], [11], [12], [13] have been used to address this problem. These PDs provide photocurrent amplification via the avalanche effect to relax the requirements of the front-end circuit. SPADs deliver the best sensitivities, with reported results that are 11 dB above the QL [10], [11], [12]. However, impairments such as dead-time, dark count rate (DCR), and after-pulsing probability (APP) increase SPAD circuit complexity and chip area compared to PIN receivers. In particular, the excellent sensitivity of [10] and [11] is achieved by a 64 × 64 SPAD array, with a quenching circuit within each pixel. In [13], a discrete SPAD array (SiPM, silicon photomultiplier) was cooled to $-10\,^{\circ}\text{C}$ to reduce DCR, in order to achieve a useful bit error probability (BER).

However, the relative simplicity of PIN DD receivers motivated us to pursue further research and optimization. High quantum efficiency, low-capacitance dot-cathode PIN PDs [14], [15], [16] enable monolithic PIN receivers [16], [17], [18], [19] with sensitivities in the range of SPAD receivers.
Matched filter integrator front-ends based on parasitic capacitance are key for the noise performance of these receivers. The integrate-and-dump (I&D) topology [20], [21] or DC input current compensation (ICC) [22], [23], [24], [25] is utilized to control the integrator operating point. ICC may deteriorate sensitivity, because it requires many transistors in the feedback loop to the input node. In addition, ICC cannot tolerate long sequences of identical bits, since the feedback loop introduces a lower cutoff frequency. Reference [19] achieved 19.2 dB distance to the QL with an ICC inverter front-end and DC-balanced Manchester encoding. The encoding halves the data rate, but avoids long identical bit sequences.

In contrast, I&D avoids the problem of long sequences by resetting the front-end after each bit. I&D requires fewer transistors and may improve noise performance. However, switching introduces random offsets (kT/C noise) that are a major factor for low-capacitance front-ends. Correlated double sampling (CDS) [26] equalization is typically applied to remove kT/C noise, but increases complexity compared to ICC. The inverter front-ends with CDS (in post-processing) in [16] and [18] achieved a distance of 18.4 dB and 21.2 dB to the QL, respectively. Reference [17] reports a CDS common-source (CS) front-end with a sensitivity of 17.3 dB above QL.

A practical problem of I&D is charge injection to the front-end input, due to the reset switch turn-off [20]. The demanded sensitivity to signal-induced charges implies unwanted sensitivity to injected charges. Charge injection is well-studied [27], [28], [29], [30], [31], [32], but compensation based on dummy switches [27], [28], [33] with rectangular gate voltage still resulted in significant charge injection in previous I&D receivers [16], [17], [18]. Out of these three receivers, [16] shows the least amount of charge injection, because voltage effects and clock timing [33] were addressed.
Specifically, the size of the dummy transistors was tuned to the operating point, and the gate voltage delay was adjusted via capacitive loading. Since both are based on absolute transistor size, process variations may impede these countermeasures.

In this paper, we address the problem of charge injection effects (Section II) in highly-sensitive I&D front-ends. We introduce an I&D source follower (SF) front-end with an improved, slow-slope reset scheme (Section III). Experimental comparison of the classical rectangular reset and improved reset is presented (Section IV). Both reset schemes achieve outstanding sensitivity at 100 Mbit/s and 250 Mbit/s. The improved reset scheme reduces charge injection effects up to the point of complete elimination, resulting in increased dynamic range, and enabling high sensitivity even without CDS (Section IV).

#### **II. CHARGE INJECTION**

We summarize the established theory [27], [28], [29], [30], [31], [32] of charge injection. A brief analysis of charge injection effects in I&D front-ends highlights the points of interest for circuit design.

#### A. MOS SWITCH

MOS transistor switches inject excess charge into the connected nodes during turn-off [27], [28], [32]. Turn-off is a two-phase process, controlled by the gate voltage (clock) $v_G$ [29]. In the first phase, $v_G$ starts the transition from $v_H$ to $v_L$ with slope $U = |\mathrm{d}v_G/\mathrm{d}t|$, see Fig. 1a. Channel charge $Q_{\mathrm{ch}}$ flows out of the source and drain. The first phase ends when the channel stops conducting ($v_{GS} = V_{\mathrm{th}}$). The distribution of injected channel charge depends on process parameters ($\mu$, $C_{\mathrm{ox}}$, $W$, $L$, $V_{\mathrm{th}}$), node impedances ($R_{\mathrm{S}}$, $C_{\mathrm{S}}$, $C_{\mathrm{R}}$), DC operating point ($v_{\mathrm{H}}$, $v_{\mathrm{S}}$), and the slope of the gate voltage $U$.

FIGURE 1. Distribution of channel charge $Q_{\rm ch}$ during turn-off of a MOS switch. (a) Generalized circuit model, after [31]. (b) Amount of injected charge in $C_{\rm R}$ versus circuit parameters and switching speed, after [29], [30], [31].

These dependencies are summarized in the normalized switching duration [29],

$$B = (v_{\rm H} - v_{\rm S} - V_{\rm th}) \sqrt{\frac{\mu C_{\rm ox} W}{U C_{\rm R} L}}.$$ (1)

Large $B$ indicates slow switching, e.g. a slow slope compared to transistor speed, whereas small $B$ corresponds to fast switching. Fig. 1b shows the amount of channel charge deposited on $C_R$ after the first phase for some special cases [29], [30], [31]. Fast switching results in equal charge distribution between $C_S$ and $C_R$. Slow switching permits charge redistribution through the conductive channel. In this regime, the final charge partitioning depends on the node impedances. Evidently, the least amount of channel charge injection to $C_R$ is obtained for low source impedance ($R_S = 0$ or, equivalently, $C_S \to \infty$) and slow switching (large $B$).

In the second phase, capacitive clock-feedthrough occurs due to the gate-drain and gate-source overlap capacitance $C_{\rm ol}$ [32]. Since the channel is no longer conducting ($v_{\rm GS} < V_{\rm th}$), the injected charges remain at the source and drain nodes. The second phase ends when the gate voltage reaches its final value, $v_{\rm G} = v_{\rm L}$.

Half-sized dummy transistors are a common technique to compensate both channel charge injection and clock-feedthrough at either end of the switching transistor [27], [28], [32], [33]. The inverse clock signal, with respect to $v_G$ of the active switch, is applied to the dummy switches. The success of this method depends on equal charge distribution (Fig. 1b), transistor matching, and phase matching between the inverted and non-inverted clock [33]. These assumptions may be violated due to process variations.

#### B. INTEGRATE-AND-DUMP FRONT-END

Charge injection disturbs the small-signal behavior of the I&D front-end and post amplifiers.
We are interested in the transient behavior within the integration phase of a single bit period, i.e. between two consecutive reset phases. To elaborate, consider the simple model of the I&D front-end in Fig. 2a. The input current $i$ charges the effective input node capacitance (including Miller effect) $C_{\rm T}$. A linear first-order voltage amplifier with DC gain $A_0$ and time constant $\tau_{\rm A}$ amplifies the input voltage. The front-end transfer function in the Laplace domain is

$$H(s) = \frac{V_{\rm A}(s)}{I(s)} = \frac{1}{sC_{\rm T}} \cdot \frac{A_0}{1 + s\tau_{\rm A}}.$$ (2)

FIGURE 2. Integrate-and-dump front-end transient response to charge injection. (a) Simplified circuit model. (b) Transient response of one bit (integration phase).

Without charge injection, $i$ equals the signal current $i_s$ of the PD. The latter is constant within each bit, $i_s(t) = i_I$, with $i_I \neq 0$ for the 1-bit, see Fig. 2b. The front-end output voltage due to $i_s(t)$ is the step-response of $H(s)$,

$$V_{A,I}(s) = \frac{i_{\rm I}}{s^2 C_{\rm T}} \cdot \frac{A_0}{1 + s\tau_{\rm A}},\tag{3}$$

$$v_{\rm A,I}(t) = \frac{i_{\rm I}A_0}{C_{\rm T}} \left( t - \frac{\tau_{\rm A}}{A_0^2} \left[ 1 - e^{-\frac{t}{\tau_{\rm A}}} \right] \right) \approx \frac{i_{\rm I}A_0}{C_{\rm T}} t.$$ (4)

Considering a fast amplifier ($\tau_A$ shorter than the integration phase) with DC gain $A_0 \ge 1$, it follows that $\tau_A/A_0^2 \ll t$. Thus, within one integration phase, $v_{A,I}(t)$ is approximately a linear function of time $t$, see Fig. 2b. This is the desired integrator transient response.

Transistor $M_R$ resets the input node to a reference voltage (e.g. positive supply) to restore the operating point. Its gate voltage is typically a periodic rectangular clock, see Fig. 2a. When the clock transitions from low to high, $M_R$ turns off rapidly. Thereby, a charge $Q_R$ is injected into the input node at the beginning of the integration phase.
Due to the short timescale, the injected charge is modeled as an impulse current

$$i_{\text{II}}(t) = Q_{\text{R}}\delta(t) \quad \circ - \bullet \quad I_{\text{II}}(s) = Q_{\text{R}},$$ (5)

where $\delta(t)$ is the Dirac delta distribution, see Fig. 2b. The front-end output voltage due to $i_{II}(t)$ is the impulse response of $H(s)$,

$$V_{A,II}(s) = \frac{Q_{R}}{sC_{T}} \cdot \frac{A_{0}}{1 + s\tau_{A}},\tag{6}$$

$$v_{A,II}(t) = \frac{Q_{R}A_{0}}{C_{T}} \left( 1 - e^{-\frac{t}{\tau_{A}}} \right). \tag{7}$$

Since the model prescribes linearity, the total output voltage $v_{A,III}$ is the sum of the signal voltage $v_{A,I}$ and reset-induced voltage $v_{A,II}$. As shown in Fig. 2b, charge injection causes a potentially large exponential transient that overlays the linear signal. Equation (4) reveals that low effective capacitance $C_{\rm T}$ is advantageous to increase the signal amplitude. In fact, small $C_{\rm T}$ also reduces noise [34] and thus improves signal-to-noise ratio (SNR). On the other hand, Eq. (7) exposes the charge injection problem of the I&D front-end: Small $C_{\rm T}$ increases the error voltage induced by $Q_{\rm R}$.

The trade-off between SNR (sensitivity) and error voltage of the I&D front-end defines the size of $C_{\rm T}$. Optimization towards sensitivity requires minimal $C_{\rm T}$, i.e. a low-capacitance PD and small transistors. In this scenario the error voltage is maximal and easily saturates the front-end amplifier or subsequent post amplifiers, if it exceeds their dynamic range. Circuit design must effectively eliminate $Q_{\rm R}$ near the source to prevent amplification of the error voltage.

FIGURE 3. Proposed receiver architecture.

#### III. PROPOSED RECEIVER

We propose a monolithic PIN DD receiver with I&D front-end that is optimized with respect to sensitivity and data rate. The charge injection problem is addressed by a slow-slope reset scheme, motivated by the theory presented in Section II.
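To make the trade-off from Section II-B concrete, the following minimal sketch evaluates the linear signal ramp (the approximation in Eq. (4)) and the reset-induced transient of Eq. (7) for two values of $C_{\rm T}$. The capacitance, gain, and bandwidth are the post-layout figures quoted in this paper; the injected charge `Q_R` and the 1-bit current `I_ONE` are hypothetical values chosen only for illustration, and the conversion from the 135 MHz SF bandwidth to $\tau_{\rm A}$ assumes a first-order response.

```python
import math

def signal_ramp(t, i_one, c_t, a0):
    """Linear approximation of Eq. (4): v_A,I(t) ~ i_I * A0 * t / C_T."""
    return i_one * a0 * t / c_t

def injection_transient(t, q_r, c_t, a0, tau_a):
    """Eq. (7): v_A,II(t) = Q_R * A0 / C_T * (1 - exp(-t / tau_A))."""
    return q_r * a0 / c_t * (1.0 - math.exp(-t / tau_a))

# Values quoted in the paper (post-layout simulation / extraction).
C_T = 2.25e-15                         # effective input capacitance, F
A0 = 0.97                              # SF voltage gain
TAU_A = 1.0 / (2.0 * math.pi * 135e6)  # s, assumes a first-order 135 MHz SF

# Hypothetical values, chosen only for illustration.
Q_R = 0.1e-15   # injected charge, C
I_ONE = 10e-9   # photocurrent during a 1-bit, A
T_INT = 8e-9    # integration phase, s (10 ns bit period minus 2 ns reset)

results = {}
for label, c_t in (("C_T", C_T), ("C_T/2", C_T / 2.0)):
    v_sig = signal_ramp(T_INT, I_ONE, c_t, A0)
    v_err = injection_transient(T_INT, Q_R, c_t, A0, TAU_A)
    results[label] = (v_sig, v_err)
    print(f"{label}: signal {v_sig * 1e3:.1f} mV, error {v_err * 1e3:.1f} mV")
```

Halving $C_{\rm T}$ doubles the signal and the error voltage alike, which is why the injected charge has to be removed at its origin rather than traded against capacitance.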
Front-end noise and capacitance dominate the design of highly-sensitive integrating PIN DD receivers. This trade-off is apparent in the input-referred noise current power spectral density (PSD),

$$S_i(f) = 4kT\Gamma \frac{\left(2\pi f \,\tilde{C}_{\rm T}\right)^2}{g_{\rm m}},\tag{8}$$

where $\Gamma$ is Ogawa's noise factor [35], $\tilde{C}_T$ is the total input node capacitance without Miller effect, $g_m$ is the front-end transconductance, $k$ is Boltzmann's constant, and $T$ is absolute temperature. Equation (8) is the limit of the shunt-feedback transimpedance amplifier (TIA) PSD [34] for infinite feedback resistance. Flicker noise is excluded from Eq. (8), because it is mostly canceled by CDS [36]. Disregarding front-end bandwidth, Eq. (8) shows that the foremost design goal for low noise is minimal input capacitance $\tilde{C}_T$ and high transconductance $g_m$.

The proposed receiver is shown in Fig. 3. Its parts are described in detail in the following sections. All circuits are isolated from the substrate (PD anode) potential $V_{\rm SUB}$ by a deep n-well.

#### A. PHOTODIODE

Since the PD capacitance $C_{\rm PD}$ is part of the input capacitance $\tilde{C}_{\rm T}$, low $C_{\rm PD}$ is mandatory to achieve low noise according to Eq. (8). We integrate the proposed receiver core with an existing low-capacitance single-dot PIN PD, device 2 from [15]. The circular PD has a p+/p-well surface anode with a diameter of 30 µm (707 µm² active area). Its cathode is formed by a circular n-well with radius 1 µm. Shallow trench isolation (STI) covers the whole active area. The intrinsic region is formed by a 24 µm thick p- epitaxial layer, grown on top of the p+ bulk. At $-30 \, \text{V}$ bias, the simulated PD capacitance (without metal lines) is $C_{\rm PD} = 0.8 \, \text{fF}$, and the measured 3 dB bandwidth is $f_{\rm PD} = 300 \, \text{MHz}$ at 675 nm [15]. No opto-window<sup>1</sup> was available in the fabrication run for this work (cf.
Section IV). Therefore, the full isolation and passivation stack covers the PD, making its responsivity $\mathcal R$ more susceptible to process variations. The measured $\mathcal R$ without opto-window in [15] is around 0.355 A/W at $\lambda = 635$ nm.

#### B. FRONT-END

The SF has not received much attention in the context of highly-sensitive receivers. SFs are commonly used in active pixel image sensors [37], where they regularly achieve excellent noise performance [38]. Prime examples are quanta image sensors (QIS), which implement single-photon resolution without PD gain [39]. In contrast, previous attempts at highly-sensitive integrating PIN receivers applied a common-source (CS) amplifier stage [17] or inverter [16], [18], [19] front-ends. Integration is facilitated by the parasitic input-to-output capacitance, usually $C_{\rm GD}$ of the front-end transistor(s). Additional (parasitic) capacitance due to the physical layout degrades the sensitivity. Here we choose a SF front-end to solve several issues:

1) The input transistor must be small for low noise, see Eq. (8). A small SF achieves higher bandwidth (data rate) than same-size CS or inverter topologies.
2) The SF reduces the effective input node capacitance $C_{\rm T}$, because there is no Miller effect. In fact, the positive near-unity gain of the SF even cancels most of the gate-source capacitance $C_{\rm GS}$ [40]. We further exploit this cancellation in the layout, see Section III-D. Since the SF voltage gain is small, we implement a CS post amplifier to provide sufficient amplification. In a sense, gain and input capacitance requirements are decoupled by employing two stages.
3) The SF is an ideal choice for the proposed reset scheme, see Section III-C.

Transistors $M_0$ and $M_1$ form the SF front-end, see Fig. 3. The gate capacitance of $M_0$ is matched to the PD for minimal noise [34], i.e. minimal $\tilde{C}_{\rm T}^2/g_{\rm m}$ in Eq. (8).
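The capacitance-matching rule for $M_0$ can be illustrated with a short numerical sweep. This is only a sketch under assumptions that are not taken from the paper: the total capacitance is modeled as a fixed part plus the gate capacitance, and the transconductance is taken to scale linearly with gate capacitance (constant transit frequency `F_T`). Both `C_FIXED` and `F_T` are hypothetical values.

```python
import math

def noise_figure_of_merit(c_gate, c_fixed, f_t):
    """Return C_T^2 / g_m, the device-dependent factor in Eq. (8).

    Assumptions (not from the paper): C_T = c_fixed + c_gate and
    g_m = 2*pi*f_t*c_gate, i.e. a constant transit frequency f_t.
    """
    g_m = 2.0 * math.pi * f_t * c_gate
    return (c_fixed + c_gate) ** 2 / g_m

C_FIXED = 1.0e-15  # hypothetical PD plus wiring capacitance, F
F_T = 20e9         # hypothetical transit frequency, Hz

# Sweep the gate capacitance from 0.10 fF to 5.00 fF in 0.01 fF steps.
candidates = [k * 0.01e-15 for k in range(10, 501)]
best = min(candidates, key=lambda c: noise_figure_of_merit(c, C_FIXED, F_T))

print(f"optimum gate capacitance: {best * 1e15:.2f} fF")
```

Under these assumptions the minimum of $\tilde{C}_{\rm T}^2/g_{\rm m}$ falls where the gate capacitance equals the fixed capacitance, and it is independent of the assumed `F_T`, which only scales the curve.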
$M_1$ is sized to maximize the SF voltage gain, which is around 0.97 according to post-layout simulation. $C_1$ is a 2 pF NMOS capacitor that shunts the noise current of the mirror transistor $M_4$ to AVDD. Without $C_1$, $M_1$ would amplify the noise of $M_4$ to the SF output node. Based on post-layout simulation of the total chip output root mean square (RMS) noise voltage, 77.6% of the total noise is due to $M_0$, 17.8% to $M_1$, and 1.2% to $M_4$. The post-layout simulated SF bandwidth is 135 MHz. Due to its low transistor count, the CS stage ($M_5$, $M_6$) contributes negligible noise (< 5%).

FIGURE 4. Comparison of reset schemes, by example of a PMOS reset switch. (a) Double reset in capacitive fingerprint sensor [41]. (b) Slow/fast turn-off in TFT pixels [42]. (c) Slow-slope reset in single-photon X-ray imaging [43]. (d) Slow-slope reset in this work.

A key design choice was to increase the analog supply voltage AVDD from the typical 1.8 V to the process maximum of 2 V, to allow for an NMOS SF followed by an NMOS CS. This keeps $M_5$ small, while offering around 400 MHz bandwidth in the CS stage at a gain of 5. A low-bandwidth Miller operational transconductance amplifier (OTA) followed by an RC low-pass filter (RC LP, $f_{3\text{dB}} = 6.4\,\text{kHz}$) controls the CS output operating point. Since the SF is the limiting stage, receiver performance is unaffected by process variations of the CS gain and bandwidth.

At the front-end input, $M_2$ and $M_3$ are the reset and dummy switch, respectively. Normally, the dummy transistor is half the size of the reset transistor [33]. For the lowest possible input capacitance and low charge injection, we make $M_2$ a minimum size transistor, which prohibits a half-sized dummy. To achieve dummy compensation, $M_3$ consists of two parallel minimum size transistors operated in the off state ($v_{\rm gs3} \approx 0$).
Thereby, $M_3$ contributes four gate-drain overlap capacitances $C_{\rm ol}$ to the input node. One $C_{\rm ol}$ cancels the $C_{\rm ol}$ of $M_2$, whereas the other three $C_{\rm ol}$ of $M_3$ offer reasonable compensation of the channel charge of $M_2$, according to simulation.

Ideally, the reset pulse $\varphi/\overline{\varphi}$ should be infinitely short [20]. However, the on-resistance of $M_2$ and input node capacitance $C_{\rm T}$ require nonzero time to properly reset the input node [32]. Taking process variations into account, we found that 2 ns is sufficient for the proposed front-end in this 180 nm technology. CDS is applied to the chip output voltage $v_{\rm o}$ to cancel the kT/C noise of $M_2$ before bit decision, see Section IV-A.

<sup>1</sup>Etch-back to improve optical transmission through the passivation and isolation stack.

#### C. SLOW-SLOPE RESET

As discussed in Section II-A, turn-off speed of a switching transistor heavily influences charge injection effects. Slow turn-off is most effective if one side of the switch is connected to a voltage source (cf. Fig. 1b). This topology arises naturally in the SF front-end, because its gate operating point can be set to the supply voltage.

FIGURE 5. Reset pulse generator: (a) circuit and (b) schematic voltage waveforms.

Our slow-slope turn-off circuit is motivated by existing solutions in other applications, such as the double reset scheme for capacitive fingerprint sensors [41], slow/fast turn-off in active matrix thin-film transistor (TFT) pixels [42], and slow-slope reset ($S^2R$) in charge-sensing amplifiers for single-photon X-ray imaging [43], see Fig. 4. The concepts are adapted to address the constraints of a highly-sensitive I&D front-end. Firstly, contrary to Fig. 4a [41], we employ a single transistor for low input capacitance (high SNR). Secondly, our reset phase is much shorter than in Fig. 4a to 4c [41], [42], [43], due to the high data rate.
To minimize the turn-off slope and keep the reset phase short, we start the turn-off right after turn-on. The resulting gate voltages $\varphi$ and $\overline{\varphi}$ have a sawtooth shape (Fig. 4d).

The reset pulse generator (RPG) provides the $\varphi/\overline{\varphi}$ gate signal for the reset and dummy switch, see Fig. 5a. An external low-voltage differential signaling (LVDS) clock (CLK) drives the differential input stage, which amplifies the clock to full-swing digital levels (signals $c, \overline{c}$). We chose this input topology because LVDS levels are common for clock distribution and easily scale to higher data rates. The CLK signal has 50 % duty cycle.

The second section of the RPG transforms the on-chip clock $c/\bar{c}$ into two rectangular pulse signals of fixed width, $p/\bar{p}$ and $r/\bar{r}$, see Fig. 5b. A series of standard logic buffers delays the clock. Logic combination of the delayed signals $d/\bar{d}$ with the undelayed clock $c/\bar{c}$ generates fixed-width pulses $p/\bar{p}$. Because the $p$ and $\bar{p}$ signals are generated by NOR and NAND gates, respectively, there is a systematic error between their pulse widths. Furthermore, the buffers have slightly different delays for rising and falling edges, which adds to the systematic error. To compensate for the error, the $\bar{d}$ branch requires more delay buffers than the $d$ branch, as shown in Fig. 5a. We conducted post-layout Monte-Carlo simulations to examine the variation of full width half maximum (FWHM) pulse width. The FWHM pulse width average (standard deviation $\sigma$) is 2.08 ns ($\sigma = 55$ ps) for $p$ and 2.03 ns ($\sigma = 46$ ps) for $\overline{p}$.

FIGURE 6. Post-layout simulated pulse shaper output voltage across process and temperature variations (-20 °C to 100 °C).

Signals $r/\overline{r}$ are short 300 ps pulses that are necessary for the third circuit block (see below).
The generation of $r/\overline{r}$ is analogous to $d/\overline{d}$, except that fewer delay buffers are required. Since matching of the two branches is not critical, the systematic error is not compensated. Post-layout Monte-Carlo FWHM pulse widths are 303 ps ($\sigma=9$ ps) for $r$ and 349 ps ($\sigma=9$ ps) for $\overline{r}$.

The third circuit block in the RPG is a pulse shaping circuit that implements our improved slow-slope reset scheme. In essence, the pulse shaper is a complementary I&D circuit, see Fig. 5a. At the beginning of the 2 ns long reset phase, the short $r/\bar{r}$ pulses activate the reset switches $M_{13}$ and $M_{14}$ to discharge $C_{11}$ and $C_{12}$ (both 90 fF). Thereby, the $\varphi$ signal goes low ($\bar{\varphi}$ goes high) and turns on the front-end reset transistor $M_2$, see Fig. 3. Simultaneously, the $p/\bar{p}$ pulses activate the switched current sources $M_{11}$ and $M_{12}$ of the pulse shaper. The current sources charge $C_{11}$ and $C_{12}$, resulting in a slow ramp-up of $\varphi$ (ramp-down of $\bar{\varphi}$). $M_{11}$ and $M_{12}$ are sized such that $\varphi$ ($\bar{\varphi}$) reaches DVDD (GND) within our 2 ns reset phase. Thus, the turn-off slope is approximately $U=0.9$ V/ns. Based on the post-layout extracted $C_{\rm T}=2.25\,{\rm fF}$, the slow-slope reset achieves a normalized switching time of $B=8.5$, cf. Fig. 1b. Accordingly, less than 7.5 % of the channel charge should be injected into the input node. In comparison, the rectangular $p/\bar{p}$ pulses have an average fall time/rise time of 140 ps, which equals $U=12.86\,{\rm V/ns}$ and $B=2.6$.

FIGURE 7. Front-end layout floorplan (not to scale), showing the critical input node with shielding and transistors $M_0$, $M_2$, and $M_3$.

Since the pulse shaper topology does not permit transistor matching, $\varphi$ and $\overline{\varphi}$ are susceptible to process and temperature variations, see Fig. 6.
Because the reset switches $M_{13}$ and $M_{14}$ are initially on, the variations barely influence the fall (rise) of $\varphi$ ($\overline{\varphi}$). Despite variations of the current sources $M_{11}$ and $M_{12}$, the worst-case turn-off slope is close to the average value, and slow compared to typical rise times. Therefore, process variations should only have a minor effect on the proposed slow-slope scheme (cf. Fig. 1b).

#### D. LAYOUT

Parasitic capacitance between the front-end input node and other nodes (AVDD, GND, $V_{\rm SUB}$, $\varphi/\overline{\varphi}$, SF output) occurs due to the physical layout. According to Eqs. (4) and (8), minimal input capacitance is critical for sensitivity. Thereby, all parasitics degrade sensitivity, but some cause additional penalties: capacitance to the supply (AVDD, GND, $V_{\rm SUB}$) may couple supply noise to the input node; capacitance to the gate clocks $\varphi/\overline{\varphi}$ increases charge injection; capacitance to the SF output, however, acts like additional SF $C_{\rm GS}$, which is almost fully cancelled by the gain of the SF [40].

For minimal capacitance, the metal connection to the PD cathode is made on the topmost thin metal layer (metal 4). A via stack near the front-end transistors connects the PD metal to the front-end. We applied a shielding layer beneath the sensitive input node, in the vicinity of the via stack, see Fig. 7. The purpose of this shield is to minimize parasitic capacitance to unwanted nodes, by introducing additional capacitance between the input node and the shield. Because the shield is connected to the SF output, the effective shield-to-input capacitance is small.

#### E. POST AMPLIFICATION

The post amplifier (PA) and output driver (OD) (Fig. 3) are designed to have minimal influence on the receiver bandwidth, which is set by the front-end SF.
We acknowledge that this choice of PA and OD is not optimal in terms of power consumption, since the OD alone accounts for more than 50% of the total power (see Section IV).

A Sallen-Key second-order Butterworth low-pass filter (LPF, $f_{3dB} = 2.5 \, \text{kHz}$) converts the single-ended front-end output signal to pseudo-differential. The same Miller OTA as for the CS control is used for the Sallen-Key low-pass. The pseudo-differential signal is amplified by the PA. The PA is a two-stage wideband differential amplifier with resistive feedback. The PA input stage is an NMOS differential amplifier; the output stage consists of two PMOS SFs. Post-layout simulated voltage gain and bandwidth are 2.9 and 519 MHz, respectively. The PA common-mode feedback (CMFB) is based on the same Sallen-Key low-pass filter as before. Two NMOS SFs are used as the OD, which is designed for a $100\,\Omega$ differential load. The post-layout simulated OD bandwidth is 392 MHz. The effect of process variations on the PA, OD, and OTA is minimized by best-practice common-centroid layout matching.

FIGURE 8. Chip micrograph with circuit blocks annotated: (a) Full chip, (b) front-end (FE) section.

#### IV. EXPERIMENTAL RESULTS

The receivers were fabricated in 180 nm high-voltage CMOS with five metal layers, see Fig. 8. The analog supply voltage is 2 V, the digital supply voltage (RPG circuit) is 1.8 V, and the substrate bias<sup>2</sup> is -30 V. The die size is $1.02 \text{ mm} \times 1.02 \text{ mm}$. Total power consumption is 164 mW, where the majority is dissipated by the PA (61 mW) and OD (87 mW). The simulated front-end power consumption is 0.12 mW by the SF and 5.8 mW by the CS stage.

We compare the improved receiver with slow-slope reset pulse, labeled "Improved", to a reference receiver with rectangular reset pulse, labeled "Rectangular". The reference receiver has identical circuits and layout, except that the RPG pulse shaper is removed, cf. Fig. 5.
In the Improved receiver, $\varphi$ and $\overline{\varphi}$ connect to the gate of $M_2$ and drain of $M_3$, respectively; whereas in the Rectangular receiver, $\overline{p}$ and $p$ connect to $M_2$ and $M_3$.

#### A. METHODS

Two samples of the *Improved* receiver and two samples of the *Rectangular* receiver were characterized. The chips are glued to test PCBs and wire bonded (chip on board). The temperature of the samples is not strictly controlled; ambient temperature is around $20\,^{\circ}\text{C}$.

FIGURE 9. Test setup for BER characterization.

FIGURE 10. Illustration of CDS and bit decision. Measured 100 Mbit/s 80 % RZ PRBS modulation signal (top), calculated CDS voltage (center), and measured transient chip output voltage (bottom) of sample Improved(1) at -53.5 dBm optical power. The first CDS sample point within a bit is marked by triangles ($\blacktriangle$), the second sample point by dots ($\bullet$).

<sup>2</sup>Keysight B2987A.
<sup>3</sup>Agilent 81150A for 100 Mbit/s, Agilent E4424B for 250 Mbit/s.
<sup>4</sup>Analog Devices MAX40026 & Microchip SY89295U.

Fig. 9 shows the BER test setup. The sinusoidal clock source<sup>3</sup> feeds a custom emitter-coupled logic (ECL) pulse-width modulation (PWM) generator<sup>4</sup> and the pseudorandom bit sequence (PRBS) pattern generator<sup>5</sup> (PRBS15 test pattern). The PRBS generator outputs a non-return-to-zero (NRZ) on-off keying (OOK) data signal. Gating<sup>6</sup> the NRZ PRBS with the PWM signal yields the return-to-zero (RZ) OOK modulated PRBS. Since the reset duration of our chips is fixed at 2 ns, 50 % RZ and 80 % RZ modulation were used for 250 Mbit/s and 100 Mbit/s, respectively. The chip clock CLK is delayed<sup>7</sup> by a fixed time, to account for the constant delay of all cables and the optical fiber.
The amplified<sup>8</sup> RZ PRBS ($v_{pp} = 1.35 \,\mathrm{V}$) directly modulates a 642 nm laser.<sup>9</sup> After attenuation and splitting,<sup>10</sup> the optical signal enters the on-chip PD via a stripped single-mode fiber.<sup>11</sup> A power meter<sup>12</sup> in the reference branch allows us to calculate the optical power incident on the chip, using a predetermined calibration factor. A metal box encloses the chip to block all ambient light.

An oscilloscope<sup>13</sup> records 1 Mbit long waveforms of the analog chip output and RZ PRBS signal at a sample rate of 20 GS/s and bandwidth of 3 GHz (1 GHz) for 250 Mbit/s (100 Mbit/s). Each waveform corresponds to one optical power setting.

FIGURE 11. Measured differential chip output voltage at the respective CDS sensitivity. (a) 100 Mbit/s, 80 % RZ modulation. (b) 250 Mbit/s, 50 % RZ modulation.

CDS, bit decision, and BER calculation are implemented in a Python script. Fig. 10 illustrates the key parameters. First, the transient RZ PRBS (top) and chip output (bottom) waveforms are read from the input file. The script sweeps the CDS delta time $\Delta t$ and decision threshold voltage $V_{\rm DTH}$ to find the optimum BER for each waveform (optical power) independently. For each $\Delta t$ we read the first ($\blacktriangle$) and second ($\bullet$) sample of each bit and compute the difference $\Delta v$. Comparison of $\Delta v$ to $V_{\rm DTH}$ decides the bit value, see Fig. 10 (center). The absolute delay between PRBS and chip output $v_{\rm o}$ is compensated by correlation of the two waveforms. Note that the BER resolution is limited to $10^{-6}$ due to the 1 Mbit length of the recorded waveforms. In other words, counting 2000 bit errors corresponds to our target BER = $2 \cdot 10^{-3}$.

Single sampling is implemented in Python analogously to CDS. However, only one sample per bit is taken from $v_{\rm o}(t)$ and compared to the decision threshold $V_{\rm DTH}$.

<sup>5</sup>Sympuls BMG2500.
<sup>6</sup>onsemi MC100LVEP05.
<sup>7</sup>Microchip SY89295U.
<sup>8</sup>Analog Devices ADL5569.
<sup>9</sup>Thorlabs CLD1010LP & Thorlabs LP642-SF20, Bias current 55 mA.
<sup>10</sup>Thorlabs V600F & Thorlabs TW630R2F2.
<sup>11</sup>Thorlabs SM600.
<sup>12</sup>Thorlabs PM100USB & Thorlabs S150C; Calibration factor 0.1014.
<sup>13</sup>Keysight MSOV204A.

#### B. TRANSIENT WAVEFORMS

In Section II-B we argued that I&D front-ends experience large exponential transients due to charge injection. We will now compare this theory to the measured analog chip output voltage, see Fig. 11.

**TABLE 1.** Measured responsivity and sensitivity at $\lambda = 642$ nm and BER = 0.002.

| Sample | $\mathcal{R}$/(A/W) | $\eta$ | $\overline{P}$/dBm (100 Mbit/s) | $\Delta P^{\dagger}$/dB (100 Mbit/s) | $\overline{P}$/dBm (250 Mbit/s) | $\Delta P^{\ddagger}$/dB (250 Mbit/s) |
|----------------|-------|-------|--------|-------|--------|-------|
| Improved(1) | 0.409 | 0.789 | -53.52 | 17.16 | -47.01 | 19.69 |
| Improved(2) | 0.309 | 0.597 | -51.48 | 19.20 | -45.77 | 20.94 |
| Rectangular(1) | 0.240 | 0.464 | -49.41 | 21.27 | -45.64 | 21.07 |
| Rectangular(2) | 0.362 | 0.700 | -52.06 | 18.62 | -46.28 | 20.42 |

$^{\dagger}$QL at 100 Mbit/s is -70.68 dBm.

At 100 Mbit/s (Fig. 11a), both *Rectangular* samples exhibit settling after the reset. Since the SF defines the overall receiver bandwidth (cf. Section III-B), the settling transient is qualitatively similar to the first-order model of Fig. 2b. Rectangular(1) shows substantially more settling than Rectangular(2), indicating large net charge injection.
We attribute this difference to the process-dependent mismatch of reset transistor and dummy transistor, which results in sample-to-sample variation. In contrast, our *Improved* chips show the desired integrator ramp function for the 1-bit and constant output for the 0-bit, without noticeable settling transients. Fig. 10 shows additional $v_{\rm o}(t)$ measurements of Improved(1).

At 250 Mbit/s (Fig. 11b) we observe a similar difference between *Improved* and *Rectangular* samples. The *Improved* samples again show the desired integrator behavior, although, unlike at 100 Mbit/s, there is a small variation between Improved(1) and Improved(2). The *Rectangular* samples again produce large settling transients, but the waveform shapes at 250 Mbit/s and 100 Mbit/s differ. The reason for this stark contrast is that the fast amplifier assumption $(\tau_A/A_0^2 \ll t)$ of our model in Section II-B does not hold at 250 Mbit/s. The SF gain and bandwidth yield $\tau_A/A_0^2 = 1.25 \,\text{ns}$ (cf. Section III-B), whereas the integration phase at 250 Mbit/s with 50 % RZ modulation is only 2 ns. Therefore, only the steep beginning of the exponential transient (Fig. 2b) disturbs the integration phase at 250 Mbit/s, resulting in the output waveform shown in Fig. 11b. Pre- and post-layout simulation qualitatively predicted this behavior.

Based on the foregoing comparison, we conclude that the slow-slope reset scheme significantly reduces charge injection. Considering the nearly ideal integrator (ramp) waveforms of both *Improved* receivers, which are unmatched even by the well-performing *Rectangular(2)* sample, we believe that the combination of dummy switch and slow-slope gate voltage can fully compensate charge injection effects in I&D front-ends.

#### C. SENSITIVITY

Since the chips were fabricated without an opto-window, process variations in the oxide stack strongly influence the responsivity.
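The relation between the responsivity $\mathcal{R}$, the quantum efficiency $\eta$, and the distance $\Delta P$ to the QL listed in Table 1 can be checked with a few lines of Python. This is a sketch under stated assumptions rather than the evaluation code behind the table: the function names are hypothetical, and the quantum limit is computed with the common Poisson expression for OOK, $\mathrm{BER} = \tfrac{1}{2}\exp(-N_1)$ with equally likely bits, which is assumed here because the section itself only quotes the resulting QL values.

```python
import math

H = 6.62607015e-34   # Planck constant in J*s
C = 299792458.0      # speed of light in m/s
Q = 1.602176634e-19  # elementary charge in C


def quantum_efficiency(responsivity_a_per_w, wavelength_m):
    """eta = R * h * c / (q * lambda)."""
    return responsivity_a_per_w * H * C / (Q * wavelength_m)


def quantum_limit_dbm(bit_rate, wavelength_m, ber):
    """Average optical power in dBm at the shot-noise quantum limit for OOK.

    Assumption: Poisson statistics with BER = 0.5 * exp(-N1), where N1 is the
    photon number of a 1-bit, and 0- and 1-bits are equally likely.
    """
    photons_per_one = -math.log(2.0 * ber)
    avg_photons_per_bit = photons_per_one / 2.0
    power_w = avg_photons_per_bit * (H * C / wavelength_m) * bit_rate
    return 10.0 * math.log10(power_w / 1e-3)


def distance_to_ql_db(sensitivity_dbm, bit_rate, wavelength_m, ber):
    """Delta P: measured sensitivity minus the quantum limit, in dB."""
    return sensitivity_dbm - quantum_limit_dbm(bit_rate, wavelength_m, ber)
```

For example, `quantum_efficiency(0.409, 642e-9)` evaluates to about 0.79 and `quantum_limit_dbm(100e6, 642e-9, 2e-3)` to about -70.68 dBm, consistent with the Improved(1) row and the QL value quoted with Table 1.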
We measured the DC substrate current at various optical powers to calculate $\mathcal{R}$ and $\eta$ of each sample at $\lambda=642\,\mathrm{nm}$, see Table 1. On average, $\mathcal{R}$ is close to the value reported in [15]. However, the variation between samples is substantial. For this reason, experimental BER characteristics of the improved and classical reset scheme are compared in terms of detected power $\eta \overline{P}$. The measured absolute sensitivities $\overline{P}$, and the distance $\Delta P$ to the QL, at BER = 0.002 are given in Table 1.

FIGURE 12. Measured receiver BER with CDS (horizontal axis: $\eta \overline{P}/\mathrm{dBm}$). (a) BER at 100 Mbit/s, 80 % RZ modulation. (b) BER at 250 Mbit/s, 50 % RZ modulation.

Fig. 12 presents BER characteristics with CDS equalization at 100 Mbit/s (Fig. 12a) and 250 Mbit/s (Fig. 12b). There are two key results. First, the *Improved* slow-slope reset does not necessarily improve sensitivity compared to the *Rectangular* reset. At 100 Mbit/s the *Improved* variant has a slight ($\approx 1 \, \text{dB}$) advantage in $\eta \overline{P}$ sensitivity, whereas at 250 Mbit/s both reset schemes perform equally well. We attribute this result to CDS, receiver linearity, and data-independence of charge injection. The receiver should behave linearly as long as settling transients do not saturate any amplifier. If linearity holds, the transient settling adds to the data signal without interaction, as modeled in Section II-B. When CDS is applied to the combined signal, the settling causes an offset voltage. Furthermore, if the transient settling is independent of the bit value, the offset after CDS is the same for each bit and is easily compensated by a shift of the decision threshold $V_{\rm DTH}$. With these preconditions, the presence or absence of charge injection does not influence sensitivity. The second conclusion from Fig. 12 is that dynamic range suffers from charge injection.
*Rectangular(1)* has the lowest dynamic range (12.7 dB at 250 Mbit/s), while it also shows the highest amount of transient settling (cf. Fig. 11). On the other hand, *Rectangular(2)* and both *Improved* samples have little or no charge injection and therefore show comparable dynamic range (18.3 dB to 18.8 dB at 250 Mbit/s). Again, we can explain this result using linearity. The amplifiers will saturate at a given amount of signal amplitude. If most of the amplitude is contributed by transient settling due to charge injection, the data signal must be small. A small data signal implies low optical power.

Both the transient and the normalized sensitivity results show only slight differences between Improved(1) and Improved(2). These small variations suggest that process variations of the SF front-end are negligible with regard to sensitivity. As shown in Table 1, the remaining difference in absolute sensitivity is mostly due to variations in $\mathcal{R}$.

$^{\ddagger}$ QL at 250 Mbit/s is -66.70 dBm.

FIGURE 13. Comparison of measured BER for CDS and single sampling at 250 Mbit/s, 50 % RZ modulation.

#### D. SINGLE SAMPLING

The flawless transient behavior of our *Improved* samples calls the necessity of CDS equalization into question (cf. Fig. 11). CDS cancels kT/C and 1/f noise, but amplifies other parts of the noise PSD by taking two samples instead of one. To investigate, we re-evaluated the data of all receivers using single sampling and bit decision. Fig. 13 shows the BER comparison between single sampling and CDS at 250 Mbit/s. The single sampling sensitivity of the Improved(1)/(2) receiver is only 1.25 dB/0.63 dB worse than its CDS sensitivity. The Rectangular(2) sample shows similar behavior, with 2.78 dB improvement by CDS over single sampling. This is in line with previous results, because Rectangular(2) shows little transient settling (cf. Fig. 11). In contrast, the single sampling BER of Rectangular(1) is never lower than BER = 0.16.
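The two decision rules compared in this subsection can be sketched in a few lines of Python. The following is a minimal illustration and not the evaluation script used for the measurements: the function names, the sample-index parameters, the assumption that a 1-bit produces a rising ramp, and the tiny hand-made waveform in the usage note are all hypothetical.

```python
def cds_decide(v_out, samples_per_bit, first_idx, delta_idx, v_dth):
    """CDS: per bit, compare v(t1 + delta_t) - v(t1) with the threshold V_DTH."""
    n_bits = len(v_out) // samples_per_bit
    decisions = []
    for k in range(n_bits):
        start = k * samples_per_bit + first_idx
        delta_v = v_out[start + delta_idx] - v_out[start]
        # Assumption: a 1-bit integrates upwards, so a large difference means 1.
        decisions.append(1 if delta_v > v_dth else 0)
    return decisions


def single_decide(v_out, samples_per_bit, sample_idx, v_dth):
    """Single sampling: per bit, compare one sample of v_out with V_DTH."""
    n_bits = len(v_out) // samples_per_bit
    return [1 if v_out[k * samples_per_bit + sample_idx] > v_dth else 0
            for k in range(n_bits)]


def ber(decisions, reference):
    """Fraction of decided bits that differ from the reference pattern."""
    errors = sum(d != r for d, r in zip(decisions, reference))
    return errors / len(reference)


def sweep_cds(v_out, reference, samples_per_bit, first_idx,
              delta_candidates, threshold_candidates):
    """Return (best BER, delta index, threshold) over the candidate grid."""
    best = (1.0, None, None)
    for d in delta_candidates:
        for thr in threshold_candidates:
            rate = ber(cds_decide(v_out, samples_per_bit, first_idx, d, thr),
                       reference)
            if rate < best[0]:
                best = (rate, d, thr)
    return best


# Hand-made toy waveform: 4 samples per bit, ramp for a 1-bit, flat for a 0-bit,
# plus an offset that is constant within each bit (e.g. a reset offset).
bits = [1, 0, 1, 1, 0]
offsets = [0.3, -0.2, 0.5, -0.4, 0.6]
ramp = [0.0, 0.1, 0.2, 0.3]
v_toy = [off + b * r for b, off in zip(bits, offsets) for r in ramp]
```

On this toy waveform, `cds_decide(v_toy, 4, 0, 3, 0.15)` recovers all five bits because the per-bit offset cancels in the difference, whereas `single_decide(v_toy, 4, 3, 0.15)` misjudges the two bits whose offset pushes the single sample across the threshold.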
Hence, Rectangular(1) can only be used with CDS, because single sampling cannot tolerate the large transient settling.

To summarize, the single sampling sensitivity correlates with the amount of injected charge that is inferred from the transient waveforms. The *Improved* slow-slope receivers offer practical BER with single sampling and CDS. On the other hand, the variability of the *Rectangular* receivers prohibits reliable single sampling performance.

#### V. COMPARISON

Table 2 summarizes the main characteristics of our improved receiver and compares it to state-of-the-art PIN PD and SPAD receivers. APDs are omitted, because the better SPAD performance is the definitive benchmark for our work.

Our SF front-end improves the data rate compared to other, equally sensitive PIN receivers. Reference [18] achieved the lowest data rate of 20 Mbit/s, and a distance of $\Delta P = 21.2 \,\mathrm{dB}$ to the QL. However, [18] utilized a larger $62 \,\mu\text{m} \times 62 \,\mu\text{m}$ PD, compared to the 30 µm diameter PIN PD in this work. Although circuit design addressed this problem, the large PD causes some power penalty, which explains the larger distance to QL of [18]. The I&D receiver in [17] has demonstrated the best known PIN sensitivity so far, $\Delta P = 17.3 \,\mathrm{dB}$ at 50 Mbit/s. Our slow-slope receiver matches the distance to QL of [17] at twice the data rate (100 Mbit/s). Compared to the ICC receiver in [19], the I&D receiver in this work improves data rate by a factor of 5 (250 Mbit/s), also at similar $\Delta P$. Finally, at an equal data rate of 100 Mbit/s, this work achieves 1 dB better sensitivity than the I&D receiver in [16]. Considering that [16] used a larger structure-size (350 nm) process, this improvement seems insignificant. Despite its good sensitivity, our receiver likely suffers from increased noise at 100 Mbit/s, because its bandwidth supports 250 Mbit/s while [16] is limited to 100 Mbit/s.
Moreover, this 180 nm process exhibits considerable 1/f noise near 100 MHz, which cannot be cancelled by CDS at 100 Mbit/s. As a result, the 100 Mbit/s CDS sensitivity of this work is similar to [16]. The resistive TIA in [44] achieves a higher data rate (622 Mbit/s), but its sensitivity is about 6 dB (4.5 dB) further away from the QL than our best (worst) result at 250 Mbit/s. Furthermore, [44] used InGaAs-InP technology.

With regard to SPADs, our improved PIN receiver outperforms some SPAD receivers in terms of sensitivity or data rate, but not all of them. Firstly, we achieved equal distance to QL at twice the data rate, compared to [8] and [9]. However, at an even lower data rate (20 Mbit/s), [9] demonstrated $\Delta P = 13.6 \,\mathrm{dB}$, which is unmatched by our receivers. The $64 \times 64$ SPAD array receiver [10] (additional data given in [11]) achieved unparalleled sensitivity and data rate. Our PIN receiver requires only a single PD, reducing circuit complexity, but does not perform as well as [10] and [11].

Two experiments with discrete SPAD arrays (SiPMs) [12], [13] reported excellent sensitivity, on par with [10], [11]. We specifically chose the UV wavelength receiver [13] for this comparison to emphasize the high effort required to achieve high sensitivity with SPADs. While the sensitivity is 11.9 dB above the QL, the data rate in [13] is just 1 Mbit/s. Furthermore, the SPAD array had to be cooled to $-10\,^{\circ}\text{C}$ to enable a low enough BER, which makes the system viable in the first place. In contrast, our PIN receiver operates at room temperature, and requires only a single PD. Moreover, PIN PDs are less sensitive to temperature changes, whereas SPADs need temperature control (e.g. Peltier coolers) to make the photon detection probability (PDP) independent of temperature.

Lastly, there are some architectural limitations of our receiver. First, the relatively small PD complicates fiber alignment.
In contrast, SPAD arrays naturally have a large light-sensitive area. Second, the slow-slope reset scheme requires a certain duration of the reset phase. This may limit the data rate of future receivers. Finally, computing CDS in post processing is incompatible with real-time applications.

**TABLE 2.** Comparison to State-of-the-Art.

| Ref. | Technology | Power cons. $P_{\rm S}/\mathrm{mW}$ | PD Type (Reverse Bias) | Bit Rate/(Mbit/s) | BER | Wavelength $\lambda/\mathrm{nm}$ | Sensitivity $\overline{P}/\mathrm{dBm}$ | Dist. to QL $\Delta P/\mathrm{dB}$ |
|------|------------|------|------------------------|------|------|------|-------|------|
| [8]<sup>b</sup> | 350 nm CMOS | 19 | 4×SPAD (30 V) | 50 | $2 \cdot 10^{-3}$ | 635 | -55.7 | 17.9 |
| [9]<sup>b</sup> | 350 nm CMOS | 281 | SPAD (32 V) | 50 | $2 \cdot 10^{-3}$ | 635 | -57.0 | 16.6 |
| [9]<sup>b</sup> | 350 nm CMOS | 119 | SPAD (32 V) | 20 | $2 \cdot 10^{-3}$ | 635 | -64.0 | 13.6 |
| [10] | 130 nm CMOS | 115 | 64 × 64 SPAD (15.2 V) | 400 | $2 \cdot 10^{-3}$ | 450 | -49.9 | 13.2 |
| [10], [11] | 130 nm CMOS | 115 | 64 × 64 SPAD (15.2 V) | 50 | $2 \cdot 10^{-3}$ | 450 | -60.5 | 11.7 |
| [13] | Not integrated | - | SiPM with 120 × 120 SPADs<sup>a</sup> (55.2 V) | 1 | $2.4 \cdot 10^{-3}$ | 275 | -75.2 | 11.9 |
| [12] | Not integrated | - | SiPM with 5676 SPADs (28 V) | 400 | $10^{-3}$ | 405 | -50.8 | 11.3 |
| [44] | InGaAs-InP | - | PIN (5 V) | 622 | $10^{-9}$ | 1300 | -34.7 | 25.5 |
| [18]<sup>b</sup> | 180 nm CMOS | 35 | PIN (8 V) | 20 | $2 \cdot 10^{-3}$ | 642 | -56.5 | 21.2 |
| [19]<sup>b</sup> | 350 nm CMOS | 12<sup>c</sup> | PIN (10 V) | 50 | $10^{-6}$ | 642 | -50.8 | 19.2 |
| [16]<sup>b</sup> | 350 nm CMOS | 3.9<sup>c</sup> | PIN (20 V) | 100 | $2 \cdot 10^{-3}$ | 642 | -52.3 | 18.4 |
| [17]<sup>b</sup> | 180 nm CMOS | 24 | PIN (20 V) | 50 | $2 \cdot 10^{-3}$ | 635 | -56.4 | 17.3 |
| This work | 180 nm CMOS | 164 | PIN (32 V) | 100 | $2 \cdot 10^{-3}$ | 642 | -53.5 | 17.2 |
| | | | | 250 | $2 \cdot 10^{-3}$ | 642 | -47.0 | 19.7 |

<sup>a</sup>Cooled to $-10\,^{\circ}\text{C}$.

<sup>b</sup>Work by our research group.

<sup>c</sup>Without output driver.

#### VI. CONCLUSION

A monolithic, highly-sensitive, PIN photodiode direct detection receiver in 180 nm CMOS was presented. Charge injection in the integrate-and-dump architecture is greatly reduced by application of a slow-slope reset scheme. The SF front-end increased the data rate to 250 Mbit/s, over the 100 Mbit/s limit of previous front-ends exploiting parasitic integration capacitances. Experimental data showed a best-case CDS sensitivity (distance to QL) of $-53.52\,\mathrm{dBm}$ ($17.16\,\mathrm{dB}$) at 100 Mbit/s, and $-47.01\,\mathrm{dBm}$ ($19.69\,\mathrm{dB}$) at 250 Mbit/s. Although the slow-slope reset did not improve CDS sensitivity compared to the rectangular reset in this 180 nm process, OEICs realized in smaller process nodes may benefit from the slow-slope scheme because decreasing input capacitance worsens the charge injection problem. Moreover, low charge injection of the slow-slope reset supports higher dynamic range and results in a notable single sampling sensitivity (distance to QL) of $-45.76\,\mathrm{dBm}$ ($20.94\,\mathrm{dB}$) at 250 Mbit/s.

The disadvantage of our receiver is high power consumption, mostly due to inefficient output driver design. The existing driver consumes 87 mW. An estimation based on previous OEICs suggests that an improved driver should consume around 30 mW.

## **ACKNOWLEDGMENT**

The authors would like to thank A. Zimmer from X-FAB, Erfurt, Germany, for chip fabrication and technical support.

#### **REFERENCES**

- [1] T. Muoi, "Receiver design for high-speed optical-fiber systems," *J. Lightw. Technol.*, vol. 2, no. 3, pp. 243–267, Jun. 15, 1984.
- [2] J. R. Barry and E. A. Lee, "Performance of coherent optical receivers," *Proc. IEEE*, vol. 78, no. 8, pp. 1369–1394, 1990.
- [3] L. D. Tzeng, O. Mizuhzra, T. V. Nguyen, K. Ogawa, I. Watanabe, K. Makita, M. Tsuji, and K. Taguchi, "A high-sensitivity APD receiver for 10-Gb/s system applications," *IEEE Photon. Technol. Lett.*, vol. 8, no. 9, pp. 1229–1231, Sep. 15, 1996.
- [4] D. O'Brien, R. Turnbull, H. Le Minh, G. Faulkner, O. Bouchet, P. Porcon, M. El Tabach, E. Gueutier, M. Wolf, L. Grobe, and J. Li, "High-speed optical wireless demonstrators: Conclusions and future directions," *J. Lightw. Technol.*, vol. 30, no. 13, pp. 2181–2187, Jul. 15, 2012.
- [5] T. Jukić, B. Steindl, R. Enne, and H. Zimmermann, "200 μm APD OEIC in 0.35 μm BiCMOS," *Electron. Lett.*, vol. 52, no. 2, pp. 128–130, Jan. 2016.
- [6] D. Milovančev, P. Brandl, T. Jukić, B. Steindl, N. Vokić, and H. Zimmermann, "Optical wireless APD receivers in 0.35 μm HV CMOS technology with large detection area," *Opt. Exp.*, vol. 27, no. 9, pp. 11930–11945, Apr. 2019.
- [7] E. Fisher, I. Underwood, and R. Henderson, "A reconfigurable single-photon-counting integrating receiver for optical communications," *IEEE J. Solid-State Circuits*, vol. 48, no. 7, pp. 1638–1650, Jul. 2013.
- [8] H. Zimmermann, B. Steindl, M. Hofbauer, and R. Enne, "Integrated fiber optical receiver reducing the gap to the quantum limit," *Sci. Rep.*, vol. 7, no. 1, p. 2652, Jun. 2017.
- [9] B. Goll, M. Hofbauer, B. Steindl, and H. Zimmermann, "A fully integrated SPAD-based CMOS data-receiver with a sensitivity of -64 dBm at 20 Mb/s," *IEEE Solid-State Circuits Lett.*, vol. 1, no. 1, pp. 2–5, Jan. 2018.
- [10] J. Kosman, O. Almer, T. A. Abbas, N. Dutton, R. Walker, S. Videv, K. Moore, H. Haas, and R. Henderson, "29.7 A 500Mb/s -46.1dBm CMOS SPAD receiver for laser diode visible-light communications," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2019, pp. 468–470.
- [11] A. D. Griffiths, J. Herrnsdorf, O. Almer, R. K. Henderson, M. J. Strain, and M. D. Dawson, "High-sensitivity free space optical communications using low size, weight and power hardware," 2019, arXiv:1902.00495.
- [12] Z. Ahmed, R. Singh, W. Ali, G. Faulkner, D. O'Brien, and S. Collins, "A SiPM-based VLC receiver for gigabit communication using OOK modulation," *IEEE Photon. Technol. Lett.*, vol. 32, no. 6, pp. 317–320, Mar. 15, 2020.
- [13] F. Liu, J. Farmer, A. Schreier, G. Faulkner, H. Chun, W. Matthews, Z. Wang, and D. O'Brien, "Ultra-sensitive UV solar-blind optical wireless communications with an SiPM," *Opt. Lett.*, vol. 48, no. 20, pp. 5387–5390, Oct. 2023.
- [14] B. Goll, K. Schneider-Hornstein, and H. Zimmermann, "Ultra-low capacitance spot PIN photodiodes," *IEEE Photon. J.*, vol. 15, no. 2, pp. 1–6, Apr. 2023.
- [15] B. Goll, K. Schneider-Hornstein, and H. Zimmermann, "Dot PIN photodiodes with a capacitance down to 1.14 aF/μm²," *IEEE Photon. Technol. Lett.*, vol. 35, no. 6, pp. 301–304, Mar. 15, 2023.
- [16] C. Gasser, S. M. Laube, K. Schneider-Hornstein, and H. Zimmermann, "Ultra sensitive PIN-diode receiver utilizing photocurrent integration on a parasitic capacitance," *IEEE Access*, vol. 12, pp. 118371–118376, 2024.
- [17] K. Schneider-Hornstein, B. Goll, and H. Zimmermann, "Ultra-sensitive PIN-photodiode receiver," *IEEE Photon. J.*, vol. 15, no. 3, pp. 1–9, Jun. 2023.
- [18] S. M. Laube, C. Gasser, K. Schneider-Hornstein, and H. Zimmermann, "Highly-sensitive integrating optical receiver with large PIN photodiode," *IEEE Photon. J.*, vol. 16, no. 6, pp. 1–9, Dec. 2024.
- [19] C. Gasser, C. Ribisch, S. M. Laube, K. Schneider-Hornstein, and H. Zimmermann, "Ultrasensitive reset-less integrator-based PIN-diode receiver with input current control," *IEEE Solid-State Circuits Lett.*, vol. 8, pp. 17–20, 2025.
- [20] R. P. Jindal, "Silicon MOS amplifier operation in the integrate and dump mode for gigahertz band lightwave communication systems," *J. Lightw. Technol.*, vol. 8, no. 7, pp. 1023–1026, Jul. 15, 1990.
- [21] A. E. Stevens, "An integrate-and-dump receiver for fiber optic networks," Ph.D. thesis, Columbia Univ., New York, NY, USA, 1995. Accessed: Jun. 4, 2023.
- [22] A. Emami-Neyestanak, D. Liu, G. Keeler, N. Helman, and M. Horowitz, "A 1.6 Gb/s, 3 mW CMOS receiver for optical communication," in *Symp. VLSI Circuits. Dig. Tech. Papers*, 2002, pp. 84–87.
- [23] S. Palermo, A. Emami-Neyestanak, and M. Horowitz, "A 90 nm CMOS 16 Gb/s transceiver for optical interconnects," *IEEE J. Solid-State Circuits*, vol. 43, no. 5, pp. 1235–1246, May 2008.
- [24] G. Ferrari, F. Gozzini, A. Molari, and M. Sampietro, "Transimpedance amplifier for high sensitivity current measurements on nanodevices," *IEEE J. Solid-State Circuits*, vol. 44, no. 5, pp. 1609–1616, May 2009.
- [25] M. Georgas, J. Orcutt, R. J. Ram, and V. Stojanovic, "A monolithically-integrated optical receiver in standard 45-nm SOI," *IEEE J. Solid-State Circuits*, vol. 47, no. 7, pp. 1693–1702, Jul. 2012.
- [26] M. H. White, D. R. Lampe, F. C. Blaha, and I. A. Mack, "Characterization of surface channel CCD image arrays at low light levels," *IEEE J. Solid-State Circuits*, vol. SSC-9, no. 1, pp. 1–12, Feb. 1974.
- [27] K. Stafford, R. Blanchard, and P. Gray, "A completely monolithic sample/hold amplifier using compatible bipolar and silicon-gate FET devices," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 1974, pp. 190–191.
- [28] R. E. Suarez, P. R. Gray, and D. A. Hodges, "All-MOS charge-redistribution analog-to-digital conversion techniques. II," *IEEE J. Solid-State Circuits*, vol. SSC-10, no. 6, pp. 379–385, Dec. 1975.
- [29] E. Vittoz, "Microwatt switched capacitor circuit design," *Act. Passive Electron. Compon.*, vol. 9, no. 4, pp. 263–273, Jan. 1982.
- [30] B. J. Sheu and C. Hu, "Switch-induced error voltage on a switched capacitor," *IEEE J. Solid-State Circuits*, vol. SSC-19, no. 4, pp. 519–525, Aug. 1984.
- [31] J.-H. Shieh, M. Patil, and B. J. Sheu, "Measurement and analysis of charge injection in MOS analog switches," *IEEE J. Solid-State Circuits*, vol. SSC-22, no. 2, pp. 277–281, Apr. 1987.
- [32] P. E. Allen and D. R. Holberg, *CMOS Analog Circuit Design*, 3rd ed., London, U.K.: Oxford Univ. Press, 2012.
- [33] C. Eichenberger and W. Guggenbuhl, "Dummy transistor compensation of analog MOS switches," *IEEE J. Solid-State Circuits*, vol. 24, no. 4, pp. 1143–1146, Aug. 1989.
- [34] E. Sackinger, "On the noise optimum of FET broadband transimpedance amplifiers," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 59, no. 12, pp. 2881–2889, Dec. 2012.
- [35] K. Ogawa, "Noise caused by GaAs mesfets in optical receivers," *Bell Syst. Tech. J.*, vol. 60, no. 6, pp. 923–928, Jul. 1981.
- [36] R. J. Kansy, "Response of a correlated double sampling circuit to 1/f noise [generated in CCD arrays]," *IEEE J. Solid-State Circuits*, vol. SSC-15, no. 3, pp. 373–375, Jun. 1980.
- [37] O. Yadid-Pecht, R. Ginosar, and Y. Shacham-Diamand, "A random access photodiode array for intelligent image capture," *IEEE Trans. Electron Devices*, vol. 38, no. 8, pp. 1772–1780, Aug. 1991.
- [38] M.-W. Seo, S. Kawahito, K. Kagawa, and K. Yasutomi, "A 0.27e-rms read noise 220-μV/e-conversion gain reset-gate-less CMOS image sensor with 0.11-μm CIS process," *IEEE Electron Device Lett.*, vol. 36, no. 12, pp. 1344–1347, Dec. 2015.
- [39] E. Fossum, J. Ma, S. Masoodian, L. Anzagira, and R. Zizza, "The quanta image sensor: Every photon counts," *Sensors*, vol. 16, no. 8, p. 1260, Aug. 2016.
- [40] B. Razavi, *Design of Analog CMOS Integrated Circuits*. New York, NY, USA: McGraw-Hill, 2001.
- [41] M. Tartagni and R. Guerrieri, "A fingerprint sensor based on the feedback capacitive sensing scheme," *IEEE J. Solid-State Circuits*, vol. 33, no. 1, pp. 133–142, Jan. 1998.
- [42] S. Ashtiani and A. Nathan, "Reducing charge injection in active-matrix a-Si TFT pixels," in *Proc. IEEE LEOS Annu. Meeting Conf.*, Oct. 2006, pp. 440–441.
- [43] H.-S. Kim and K.-Y. Han, "Low-noise reset technique of an asynchronous charge-pulse-detecting pixel for single-photon X-ray imaging," *J. Korean Phys. Soc.*, vol. 68, no. 3, pp. 456–461, Feb. 2016.
- [44] N. Uchida, Y. Akahori, M. Ikeda, A. Kohzen, J. Yoshida, T. Kokubun, and K. Suto, "A 622 Mb/s high-sensitivity monolithic InGaAs-InP pin-FET receiver OEIC employing a cascode preamplifier," *IEEE Photon. Technol. Lett.*, vol. 3, no. 6, pp. 540–542, Jun. 15, 1991.

**SIMON MICHAEL LAUBE** received the B.Sc. degree in electrical engineering and information technology and the Dipl.-Ing. degree in embedded systems from TU Wien, Vienna, Austria, in 2020 and 2021, respectively, where he is currently pursuing the Dr. Techn. degree in the field of ultra-sensitive optical receivers.

**CHRISTOPH GASSER** received the B.Sc. degree in electrical engineering and information technology and the Dipl.-Ing. degree in embedded systems from TU Wien, Vienna, Austria, in 2018 and 2020, respectively, where he is currently pursuing the Dr. Techn. degree with the Institute of Electrodynamics, Microwave, and Circuit Engineering. His research interests include highly sensitive optical receivers and analog integrated circuits.

**KERSTIN SCHNEIDER-HORNSTEIN** was born in St. Pölten, Austria. She received the Dipl.-Ing. and Dr. Techn. degrees from Vienna University of Technology, Vienna, Austria, in 2000 and 2004, respectively. Since 2001, she has been with the Institute of Electrodynamics, Microwave, and Circuit Engineering, Vienna University of Technology. She is the author of the Springer book *Highly Sensitive Optical Receivers* and the co-author of the IOP book *Single-Photon Detection for Data Communication and Quantum Systems*, and the author and co-author of more than 70 journal and conference papers. Her major research interests include optoelectronics, photonic-electronic integration, and integrated circuit design.

**HORST ZIMMERMANN** (Senior Member, IEEE) received the Dr.-Ing. degree in 1991. He was then an Alexander-von-Humboldt Research Fellow with Duke University, Durham, NC, USA, working on diffusion in Si, GaAs, and InP. In 1993, he joined Kiel University, working on optoelectronic integration. Since 2000, he has been a Full Professor of circuit engineering with TU Wien, working on (Bi)CMOS analog and optoelectronic full-custom integrated circuits. He is the author of two Springer books, the co-author of five more Springer books, the author of an IOP book, the co-author of another IOP book, and the co-author of more than 600 publications on integrated photodiodes and integrated circuits.