# **DesignCon 2020**

# Machine Learning Applications for COM Based Simulation of 112Gbs Systems

Alex Manukovsky, Intel Alex.manukovsky@intel.com

Yuriy Shlepnev, Simberian Inc. shlepnev@simberian.com

Zurab Khasidashvili, Intel zurab.khasidashvili@intel.com

Eli Zalianski, Intel eli.zalianski@intel.com

# Abstract

The vastness of the SerDes system design space makes expertise-based analysis, which aims to maximize performance by optimizing the design, prohibitively complex. This paper describes a systematic approach to design space exploration of 112Gb SerDes systems based on the Channel Operating Margin (COM) simulation methodology, through the application of machine learning (ML) methods for advanced system analysis. First, the solution space is mapped, and multiple channel models corresponding to the cases of interest are generated with an EM simulator. Then an investigation of system level performance, covering various channel topologies, is conducted using the COM methodology. Finally, we perform an ML-based design exploration that identifies the root cause of failures in the design and provides insight on how to optimize it. This method allows a methodical, automated analysis of the solution space, yields insight on system behavior that is comprehensible to engineers, and can be used as a decision support tool for design choices in the hands of the system architect, Si designer, Si engineer, and more.

## Author(s) Biography

**Alex Manukovsky** is a Technical lead of the Signal & Power Integrity team at Intel Networking Division, responsible for the development of an in-house link simulator for high speed serial links, combining traditional methods of frequency and time domain simulation with machine learning capabilities. Alex focuses on simulation to lab correlation for high speed serial links for PCIe and Ethernet technologies. His past work focused on channel modeling, robust de-embedding and calibration techniques for VNA and TDR. His experience includes developing test equipment for compliance testing of serial I/Os as well as lab measurement methodologies for volume testing and Si/Pi simulations. Alex joined Intel in 2010 after receiving his BSc in Electrical Engineering from the Technion – Israel Institute of Technology. In 2019 he received his Master's degree in System Engineering from the Technion – Israel Institute of Technology.

**Yuriy Shlepnev** is President and Founder of Simberian Inc., where he develops Simbeor electromagnetic signal integrity software. He received a M.S. degree in radio engineering from Novosibirsk State Technical University in 1983, and a Ph.D. degree in computational electromagnetics from Siberian State University of Telecommunications and Informatics in 1990. He was the principal developer of electromagnetic simulator for Eagleware Corporation and a leading developer of electromagnetic software for the simulation of signal and power distribution networks at Mentor Graphics. The results of his research are published in multiple papers and conference proceedings.

**Zurab Khasidashvili** is a senior software engineer at Intel Corporation. He graduated from Tbilisi State University in Georgia in 1985 and obtained his PhD in logic from the same institution in 1991. From 1992 to 1998 Zurab worked as a postdoctoral fellow in several institutions: INRIA in Paris, the University of East Anglia in Norwich, and the NTT Research Labs in Atsugi, Japan. After moving to Israel in 1998, he worked at Ben-Gurion and Bar-Ilan Universities, and in 1999 joined Intel's Formal Technologies Group in Haifa. Before joining Intel, Zurab worked in term rewriting and lambda calculus. At Intel he has worked in multiple areas of formal verification, including equivalence checking of hardware, model checking, SAT-, SMT- and EPR-based decision procedures, ATPG, and STE. More recently, Zurab worked on verification of HW/FW/SW protocols and on security validation. Currently Zurab is working on applying big data analytics to HVM yield and to Electrical Validation. Zurab is the author of over 60 publications and patents. He has also served as a technical program committee and organizing committee member for a number of conferences and symposia.

**Eli Zalianski** is an Electrical Validation engineer developing test methodology. He is responsible for HSIO PHY interconnect testing, silicon bring-up, and characterization. Eli specialized in 100Gbps/25Gbps PAM4/NRZ Ethernet protocols, PCIe Gen4, and DDR4/LPDDR4 JEDEC compliance testing. Eli's experience with the channel operating margin (COM) methodology and machine learning algorithms was used to produce the data for this paper. He joined Intel in 2015 and received his BSc in Electrical Engineering from the Technion – Israel Institute of Technology in 2017.

# 1. Practical tools for SerDes design space exploration

Design space exploration plays an extremely important role for SerDes system design, since design teams usually face a complex landing zone for SerDes products. This is due to (1) a large number of customer use cases to be addressed, (2) multiple constraints in the solution space owing to multi-protocol support, and (3) a high degree of system variation to be covered in order to enable proper operation of the system in various configurations. This challenge holds true for both Si products and hard IP SerDes design cases.

It is often the case for the SerDes systems that a single device needs to support multiple application modes and will target several markets all at once – NIC (network interface card), servers, switches, etc. Hence the device is required to properly operate within different channel topologies unique to each market segment, keeping in mind the variety of potential system design implementations of the end customer. Moreover, a given SerDes system will usually support multiple usage modes, short/medium/long reach communication over a variety of media types such as direct point to point channels on PCB or package, communication over backplanes and various cabling solutions of a variety of lengths: copper cables and direct attach or fiber channels. As a result, the same device is required to comply with multiple standard specifications according to its landing zone.

Even when considering a single case of the target system, the variation between channels of the same design due to implementation constraints might have a significant impact on the system performance. The same holds true for the manufacturing variation of the multiple channel components. The manufacturing variation of package and PCB transmission lines can result from material properties, copper roughness, transmission line geometry, via stubs, etc. Additional performance variation results from connector and cable manufacturing tolerances and from the impact of PVT on the Si I/Os.

To cope with this large solution space, multiple equalization (EQ) mechanisms and various configurations and their capabilities need to be explored, such as the number and range of FFE taps at the Tx, the number of DFE sliding taps for the Rx, CTLE characteristics, etc., in order to maintain an adequate system performance over the entire solution space. Moreover, the EQ mechanisms must be optimized with respect to their ability to support the target solution space, factoring all the costs, the performance, and the design aspects of the final product.

It becomes obvious that a systematic approach is required to address this challenge. In this paper we demonstrate a practical application of Machine Learning (ML) based methods for advanced design space exploration.

First, the solution space is mapped along with its multiple constraints, and multiple channel models, corresponding to the cases of interest required to cover this space, are generated with an EM simulator. Then, an investigation of the system level performance is conducted, covering channel topologies for various market segments, variation within the same segment, variation within the same design, and manufacturing tolerances. In this work IEEE 802.3 STD Channel Operating Margin (COM) methodology is used, which enables an evaluation of overall system performance as well as channel quality when used with a standard transmitter and receiver with configurable equalization capabilities. This method allows the channel designers to gain insight into their expected product quality without the need for proprietary simulators or detailed information regarding their device. Finally, we perform a design/system exploration as follows: given a response variable (an output of the design/system), we find the parameters (features, in ML terminology) having the greatest effect on the response. Moreover, we look for combinations (conjunctions) of ranges of numeric features and values of nominal features having the greatest effect on the response variable. To explore the design, some of the main questions we answer using the ML techniques above include the following: (a) If the response variable does not satisfy the spec, by having values outside the designated ranges, what are the parameter combinations accounting for that? (b) What are the feature ranges for which, for most samples/tests, the response is well outside the required range? First, the root cause of the failures in the design (failure to comply with a spec or a standard) is identified, and then an insight on how to optimize the design is provided. This method is implemented on a 112Gb system case study.

This work tackles four main challenges of practical design space exploration of Ethernet (ETH) systems:

1. Generating a large quantity of link models to cover the solution space
2. Evaluating the performance of a large quantity of links and system configurations
3. Methodically analyzing the large volume of results
4. Enabling an automated ML-based decision support procedure to cope with system complexity and decisions based on big data

While the design space definition remains an expert choice, the above four challenges can be solved in an automated process, as this paper demonstrates with practical examples for a 112Gb C2C link.

# 2. Range Analysis for Decision Support Applications

The ML approach that we use for design exploration is *Feature Range Analysis*, or *Range Analysis* (RA), for short [1]. Range Analysis is an algorithm resembling *Rule Learning* (RL) [2], *Rule Induction* (RI) [3], and *Subgroup Discovery* (SD) [4]. From the algorithmic perspective, the main distinguishing feature of RA is that it heavily employs *Feature Selection* [5] in two basic building blocks of the algorithm: the **ranking** and **basis** procedures. The third basic building block in RA, the procedure called **quality**, is like the technique used in RL, RI and SD, where the selection of rules or subgroups is done solely based on a *quality function* (or based on multiple quality functions). These three procedures will be explained below.

The purpose of Range Analysis is to identify combinations of *ranges* of numeric (or continuous) features and *levels* of nominal (or categorical) features that explain positive samples – samples whose characteristics and behavior we want to explore in the data. Binary (or dichotomous) features are a special case of nominal features with two levels, 0 and 1. For binary responses O\_bin, it is conventional to encode the value of positive samples as 1 and the value of negative samples as 0. For numeric responses O\_num, there is no definition of positive and negative samples; however, one might be interested in finding ranges where the values in the response are in its 'high range' or in its 'low range'. A high range or a low range in the response values is not defined in general via a specific threshold value. When there is a threshold k for the response values that can distinguish between high values and the rest (or between low values and the rest), it is often convenient to model numeric responses as binary by applying the transformation O\_bin = O\_num > k (or O\_bin = O\_num < k, respectively). In this work we do not consider nominal responses with more than two levels, as this slightly more general case can easily be reduced to (multiple instances of) the binary case. For simplicity, we will assume that there is only one response variable O in each analysis.

The RA algorithm generates *range features* that are most relevant for the response, where 'most relevant' might mean (a) having a strong correlation or high mutual information with the response, based on one or more correlation measures; (b) explaining part of the variability in the response not explained by the strongest correlating features; or (c) maximizing a *quality function*. Important examples of quality functions include the ones listed below, where Pos and N denote the counts of positive and all samples in the entire dataset, respectively, p0 = Pos/N, R denotes a range, and n(R) denotes the count of all samples within R:

- *True Positive Rate* (also known as *sensitivity*, *recall*, or *hit rate*): TPR(R) = TP(R)/Pos, where TP(R) denotes the count of true positive samples, that is, positive samples within the range R.
- *Positive Predictive Value* (also known as *precision*): PPV(R) = TP(R)/n(R)
- The *lift*: Lift(R) = PPV(R)/p0
- *Weighted Relative Accuracy* [6]: WRAcc(R) = (n(R)/N)\*(PPV(R)-p0)
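As a minimal sketch of the four quality functions listed above for a binary response, the snippet below evaluates them from raw counts. The class name `RangeStats`, the function names, and the sample counts are hypothetical and chosen only for illustration; this is not the EVA implementation.

```python
from dataclasses import dataclass


@dataclass
class RangeStats:
    tp: int   # TP(R): positive samples inside the range R
    n_r: int  # n(R): all samples inside R
    pos: int  # Pos: positive samples in the entire dataset
    n: int    # N: all samples in the entire dataset


def p0(s: RangeStats) -> float:
    """Share of positive samples in the entire dataset."""
    return s.pos / s.n


def tpr(s: RangeStats) -> float:
    """True Positive Rate: TP(R) / Pos."""
    return s.tp / s.pos


def ppv(s: RangeStats) -> float:
    """Positive Predictive Value (precision): TP(R) / n(R)."""
    return s.tp / s.n_r


def lift(s: RangeStats) -> float:
    """Lift: PPV(R) / p0."""
    return ppv(s) / p0(s)


def wracc(s: RangeStats) -> float:
    """Weighted Relative Accuracy: (n(R) / N) * (PPV(R) - p0)."""
    return (s.n_r / s.n) * (ppv(s) - p0(s))


# Hypothetical counts: 40 samples fall in R, 30 of them positive,
# out of 1000 samples with 100 positives overall.
stats = RangeStats(tp=30, n_r=40, pos=100, n=1000)
print(tpr(stats), ppv(stats), lift(stats), wracc(stats))
```

In this example the range covers only 30% of the positives (TPR) but is 7.5 times richer in positives than the dataset as a whole (Lift), which is the kind of trade-off the different quality functions expose.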

For numeric responses, the counterpart of PPV(R) is the mean value of the response on samples within R, and the counterpart of p0 is the mean value of the response on all samples, thus Lift(R) and WRAcc(R) also make sense for numeric responses [7]. While positive and negative samples only make sense for binary responses, the concepts like *True Positive (TP), True Negative (TN), False Positive* (FP) and *False Negative* (FN) can be generalized to numeric responses as well, and this way all the concepts of the quality functions that are based on these concepts are generalized to numeric responses [1]. Each range feature is a binary feature where the value 1 on a sample is interpreted as "the sample is within the range", and the value 0 is interpreted as "the sample is outside the range".

The RA algorithm works as follows:

1. The RA algorithm first ranks features highly correlated to the response; this can be done using an *Ensemble Feature Selection* procedure, which we refer to as the **ranking** procedure. In addition, RA uses the *Maximal Relevance Minimal Redundancy* (MRMR) procedure to select a subset of features which both strongly correlate to the response and provide a good coverage of the entire variability in the response; we refer to this procedure as **basis**.
2. For the nominal features selected in the first stage, or optionally, for all nominal features, a binary range feature is generated from each level through one-hot encoding. In a similar way, a fresh binary feature is generated for each selected numeric feature and each constructed range. These features are called *single-range features*. Note that an important part of all the above-mentioned algorithms (RL, RI, SD, RA) is to define candidate ranges of numeric features. RL, RI and SD generate non-overlapping ranges, while in RA the candidate ranges can be overlapping. This helps to significantly improve the accuracy of RA compared to RL, RI and SD. The RA algorithm then applies the **ranking** and **basis** procedures to select the most relevant single ranges; in addition, RA selects single-range features that maximize one or more *quality functions*. We refer to the quality-function based selection of ranges as **quality**.
3. For each pair of selected single-range features associated with different original features, RA generates *range-pair features*, which have value 1 on each sample where both component single-range features have value 1, and value 0 on the remaining samples. RA then again applies the **ranking**, **basis** and **quality** procedures to select the most relevant range pairs.
4. Similarly, from the selected single ranges and selected range pairs, the RA algorithm builds *range triplets*, and applies the **ranking**, **basis** and **quality** procedures to select the most relevant ones.
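The construction of single-range and range-pair features in steps 2 and 3 can be illustrated with a small sketch. It makes several simplifying assumptions that are not part of the description above: candidate ranges are built from hand-picked cut points, all pairs are enumerated exhaustively, and only the **quality** selection (using WRAcc) is shown, while the **ranking** and **basis** procedures are omitted. The feature names, cut points, and toy data are hypothetical.

```python
from itertools import combinations


def wracc(mask, y):
    """Weighted Relative Accuracy of a binary range feature for a binary response."""
    n, n_r = len(y), sum(mask)
    if n_r == 0:
        return 0.0
    p0 = sum(y) / n
    ppv = sum(m & t for m, t in zip(mask, y)) / n_r
    return (n_r / n) * (ppv - p0)


def single_ranges(name, values, cuts):
    """Binary features for all (possibly overlapping) ranges [lo, hi) between cut points."""
    return {(name, lo, hi): [int(lo <= v < hi) for v in values]
            for lo, hi in combinations(cuts, 2)}


def range_pairs(singles):
    """Conjunction (AND) of single-range features built from different original features."""
    pairs = {}
    for (ka, ma), (kb, mb) in combinations(singles.items(), 2):
        if ka[0] != kb[0]:
            pairs[(ka, kb)] = [a & b for a, b in zip(ma, mb)]
    return pairs


# Hypothetical toy data: loss tangent, trace length, and a binary "fail" response.
lt = [0.001, 0.002, 0.004, 0.004, 0.008, 0.010, 0.008, 0.010]
length = [50, 200, 100, 250, 50, 100, 250, 300]
fail = [0, 0, 0, 0, 0, 0, 1, 1]

singles = {}
singles.update(single_ranges("LT", lt, [0.0, 0.005, 0.011]))
singles.update(single_ranges("Length", length, [0, 150, 301]))
pairs = range_pairs(singles)

# Quality-based selection: keep the range pair with the highest WRAcc.
best_key = max(pairs, key=lambda k: wracc(pairs[k], fail))
best_score = wracc(pairs[best_key], fail)
print(best_key, best_score)
```

On this toy data the selected pair combines the high loss tangent range with the long trace range, which is the only region containing failures. Range triplets would follow the same pattern, taking the conjunction of a selected pair with a further single range.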

In the implementation of Range Analysis in Intel's ML tool EVA [1, 8, 9, 10], for practical considerations the dimensionality of the range features is limited to three (single ranges, range pairs, and range triplets). The ML experiments reported in this paper are performed with EVA.

As an example, let's consider a plot of a range pair composed of two nominal features, for a binary response shown in Fig. 2.1.



Fig. 2.1 Example of range analysis plot for important feature pairs.

The feature on the X-axis has more levels than the feature on the Y-axis. Each small rectangular cell corresponds to a pair of levels of the two features, and the number in each cell shows the count of samples with these levels in the two features. Gray cells contain no samples (in the analyzed data set). The color of each cell indicates the ratio of positive vs negative samples in the cell, calculated as a *Normalized Lift*, as indicated by the legend: red indicates a high ratio of positive vs negative samples, and green indicates a high ratio of negative vs positive samples within the cell. The normalized Lift is computed as NormLift(R)=PPV(R)/(PPV(R)+p0), where PPV(R)=TP(R)/n(R) and p0=Pos/N (as defined earlier). The selected range in the plot is marked with a blue rectangle, and the corresponding ratio of positive vs negative samples within the range is marked on the legend with the blue line. One can see two cells (level pairs) with a relatively high ratio; the selected cell has a lower ratio compared to the second cell, but is chosen because it contains many more samples; indeed, some of the quality functions, e.g., Weighted Relative Accuracy, favor ranges with higher sample counts.
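The trade-off described here, where a cell with a lower ratio is selected because it holds more samples, can be reproduced with a short sketch. The two cells and their counts below are hypothetical and are not read from Fig. 2.1. Note that NormLift(R) equals Lift(R)/(Lift(R)+1), so it lies between 0 and 1 and equals 0.5 when the cell is no richer in positives than the dataset as a whole.

```python
def norm_lift(tp, n_r, pos, n):
    """Normalized Lift: PPV(R) / (PPV(R) + p0)."""
    ppv = tp / n_r
    p0 = pos / n
    return ppv / (ppv + p0)


def wracc(tp, n_r, pos, n):
    """Weighted Relative Accuracy: (n(R) / N) * (PPV(R) - p0)."""
    return (n_r / n) * (tp / n_r - pos / n)


POS, N = 200, 1000  # hypothetical dataset: 20% positive samples

# Hypothetical cells: (true positives in the cell, samples in the cell)
large_cell = (45, 60)   # PPV = 0.75, many samples
small_cell = (9, 10)    # PPV = 0.90, few samples

for name, (tp, n_r) in (("large", large_cell), ("small", small_cell)):
    print(name, round(norm_lift(tp, n_r, POS, N), 3), round(wracc(tp, n_r, POS, N), 3))
```

The small cell has the higher NormLift (it would be drawn in a deeper red), while the large cell has the higher WRAcc and would be the one selected by a quality function that rewards sample count.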

Both the features and the responses can be systematically explored providing an answer to the following:

- What are the important features that impact the system behavior the most?
- What are the ranges of each of the important features in which we are most likely to achieve desired system response?
- What are the combinations of feature ranges that enable us to achieve desired system response?

Simply put, we can systematically identify:

- What are the system characteristics required for achieving good performance?
- What are the system characteristics required for achieving excellent performance?
- What are the system characteristics responsible for bad performance?

This analysis can be performed for complex systems with a large number of system variables and complex output behavior, where bad, good or excellent can be determined by the specification, an expert opinion, or relative system performance.

Applying Range Analysis to design space exploration of system performance, factoring in a variety of operating conditions, controlled and uncontrolled factors, and multiple system configurations, allows a methodical, automated analysis of the solution space. This analysis provides a feasible way to handle the complexity of Ethernet systems, yields insight on system behavior that is comprehensible to engineers, and can be used as a decision support tool for design choices in the hands of the system architect, Si designer, Si engineer, and more.

# 3. COM as a quality metric for 112 Gb links

The goal is to evaluate the performance of an Ethernet system over a large solution space for multiple channels, system configurations, various choices of equalization mechanisms with various capabilities, a variety of package choices, process, voltage and temperature (PVT), Si characteristics, etc. The IEEE COM tool was selected for this task since its analysis is part of the industry standards for recent Ethernet protocol specifications, and since it simulates the various system configurations of interest fast enough to handle the high volume of simulations that such an analysis requires.

Channel Operating Margin [11-13] is a signal to noise ratio defined as

$$COM = 20 \cdot \log_{10}\left(\frac{A_{Signal}}{A_{Noise}}\right) \tag{3.1}$$

where $A_{Signal}$ is the peak signal and $A_{Noise}$ is the peak Bit Error Rate (BER) noise, defined as the peak signal minus the peak BER eye opening. Signal in this context includes all losses and dispersion in the link from chip to chip and the effect of equalization. Noise in this context includes all possible signal degradation effects, with some assumptions: it includes return loss, reflections and couplings (crosstalk) as well as equalization by Tx and Rx. The COM metric is computed in the time domain as the voltage ratio of the signal available in a reference signaling architecture (Tx and Rx) to the noise at the reference receiver's sampler – basically, it characterizes the complete link from chip to chip. The noise is calculated for the specified Detector Error Ratio (DER). DER is a generalization of BER for NRZ and SER for PAM4. The equalized single bit or symbol response (SBR or SSR) and the major signal degradation factors are used to calculate the vertical slice of the eye diagram centered at the sampling point where DER is minimal. For IEEE 802.3bj, bm and ck (C2C) the reference architecture is a 3-tap Tx FFE, an Rx CTLE, a DFE whose number of taps varies, optional reference packages, and filters [11].
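As a small numeric illustration of (3.1), the sketch below converts a signal and noise amplitude pair into COM in dB, and a COM value back into the required amplitude ratio. The function names and the amplitude values are hypothetical; in practice both amplitudes come out of the COM script rather than being supplied by hand.

```python
import math


def com_db(a_signal: float, a_noise: float) -> float:
    """COM in dB from peak signal and peak noise amplitudes, per (3.1)."""
    return 20.0 * math.log10(a_signal / a_noise)


def min_amplitude_ratio(com_threshold_db: float) -> float:
    """Signal-to-noise amplitude ratio needed to reach a given COM value."""
    return 10.0 ** (com_threshold_db / 20.0)


# Hypothetical amplitudes in volts.
print(round(com_db(20e-3, 12e-3), 2))      # about 4.44 dB
print(round(min_amplitude_ratio(3.0), 4))  # about 1.4125
```

A COM of 3 dB therefore corresponds to a peak signal roughly 1.41 times the peak noise amplitude.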

COM is a simulation of a reference transmitter and receiver system with a baseline equalization capability. It serves as a common reference for chip design and board design. The COM parameters represent the expected capability of a realizable PHY design. Channels that meet COM requirements are expected to work with compliant PHYs with the specified BER or better. Thus, **COM is a practical metric to make decisions on materials selection, package construction, PCB construction, and transitions design.** COM can be used to budget between loss, reflections, coupling, and noise, supporting a wide range of platform configurations. However, it may not be obvious how different features affect the COM and how to identify the ranges of features within which the design will work with high confidence. In this paper we use a machine learning algorithm to help with those decisions.

In order to demonstrate a realistic application of Range Analysis based decision support tool for Ethernet systems design space exploration, we chose the 112Gb Chip to Chip link example. The chosen system has just enough complexity to require a decision support tool to gain a comprehensive insight on the one hand, but on the other hand could be understood by an experienced SI engineer in order to validate the findings of the demonstrated method. The link under investigation is illustrated in Fig. 3.1.



Fig. 3.1. Link under investigation: TP0 to TP5 are locations of ports for analysis with the reference package; TPa to TPb are locations of ports for analysis with the custom package model.

To compute S-parameters of the link we use a signal integrity analysis automation kit (MLKit) based on the Simbeor SDK with an API to Matlab, for either the complete link or for the PCB section only. We use a reference Matlab script from the IEEE 802.3ck task force [11] with most of the parameters fixed to the reference values, unless defined otherwise. The input to the COM algorithm is a collection of 4-port S-parameters (s4p files). Each link consists of the channel, represented by an s4p file (THRU), and all the relevant crosstalk s4p aggressor files (FEXT and NEXT). The channel is modeled from pin to pin or from BGA pad to BGA pad; this includes both BGA escapes and DC blocking caps (TP0 to TP5 as shown in Fig. 3.1). The end-to-end channel COM is computed from a reference transmitter pad to the sampler input in the receiver. All computed S-parameters should also be suitable for the time domain conversion for a 112 Gb PAM4 link, considering bandwidth and sampling requirements (covered in the next chapter). The other input parameters of the COM script are defined as follows:

#### For Transmitter (Tx):

- Voltage swing (Av=Afext=0.41V, Anext=0.6V)
- Nonlinearity (R\_LM=0.95)
- Rise/Fall Time (Tr=6.16ps)
- Jitter (sigma\_RJ=0.01UI and A\_DD=0.02UI)
- Noise (SNR\_TX=32.5dB)
- FFE tap ranges plus limits as follows:

| Parameter | Value or range [min : step : max] |
|-----------|-----------------------------------|
| c(-1)     | [-0.3 : 0.05 : 0]                 |
| c(-2)     | [-0.15 : 0.05 : 0.12]             |
| c(-3)     | [-0.06 : 0.05 : 0]                |
| c(-4)     | [0]                               |
| c(1)      | [-0.2 : 0.05 : 0]                 |
| N_b       | 1                                 |

#### For Receiver (Rx):

- One-sided noise spectral density ($\eta_0 = 8.2 \times 10^{-9}\ V^{2}/GHz$)
- Bandwidth limiting filter (f\_r=0.75\*f\_b)
- Reference CTLE (g\_DC, f\_z, f\_p1, f\_p2) and noise filter (g\_DC\_HP, f\_HP\_PZ) defined as follows:

| Parameter | Value or range [min : step : max] | Units |
|-----------|-----------------------------------|-------|
| g_DC      | [-20 : 1 : 0]                     | dB    |
| f_z       | 21.25                             | GHz   |
| f_p1      | 21.25                             | GHz   |
| f_p2      | 53.125                            | GHz   |
| g_DC_HP   | [-6 : 1 : 0]                      | dB    |
| f_HP_PZ   | 0.6640625                         | GHz   |

- Number of ideal DFE taps plus limits (N\_b=1, b\_max(1)=0.75)
- Detector error ratio (DER\_0=1.0e-5)
- Single-ended termination resistor (R\_d=45 Ω)
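To get a feel for the size of the equalization search implied by the Tx FFE and Rx CTLE ranges listed above, the following sketch expands each `[min : step : max]` entry into a list of values and counts the combinations. It assumes that each range behaves like a Matlab colon expression and that all ranges are swept independently; any additional constraints applied inside the COM script are not modeled. The helper name `colon` is hypothetical.

```python
import math


def colon(start, step, stop):
    """Expand a Matlab-style start:step:stop range into a list of floats."""
    # The small epsilon guards against floating-point round-off at the end point.
    count = math.floor((stop - start) / step + 1e-9) + 1
    return [round(start + i * step, 10) for i in range(count)]


tx_ffe = {
    "c(-1)": colon(-0.3, 0.05, 0.0),
    "c(-2)": colon(-0.15, 0.05, 0.12),
    "c(-3)": colon(-0.06, 0.05, 0.0),
    "c(-4)": [0.0],
    "c(1)": colon(-0.2, 0.05, 0.0),
}
rx_ctle = {
    "g_DC": colon(-20, 1, 0),
    "g_DC_HP": colon(-6, 1, 0),
}

n_ffe = math.prod(len(v) for v in tx_ffe.values())
n_ctle = math.prod(len(v) for v in rx_ctle.values())
print(n_ffe, n_ctle, n_ffe * n_ctle)
```

Under these assumptions the tables span 420 FFE settings and 147 CTLE settings per channel, which illustrates why computational speed matters when many channels have to be evaluated.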

The output of the COM script is the COM number defined by (3.1) that can be ranked as follows:

- Compliant (Good) channel characteristics (COM > 3dB)
- Non-compliant (Bad) channel characteristics (COM < 3dB)
- Excellent channel characteristics (COM > 4dB)
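The ranking above maps directly onto the response variable used in Range Analysis. The sketch below labels COM results and derives a binary response marking the non-compliant cases as positive samples. The function name and the COM values are hypothetical, and treating a COM of exactly 3 dB as "bad" is an assumption of this sketch, since the list above does not define that boundary case.

```python
def com_label(com_db, pass_db=3.0, excellent_db=4.0):
    """Rank a COM result as 'excellent', 'good' or 'bad'."""
    if com_db > excellent_db:
        return "excellent"
    if com_db > pass_db:
        return "good"
    # Assumption: the boundary value (COM equal to pass_db) is ranked as bad.
    return "bad"


# Hypothetical COM results in dB for four simulated links.
com_results = [2.1, 3.4, 4.6, 2.9]
labels = [com_label(c) for c in com_results]

# Binary response for root-cause analysis: 1 marks a non-compliant link.
o_bin = [int(label == "bad") for label in labels]
print(labels, o_bin)
```

Choosing "excellent" instead of "bad" as the positive class would turn the same data into a search for the feature ranges that yield the best links.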

In addition to COM, the quality of a link can be evaluated by Effective Return Loss (ERL). Similarly to how COM is calculated, ERL is given by the ratio of the signal amplitude to the amount of eye closure caused only by reflections, defined with respect to DER. However, this is beyond the scope of this work, since its addition is not essential to demonstrate the discussed process of design space exploration.

# 4. De-compositional analysis of 112 Gb links

To cover the design space, multiple link models are generated. To do so, we use a hybrid de-compositional electromagnetic analysis, or the 1D+3D technique [14]. Decomposition of a simple link is illustrated in Fig. 4.1. The link is partitioned into discontinuities and transmission line segments and uses the package models defined in the COM script. Chip-to-chip (C2C) link decomposition with the custom package models is shown in Fig. 4.2.

1D models are built as solutions of the Telegrapher's equations. 3D models are built from full-wave solutions of Maxwell's equations. Both models are used to compute S-parameters of a complete link. Modal or per-unit-length parameters for the Telegrapher's equations (Z, Y) are computed with a static or quasi-static field solver (2D problems for Laplace's equations) or an electromagnetic field solver (3D problems for Maxwell's equations). Not only straight single line segments, but also lines with coupling, multimodal waveguides, and periodic structures (BGA breakout routing) can be accurately modeled with this approach.
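To make the 1D model concrete, the sketch below computes the propagation constant and characteristic impedance of a single-mode line from its per-unit-length parameters, and from them the S-parameters of a segment of given length. It is a textbook simplification under stated assumptions: a single-ended line, frequency-independent R, L, G, C values, and hypothetical numbers, whereas the analysis described here takes Z and Y from a field solver and handles coupled differential lines.

```python
import cmath
import math


def line_params(r, l, g, c, f):
    """Propagation constant (1/m) and characteristic impedance (ohm) from RLGC at frequency f."""
    w = 2.0 * math.pi * f
    z = complex(r, w * l)  # series impedance per unit length, Z
    y = complex(g, w * c)  # shunt admittance per unit length, Y
    return cmath.sqrt(z * y), cmath.sqrt(z / y)


def line_s_params(gamma, zc, length, z0):
    """S11 and S21 of a line segment referenced to port impedance z0."""
    sh = cmath.sinh(gamma * length)
    ch = cmath.cosh(gamma * length)
    den = 2.0 * zc * z0 * ch + (zc ** 2 + z0 ** 2) * sh
    return (zc ** 2 - z0 ** 2) * sh / den, 2.0 * zc * z0 / den


F = 10e9                 # hypothetical analysis frequency, Hz
L_PUL = 2.5e-7           # hypothetical inductance, H/m
C_PUL = 1.0e-10          # hypothetical capacitance, F/m (gives a 50 ohm line)

# Lossless, matched segment: no reflection and no attenuation.
gamma, zc = line_params(0.0, L_PUL, 0.0, C_PUL, F)
s11, s21 = line_s_params(gamma, zc, 0.1, 50.0)

# The same segment with a hypothetical series resistance of 20 ohm/m.
gamma_lossy, zc_lossy = line_params(20.0, L_PUL, 0.0, C_PUL, F)
s11_lossy, s21_lossy = line_s_params(gamma_lossy, zc_lossy, 0.1, 50.0)
print(abs(s11), abs(s21), abs(s21_lossy))
```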

1D+3D Hybrid de-compositional analysis with transmission line models for traces (1D) and 3D models for discontinuities or transitions is the best technique for the serial interconnects under the localization condition, both for the design exploration and post-layout analysis. This approach usually works for PCB and packaging problems with relatively long traces, but may fail if trace segments are too short – complete 3D analysis of adjacent discontinuities is required in this case. A differential transmission line segment can be used as a very simplistic model of a link with a possible coupling to other differential links for preliminary investigations.
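The idea of assembling a link from separately modeled pieces can be sketched with ABCD (chain) matrices, which multiply in the order the blocks appear along the link. The example is a strong simplification under stated assumptions: single-ended 2-port blocks, a lossless 50 ohm line, and a lumped shunt capacitance standing in for a 3D discontinuity model. The analysis described here instead uses multiport S-parameters from a full-wave solver. All names and values are hypothetical.

```python
import cmath
import math


def mat_mul(m1, m2):
    """Product of two 2x2 matrices given as nested tuples."""
    (a1, b1), (c1, d1) = m1
    (a2, b2), (c2, d2) = m2
    return ((a1 * a2 + b1 * c2, a1 * b2 + b1 * d2),
            (c1 * a2 + d1 * c2, c1 * b2 + d1 * d2))


def abcd_line(gamma, zc, length):
    """ABCD matrix of a transmission line segment (1D model)."""
    ch, sh = cmath.cosh(gamma * length), cmath.sinh(gamma * length)
    return ((ch, zc * sh), (sh / zc, ch))


def abcd_shunt_c(cap, f):
    """ABCD matrix of a shunt capacitor, a stand-in for a 3D discontinuity model."""
    return ((1.0, 0.0), (1j * 2.0 * math.pi * f * cap, 1.0))


def abcd_to_s(m, z0):
    """S11 and S21 of a two-port from its ABCD matrix, referenced to z0."""
    (a, b), (c, d) = m
    den = a + b / z0 + c * z0 + d
    return (a + b / z0 - c * z0 - d) / den, 2.0 / den


F, Z0 = 10e9, 50.0
gamma = 1j * 2.0 * math.pi * F / 2.0e8  # lossless line, hypothetical velocity 2e8 m/s

seg = abcd_line(gamma, Z0, 0.05)
via = abcd_shunt_c(0.1e-12, F)          # hypothetical 0.1 pF excess capacitance

link = mat_mul(mat_mul(seg, via), seg)  # line - discontinuity - line
s11, s21 = abcd_to_s(link, Z0)

# Reference case without the discontinuity.
s11_ref, s21_ref = abcd_to_s(mat_mul(seg, seg), Z0)
print(abs(s11), abs(s21), abs(s21_ref))
```

Changing one block, for example a different discontinuity model or trace length, only requires recomputing that block and repeating the multiplication, which is what makes the decomposition attractive for generating many link variants.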



Fig. 4.1. Decomposition of a simple link into 1D transmission line segment models with parameters extracted with a 2D quasi-static field solver and an optional 3D model built with an electromagnetic solver.



Fig. 4.2. Decomposition of chip-to-chip (C2C) link into 1D transmission line segments and 3D discontinuity models, including package t-lines and discontinuities and possible coupling on PCB.

The accuracy of the 1D+3D approach depends on several conditions. First, proper localization of every single transition in the link is required. This is relatively difficult to achieve on a PCB over the bandwidth of 112 Gb signals. A possible breakdown of localization at very high frequencies is neglected in this investigation. Proper de-embedding of 3D discontinuities is required to avoid artificial reflections on the boundaries between 1D and 3D models. The accuracy also depends on the availability of broadband dielectric and conductor roughness models. Such models can be identified with GMS-parameters or SPP Light techniques. Broadband dielectric models can be constructed with data available from the manufacturer. However, conductor roughness models require the identification of realistic conductor roughness parameters. These parameters vary with the Cu foil type and the treatment procedures used in the PCB manufacturing process for a specific stackup case. Nevertheless, several groups of similar roughness characteristics can be identified as common for the ultra-high speed market segment. We will use statistical conductor roughness parameters previously identified in [15]. Additional necessary conditions for the accuracy of the 1D+3D approach are discussed in [16] and are not relevant to this investigation.

# 5. 112 Gb links modeling: channel features and signal degradation factors

To assess the validity of the results obtained by our method, an understanding of the considered link model behavior is required, yielding an insight into the parameters affecting the link performance and the ranges in which these parameters vary.

There are three major groups of signal degradation factors to model for PCB and packaging interconnects: thermal losses and dispersion, reflections, and couplings.

**Thermal losses** include dielectric polarization loss and dispersion, and also conductor and conductor surface roughness loss and dispersion. We call it thermal loss because the useful energy of the signal is dissipated in the dielectric and conductor as heat. The Causal Wideband Model (Djordjevic-Sarkar), defined with two parameters (Dk and LT) at one frequency point, is used to model the dielectric. The range of losses considered here is from an ultra-low loss dielectric with loss tangent LT=0.001 to a medium-loss dielectric with LT=0.01. For conductor roughness modeling we use the causal Huray-Bracken model with two parameters, SR and RF, identified previously in [15]. The simplest models for HVLP copper were described with SR=0.14um and RF=8.7 or with SR=0.075um, RF=24.5 – both models provide sufficient accuracy up to 50 GHz, as was demonstrated in [15]. The range of the losses effect is illustrated in Fig. 5.1. The focus of this investigation is on how the thermal losses affect the link performance.
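As an illustration of how a two-parameter definition (Dk and LT at one frequency) fixes a causal wideband dielectric model, the sketch below uses the commonly published form of the Djordjevic-Sarkar model, eps(f) = eps_inf + d_eps * ln((10^m2 + j f) / (10^m1 + j f)) / ((m2 - m1) ln 10), and solves for eps_inf and d_eps. The pole exponents m1 and m2 (taken here in Hz), the value Dk=3.5, and all function names are assumptions of this sketch and are not taken from the solver settings used in this work.

```python
import cmath
import math


def _log_term(f, m1, m2):
    """Frequency-dependent factor of the wideband Debye (Djordjevic-Sarkar) model."""
    return cmath.log((10 ** m2 + 1j * f) / (10 ** m1 + 1j * f)) / ((m2 - m1) * math.log(10))


def fit_wideband_debye(dk, lt, f0, m1=4.0, m2=13.0):
    """Solve for eps_inf and d_eps so that the model matches Dk and LT at f0."""
    term = _log_term(f0, m1, m2)
    d_eps = -dk * lt / term.imag     # imaginary part must equal -Dk * LT
    eps_inf = dk - d_eps * term.real  # real part must equal Dk
    return eps_inf, d_eps


def eps_r(f, eps_inf, d_eps, m1=4.0, m2=13.0):
    """Complex relative permittivity eps' - j*eps'' at frequency f."""
    return eps_inf + d_eps * _log_term(f, m1, m2)


# Hypothetical laminate: Dk = 3.5 with the ultra-low loss LT = 0.001, defined at 1 GHz.
eps_inf, d_eps = fit_wideband_debye(3.5, 0.001, 1e9)
for f in (1e9, 10e9, 30e9):
    e = eps_r(f, eps_inf, d_eps)
    print(f, round(e.real, 4), round(-e.imag / e.real, 6))
```

The model reproduces the specified Dk and LT at the definition frequency, and Dk decreases slowly with frequency, which is the dispersion that accompanies dielectric loss in a causal model.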



Fig. 5.1. Attenuation in dB/m for the range of dielectric and conductor surface treatment used in this investigation.

**Reflections** are the second group of the signal degradation factors included in this investigation. Trace impedance mismatch and single discontinuities (package bumps and balls, via transitions, AC caps, etc.) cause reflections, and resonances due to multiple reflections. As a result, some energy of the signal is reflected back to the transmitter (return loss), and some energy propagates to the receiver with multiple reflections on the way, causing additional signal degradation due to the dispersion of the insertion loss and phase delay (usually called ISI).

**Both thermal losses and reflections due to the impedance mismatch** are defined by the material properties and by the geometry of the transmission line segments that define the link. Striplines are usually used for the high-speed links. Features affecting practically all electrical properties of a single stripline segment are illustrated in Fig. 5.2.

Considering variabilities in PCB or package manufacturing, we will use target impedance Ztarget to define the geometry of the cross-section with approximately 5% and 10% deviation from the target value.
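To relate these impedance deviations to reflections, the sketch below computes the reflection coefficient and return loss at a single interface between a line at the target impedance and a line that deviates from it. This is an idealized estimate that ignores losses and multiple reflections; the target value of 100 ohm is hypothetical and only sets the scale, since the result depends on the relative deviation alone.

```python
import math


def reflection_coefficient(z, z_ref):
    """Reflection coefficient at an interface from impedance z_ref into z."""
    return (z - z_ref) / (z + z_ref)


def return_loss_db(gamma):
    """Return loss in dB (positive number) for a reflection coefficient."""
    return -20.0 * math.log10(abs(gamma))


Z_TARGET = 100.0  # hypothetical target impedance, ohm

results = {}
for deviation in (0.05, 0.10):
    gamma = reflection_coefficient(Z_TARGET * (1.0 + deviation), Z_TARGET)
    results[deviation] = (gamma, return_loss_db(gamma))
    print(deviation, round(gamma, 4), round(return_loss_db(gamma), 1))
```

For a relative deviation d the reflection coefficient is d/(2+d), so 5% and 10% deviations give roughly 0.024 and 0.048, corresponding to a return loss of about 32 dB and 26 dB for a single interface.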

#### Stripline segment (PKG and PCB)



Select possible laminates – defines Ha, Hb, Dk, LT, RR, SR, fix EF;

Set S for strong or weak coupling; compute W for about 10% and 5% deviation from Ztarget

| Feature | Description                                                   |
|---------|---------------------------------------------------------------|
| Ha      | Height of dielectric layer above strips                       |
| Hb      | Height of dielectric layer below strips                       |
| T       | Strip thickness                                               |
| W       | Strip width in the middle of strip layer                      |
| S       | Spacing between strips (middle)                               |
| EF      | Strip etch factor – EF=0.5(Wtop-Wbot)/T                       |
| RR      | Conductor resistivity relative to annealed copper             |
| SR      | Surface roughness in um for Huray-Bracken roughness model     |
| RF      | Surface roughness factor for Huray-Bracken roughness model    |
| Dk      | Dielectric constant at 1 GHz for Wideband Debye dielectric model |
| LT      | Loss tangent at 1 GHz for Wideband Debye dielectric model     |
| Length  | Differential strip line length                                |

Fig. 5.2. Features affecting loss, dispersion and reflections for a differential stripline segment.

The discontinuities that can cause the signal degradation are shown in Fig. 4.2. There are two major discontinuities in the package – transition from bumps to stripline and transition from the stripline to BGA balls. The last model may include the transition from package balls to PCB stripline (BGA-PCB vias). The other possible discontinuities on PCB are PCB vias and AC coupling capacitors.



Fig. 5.3. Models for three possible transitions from BGA to differential stripline. Blue – highly optimized structure, orange – optimized structure, red – not optimized structure.



Fig. 5.4. Models for three possible via-hole transitions from strip to microstrip traces.



Fig. 5.5. Models for three possible configurations for AC coupling capacitors.

For this investigation we created S-parameter models for each discontinuity in the link based on realistic implementations. Next, the structures were optimized with respect to the design for manufacturing (DFM) constraints of a common PCB manufacturing process, as it is assumed that a significant optimization effort will be made for a realistic 112Gb link. The simplest model of a possible link is just a stripline segment with the reference package transmission line and capacitive discontinuities for the bumps and balls. Reflections from planar transitions, such as bends and transitions from one cross-section to another, are neglected in this investigation.

Examples of 3D discontinuity models are shown in Fig. 5.3 - Fig. 5.5 with various geometry optimization levels of the same structure – not optimized, optimized and highly optimized.

**Couplings** are the third group of the signal degradation factors. They include a very broad range of physical effects listed in Table 5.1, which can be further separated into leaks (useful signal energy loss) and interference (unwanted energy added to the signal).

| Coupling type                                                                              | Model                                                            |
|--------------------------------------------------------------------------------------------|------------------------------------------------------------------|
| Crosstalk – leaks and interference in parallel traces                                      | INCLUDE                                                          |
| Via localization breakout – leaks and interference through parallel planes and between vias | Localize vias up to 40-50 GHz and do not simulate via coupling   |
| Couplings through slots and cutouts in reference planes                                    | Prohibit in layout                                               |
| Modal transformations in diff. pairs (aka skew) – bends, asymmetry in routing, FWE         | Mitigate with proper length compensation – do not simulate       |
| Multipath propagation, radiation, EMI, EMC                                                 | Suppress with localization – do not simulate                     |

Table 5.1. Coupling types and modeling or possible mitigation.

To investigate the effect of crosstalk coupling, the simplified model shown in Fig. 5.6, with the features defined in the table, can be used. A cross-section of the differential stripline segments in such a link is defined as shown in Fig. 5.2, with the addition of the separation parameter Spp and the lengths of the coupled and un-coupled segments. For simplicity, lengths of the un-coupled segments can be set to zero. Coupled segments of the more realistic C2C link shown in Fig. 4.2 are defined with Spp and lengths. The simplest link (Fig. 5.2), the link with coupling (Fig. 5.6) and the C2C cases are all included in Simbeor MLKit.



Fig. 5.6. A simplified model for the investigation of a link with crosstalk.

The output of the decompositional model for the simplified and realistic C2C structures is the complete s8p model illustrated in Fig. 5.6. It is used to derive s4p models for the victim link (THROUGH – IO1 to IO3), far-end crosstalk aggressor to victim S-parameters (FEXT – IO2 to IO3) and near-end crosstalk aggressor to victim S-parameters (NEXT – IO4 to IO3). All IOs here are just pairs of ports terminated by the specified termination resistor. Note that the THROUGH model will include the losses from leaks to the terminated aggressor link (near and far end leaks). The port numeration used here is [1 2 3 4] as illustrated in Fig. 5.2. Model building for transmission lines, some discontinuities and complete link analysis is automated in Matlab MLKit with Simbeor SDK.
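The derivation of the three s4p models from the s8p model can be sketched as a sub-matrix extraction. This is a minimal illustration, not the MLKit implementation: it assumes the unused ports are terminated in the same reference impedance to which the S-parameters are normalized (otherwise a renormalization step would be needed first), and the port layout in `IO_PORTS` and the helper name `extract_s4p` are hypothetical.

```python
import numpy as np

# Hypothetical layout: each IO is a pair of single-ended ports (zero-based indices).
IO_PORTS = {1: (0, 1), 2: (2, 3), 3: (4, 5), 4: (6, 7)}

def extract_s4p(s8p, io_from, io_to):
    """Return the 4-port sub-matrix linking two IOs.

    s8p has shape (n_freq, 8, 8). Keeping only the rows and columns of the
    selected ports is valid when all other ports see matched terminations.
    """
    ports = list(IO_PORTS[io_from] + IO_PORTS[io_to])
    return s8p[:, ports][:, :, ports]

# Placeholder data standing in for a simulated 8-port model at 3 frequencies.
s8p = np.zeros((3, 8, 8), dtype=complex)

through = extract_s4p(s8p, 1, 3)  # victim link, IO1 to IO3
fext = extract_s4p(s8p, 2, 3)     # far-end crosstalk, IO2 to IO3
next_ = extract_s4p(s8p, 4, 3)    # near-end crosstalk, IO4 to IO3
print(through.shape, fext.shape, next_.shape)
```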

We should note that the key enabling technology for design space exploration is the capability to quickly and automatically construct realistic link models.

### 6. Simple 112 Gb link case study results

To demonstrate the viability of the proposed analysis and to gain initial confidence in the results of the proposed method, a case of a simple link is examined. The simplest link is just a segment of differential stripline on PCB with all features defined in Fig. 5.2 and the reference package model as defined in the IEEE 802.3ck specifications. The package model for the Tx and Rx will have capacitance $C_d=120$ fF, $C_p=70$ fF, package $Z_c=92.5\,\Omega$, package\_tl\_tau $=6.14$ ps/mm, package\_tl\_gamma0\_a1\_a2 $=[0 \ 0.0009909 \ 0.0002772]$. Packages with 1mm, 5mm, 12mm and 31mm transmission line segments are investigated. The number of features in the model shown in Fig. 5.2 is further reduced to just six, with the values defined in Table 6.1.

|    | Feature                   |       |       |       |       |     |    |
|----|---------------------------|-------|-------|-------|-------|-----|----|
| 1  | Tx_PCB_TL_S (S/H)         | 1     | 2     | 3     |       |     |    |
| 2  | Tx_PCB_TL_L (Length) [in] | 0.5   | 1     | 3     | 6     | 9   | 12 |
| 3a | PCB_Dk (Dk)               | 2.8   | 3.2   | 3.5   | 3.8   |     |    |
| 3b | PCB_LT (LT)               | 0.001 | 0.002 | 0.004 | 0.009 |     |    |
| 4  | PCB_TL_H (Ha,Hb) [mil]    | 3     | 5     | 8     | 10    |     |    |
| 5  | PCB_Imp [Ω]               | 85    | 90    | 95    | 100   | 105 |    |
| 6  | PKG_Len, [mm]             | 1     | 5     | 12    | 31    |     |    |

Table 6.1. DOE table for the simplest stripline link investigation.

The dielectric material is defined simultaneously with two parameters, PCB\_Dk and PCB\_LT. That choice corresponds to a usual practical selection between high-end materials with extremely low losses and a relatively low dielectric constant, and medium-loss materials with a higher dielectric constant. The PCB\_TL\_H feature corresponds to Ha=Hb in Fig. 5.2. Half ounce copper (T=0.7mil) with the conductor surface roughness defined with SR=0.075um and RF=24.5 is used [15]. Tx\_PCB\_TL\_S is the separation between strips defined as a multiple of PCB\_TL\_H. Each analysis starts with the synthesis of strip width (W) for a given impedance PCB\_Imp defined for 5 cases. The feature Tx\_PCB\_TL\_L is the link length in inches. The total number of cases covered by Table 6.1 is 5760, and the range of total PCB link losses is illustrated in Fig. 6.1. We further separated all cases into two groups: those with a very short package (1mm and 5mm cases) and those with the reference package line length (12mm and 31mm).
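The case count can be reproduced with a short full-factorial enumeration of Table 6.1. The sketch below is only an illustration: the dictionary keys are hypothetical names, and the column-wise pairing of each PCB\_Dk value with the PCB\_LT value in the same table column is an assumption consistent with the statement that the dielectric is defined by both parameters simultaneously.

```python
from itertools import product

# Levels from Table 6.1; Dk and LT are paired so they vary together.
doe = {
    "Tx_PCB_TL_S": [1, 2, 3],
    "Tx_PCB_TL_L_in": [0.5, 1, 3, 6, 9, 12],
    "PCB_Dk_LT": [(2.8, 0.001), (3.2, 0.002), (3.5, 0.004), (3.8, 0.009)],
    "PCB_TL_H_mil": [3, 5, 8, 10],
    "PCB_Imp_ohm": [85, 90, 95, 100, 105],
    "PKG_Len_mm": [1, 5, 12, 31],
}

# One dictionary per link configuration.
cases = [dict(zip(doe, combo)) for combo in product(*doe.values())]

# Split into the two groups analyzed separately.
short_pkg = [c for c in cases if c["PKG_Len_mm"] in (1, 5)]
ref_pkg = [c for c in cases if c["PKG_Len_mm"] in (12, 31)]

print(len(cases), len(short_pkg), len(ref_pkg))  # 5760 2880 2880
```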



Fig. 6.1. Range of PCB channel losses in investigated links.

The range of the PCB channel losses in all links is illustrated in Fig. 6.1. It includes practically lossless extremely short 0.5in links with high-end dielectric on one end (top brown line) and 12in link with medium dielectric losses on the other end (bottom red line). This demonstrates the variety of channels considered for this analysis.

First, we evaluate the performance of the described system with package lengths of 12 and 31 mm, representing medium to long package lengths in the Ethernet market, over the entire population of representative PCB channels defined by the DOE. The question is: what are the system characteristics required for achieving excellent performance, COM>4?

The results of the Range Analysis with EVA are shown in Tables 6.2, 6.3, 6.4 and 6.5. In these and other similar tables, the column "selection" specifies one or more methods that selected that original feature or range. The selection value "correlation" corresponds to the procedure called **ranking** in the Range Analysis algorithm as described in Section 2. Selection value "coverage" corresponds to the procedure **basis** of Range Analysis algorithm, and selection value "target" corresponds to selection based on the procedure called **quality** in the Range Analysis algorithm. These names – correlation, coverage, and target -- were chosen to reflect the intuition behind the respective procedures, for the users who are not experts in Machine Learning and are not familiar with the details of the Range Analysis algorithm.

The first step is to establish which features have the greatest impact on the system behavior and to rank them according to importance. The feature ranking based on correlation or coverage is presented in Table 6.2. It is in good agreement with the feature importance ranking based on range analysis of a single feature maximizing a *quality function of Max lift*, presented in Table 6.3.

Table 6.2. Important single-range features based on having a strong correlation or high mutual information with the response (selection method: correlation) or on explaining part of the variability in the response not explained by the strongest correlating features (selection method: coverage).

| feature_1     | score  | selection            |
|---------------|--------|----------------------|
| Pkg_len_RX    | 1.0000 | correlation-coverage |
| s4p_Tx_PCB_L  | 0.3949 | correlation-coverage |
| s4p_PCB_Imp   | 0.2309 | correlation-coverage |
| s4p_PCB_Dk    | 0.0859 | correlation-coverage |
| s4p_PCB_H     | 0.0529 | correlation-coverage |
| s4p_Tx_PCB_DS | 0.0079 | coverage             |
| s4p_Tx_PCB_S  | 0.0011 | coverage             |

Table 6.3. Important single-range features for system with package length of 12 and 31 mm.

|   | feature       | max score | max lift |
|---|---------------|-----------|----------|
| 0 | Pkg_len_RX    | 1.0000    | 2.001159 |
| 1 | s4p_Tx_PCB_L  | 0.3490    | 1.653923 |
| 2 | s4p_PCB_Imp   | 0.1999    | 1.292459 |
| 3 | s4p_PCB_H     | 0.0284    | 1.072431 |
| 4 | s4p_Tx_PCB_DS | 0.0008    | 1.001757 |
| 5 | s4p_Tx_PCB_S  | 0.0011    | 1.001120 |

Surprisingly, the most important single-range feature is not the PCB channel length, as commonly thought, but the package length. The package length is also the most important feature in the pairs of features (Table 6.4) and in the triplets of features (Table 6.5). PCB link length, impedance and dielectric thickness are also rated very high; we will examine such non-trivial findings in depth later on. Based on single-feature range analysis considering the package length alone, a range of package lengths can be defined in which the probability of "excellent" performance (COM>4) is more than double that in the overall population, since the max lift score for this feature is 2.001159. Furthermore, when we investigate the range defined by two features simultaneously (Table 6.4), the max lift score for the pair of features identified as the most important in our system, package length and PCB channel length, is higher than for the single-feature range. This trend continues for the range defined by three features, shown in Table 6.5. The triplet of characteristics identified as having the strongest impact on performance consists of: (1) package length, (2) PCB channel length and (3) PCB channel impedance. Having certain values of this triplet of features increases the chance of "excellent" performance (COM>4) by more than 3.8 times.
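The idea of a range with maximal lift can be illustrated with a brute-force search over contiguous ranges of a single discrete feature. This is only a toy sketch under stated assumptions: it is not the Range Analysis algorithm used by EVA, the helper name `best_single_range` is hypothetical, and the eight data points are invented solely to exercise the code.

```python
def best_single_range(values, positive):
    """Find the contiguous range [lo, hi] of feature levels with maximal lift.

    Lift is taken as P(positive | value in range) / P(positive overall).
    The range covering the whole population is skipped (its lift is 1).
    """
    levels = sorted(set(values))
    base_rate = sum(positive) / len(positive)
    best = None
    for i, lo in enumerate(levels):
        for hi in levels[i:]:
            inside = [p for v, p in zip(values, positive) if lo <= v <= hi]
            if len(inside) == len(values):
                continue
            lift = (sum(inside) / len(inside)) / base_rate
            if best is None or lift > best[0]:
                best = (lift, lo, hi)
    return best

# Invented toy data: package length in mm and a pass flag (1 means COM>4).
pkg_len = [12, 12, 12, 12, 31, 31, 31, 31]
passed = [0, 0, 0, 1, 1, 1, 1, 0]

print(best_single_range(pkg_len, passed))  # (1.5, 31, 31)
```

A pure maximum-lift criterion tends to favor very small ranges, so a practical quality function would normally also account for how many cases the range covers.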

|   | feature1     | feature2     | max score | max lift |
|---|--------------|--------------|-----------|----------|
| 0 | Pkg_len_RX   | s4p_Tx_PCB_L | 1.0000    | 3.307847 |
| 1 | Pkg_len_RX   | s4p_PCB_Imp  | 0.8773    | 2.586416 |
| 2 | Pkg_len_RX   | s4p_PCB_H    | 0.4797    | 2.136974 |
| 3 | Pkg_len_RX   | s4p_PCB_Dk   | 0.6936    | 2.131448 |
| 4 | s4p_Tx_PCB_L | s4p_PCB_Imp  | 0.3344    | 1.453289 |

Table 6.4. Important range-pair features for system with package length of 12 and 31 mm.

Table 6.5. Important range triplet features for system with package length of 12 and 31 mm.

|   | feature1     | feature2     | feature3    | max score | max lift |
|---|--------------|--------------|-------------|-----------|----------|
| 0 | s4p_PCB_Imp  | s4p_Tx_PCB_L | Pkg_len_RX  | 0.8263    | 3.835184 |
| 1 | Pkg_len_RX   | s4p_Tx_PCB_L | s4p_PCB_Imp | 0.9639    | 3.835184 |
| 2 | s4p_PCB_Dk   | s4p_Tx_PCB_L | Pkg_len_RX  | 0.8480    | 3.216147 |
| 3 | Pkg_len_RX   | s4p_Tx_PCB_L | s4p_PCB_Dk  | 0.7398    | 2.951053 |
| 4 | s4p_Tx_PCB_L | s4p_PCB_Imp  | Pkg_len_RX  | 1.0000    | 2.907210 |
| 5 | s4p_PCB_Dk   | s4p_PCB_Imp  | Pkg_len_RX  | 0.7385    | 2.805366 |
| 6 | s4p_PCB_H    | s4p_Tx_PCB_L | Pkg_len_RX  | 0.8333    | 2.766596 |
| 7 | Pkg_len_RX   | s4p_PCB_Imp  | s4p_PCB_Dk  | 0.6248    | 2.262003 |

Next, the same type of analysis is performed on the 1 and 5 mm package cases, and the results of range analysis for single and triplet features are displayed in Tables 6.6 and 6.7. Surprisingly, the package length is no longer an important feature (this will be examined later). Moreover, the max lift scores are relatively low, and at first glance it seems that our analysis has failed to identify the important system characteristics. However, a more in-depth examination (presented later in this section) reveals that in this case most of the configurations have COM>4 and qualify as having "excellent" performance. As a result, most of the characteristics will satisfy the performance requirement, and no "unique" properties are required, meaning that having a preferred characteristic will only slightly improve the odds of "excellent" performance.

|   | feature       | max score | max lift |
|---|---------------|-----------|----------|
| 0 | s4p_Tx_PCB_L  | 1.0000    | 1.653923 |
| 1 | s4p_PCB_Imp   | 0.5292    | 1.292459 |
| 2 | s4p_PCB_Dk    | 0.1917    | 1.126741 |
| 3 | s4p_PCB_H     | 0.0660    | 1.072431 |
| 4 | s4p_Tx_PCB_DS | 0.0019    | 1.001757 |
| 5 | s4p_Tx_PCB_S  | 0.0024    | 1.001120 |

Table 6.6. Important single-range features for system with package length of 1 and 5 mm.

|    | feature1     | feature2     | feature3      | max score | max lift |
|----|--------------|--------------|---------------|-----------|----------|
| 0  | s4p_Tx_PCB_L | s4p_PCB_Imp  | s4p_PCB_Dk    | 0.4146    | 1.917592 |
| 1  | s4p_Tx_PCB_L | s4p_PCB_Imp  | s4p_PCB_H     | 0.3714    | 1.917592 |
| 2  | s4p_Tx_PCB_L | s4p_PCB_Imp  | s4p_PCB_Dk    | 1.0000    | 1.917592 |
| 3  | s4p_Tx_PCB_L | s4p_PCB_Imp  | s4p_Tx_PCB_S  | 0.1956    | 1.797743 |
| 4  | s4p_Tx_PCB_L | s4p_PCB_Dk   | s4p_PCB_H     | 0.5625    | 1.718731 |
| 5  | s4p_PCB_Imp  | s4p_Tx_PCB_L | s4p_PCB_Dk    | 0.9301    | 1.668179 |
| 6  | s4p_PCB_Imp  | s4p_Tx_PCB_L | s4p_PCB_H     | 0.8936    | 1.648301 |
| 7  | s4p_Tx_PCB_L | s4p_PCB_Imp  | s4p_PCB_H     | 0.9479    | 1.543401 |
| 8  | s4p_Tx_PCB_S | s4p_PCB_Imp  | s4p_Tx_PCB_DS | 0.0922    | 1.278395 |
| 9  | s4p_PCB_Imp  | s4p_PCB_Dk   | s4p_PCB_H     | 0.2440    | 1.223649 |
| 10 | s4p_Tx_PCB_S | s4p_PCB_Dk   | s4p_Tx_PCB_DS | 0.0194    | 1.054676 |

Table 6.7. Important range triplet features for system with package length of 1 and 5 mm.

In this case, to gain insight into system performance, a different question needs to be examined: what are the system characteristics responsible for bad performance, COM<3? Such cases are expected to have some distinguishing characteristics that separate them from the general population. The results of the Range Analysis with EVA considering what types of systems should be avoided are shown in Tables 6.8 and 6.9.

Table 6.8. Systems to avoid (COM<3): Important single-range features for system with package length of 1 and 5 mm.

|   | feature       | max score | max lift |
|---|---------------|-----------|----------|
| 0 | s4p_Tx_PCB_L  | 1.0000    | 2.626174 |
| 1 | Pkg_len_RX    | 0.3192    | 1.471736 |
| 2 | s4p_PCB_Imp   | 0.1406    | 1.371255 |
| 3 | s4p_PCB_Dk    | 0.0564    | 1.094238 |
| 4 | s4p_PCB_H     | 0.0476    | 1.079895 |
| 5 | s4p_Tx_PCB_S  | 0.0088    | 1.010703 |
| 6 | s4p_Tx_PCB_DS | 0.0041    | 1.009949 |

Table 6.9. Systems to avoid (COM<3): Important range triplet features for system with package length of 1 and 5 mm.

|   | feature1     | feature2     | feature3   | max score | max lift |
|---|--------------|--------------|------------|-----------|----------|
| 0 | s4p_Tx_PCB_L | s4p_PCB_Dk   | Pkg_len_RX | 1.0000    | 4.329620 |
| 1 | s4p_Tx_PCB_L | s4p_PCB_H    | Pkg_len_RX | 0.8922    | 4.190758 |
| 2 | s4p_Tx_PCB_L | s4p_PCB_Imp  | Pkg_len_RX | 0.9155    | 3.875588 |
| 3 | s4p_Tx_PCB_L | s4p_Tx_PCB_S | Pkg_len_RX | 0.7841    | 3.863166 |
| 4 | s4p_Tx_PCB_L | s4p_PCB_Imp  | s4p_PCB_H  | 0.7859    | 3.467138 |
| 5 | s4p_Tx_PCB_L | s4p_PCB_Imp  | s4p_PCB_Dk | 0.8488    | 3.141867 |
| 6 | s4p_Tx_PCB_L | s4p_PCB_H    | s4p_PCB_Dk | 0.7878    | 3.115134 |
| 7 | s4p_Tx_PCB_L | s4p_PCB_Dk   | s4p_PCB_H  | 0.6252    | 2.780420 |
| 8 | s4p_Tx_PCB_L | s4p_PCB_Imp  | s4p_PCB_H  | 0.5220    | 2.133623 |

It can be noticed that in this case the rankings based on a single feature and on a triplet of features are very similar to the ranking of excellent systems for the 12 and 31 mm package cases. This means that the same characteristics are important for the system operation.

Next, the findings of the proposed analysis are examined in detail and their validity is evaluated. All COM results are plotted in Fig. 6.2 and Fig. 6.3 as a function of the total link loss at the Nyquist frequency (including loss in the package). Fig. 6.2 compares COMs for two very short packages with lengths of 1mm and 5mm, and Fig. 6.3 compares COMs for two reference cases of packages with lengths of 12mm and 31mm. Graphs on the left show all data for each package case in the same color. Graphs on the right show additional information about PCB link length coded with colors, from blue for the shortest 0.5in link to red for the longest 12in link. First, we can observe that the shortest 1mm package provides the best performance, with almost no failure cases. The 1mm case has only a few failures when total link losses exceed 40dB, which is expected. The 5mm package fails for very lossy links and very short links. A further increase of the package length to 12 mm makes things much worse: almost all cases fail. However, the longer 31mm package improves the situation. Note that 5mm is just a little smaller than the wavelength at the Nyquist frequency; we can expect resonances between the discontinuities when the package size becomes a multiple of half of the wavelength in the package. The presence of two strong discontinuities in the Rx package (bumps and balls) explains the signal degradation for the packages with relatively small lengths that exceed half of the wavelength in the package. Transmission lines in the package are relatively lossy, and a further increase of the package length helps to damp the resonances, as is clearly visible on the Single Symbol Response (SSR) shown in Fig. 6.4.
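The wavelength argument can be checked with a quick estimate from the package delay of 6.14 ps/mm quoted for the reference package model. The sketch below assumes PAM4 signaling, so that the Nyquist frequency is half the symbol rate; both symbol rates used (56 GBd for a nominal 112 Gb/s and 53.125 GBd) are assumptions for illustration, as is the helper name `wavelength_mm`.

```python
TAU_PS_PER_MM = 6.14  # package_tl_tau of the reference package model

def wavelength_mm(freq_ghz, tau_ps_per_mm=TAU_PS_PER_MM):
    """Wavelength in the package line: lambda = 1 / (tau * f)."""
    tau_s_per_mm = tau_ps_per_mm * 1e-12
    freq_hz = freq_ghz * 1e9
    return 1.0 / (tau_s_per_mm * freq_hz)

# Assumed PAM4 symbol rates; Nyquist frequency is half the symbol rate.
for baud_gbd in (53.125, 56.0):
    nyquist_ghz = baud_gbd / 2
    lam = wavelength_mm(nyquist_ghz)
    print(f"{baud_gbd} GBd: Nyquist {nyquist_ghz:.3f} GHz, "
          f"wavelength {lam:.2f} mm, half wavelength {lam / 2:.2f} mm")
```

Under these assumptions the wavelength in the package is roughly 5.8 to 6.1 mm, so the 5mm package is indeed slightly shorter than one wavelength and longer than half a wavelength.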



Fig. 6.2. COM vs link total loss at Nyquist frequency for two cases with very short packages (both graphs). Right graph shows PCB link length in color from blue (0.5in) to red (12in).

The effect of PCB link length is further illustrated in Fig. 6.5. It shows COM vs total loss at the Nyquist frequency separately for 6 PCB link lengths. We can see that too short and too long links fail, while links in the middle have much better performance.



Fig. 6.3. COM vs link total loss at Nyquist frequency for two reference package lengths (both graphs). Right graph shows PCB link length in color from blue (0.5in) to red (12in).



Fig. 6.4. Impulse response for 12mm package (left) and 31mm package (right).



Fig. 6.5. COM vs link total loss at Nyquist frequency for two reference package lengths (12mm and 31mm) and different PCB link length.

The impact of the dielectric material selection (dielectric constant and loss tangent) is illustrated in Fig. 6.6. The picture shows 8 graphs: 4 graphs at the top for the shorter 12mm package and 4 graphs at the bottom for the longer 31mm package. Each graph is for a different PCB dielectric choice. The selection of better dielectrics does not help at all and makes things worse for the shorter package lengths. Also, the dielectric selection does not matter even for the longer package as long as the total link losses are below 35dB. We can also observe that dielectrics with more losses help to mitigate failures for very short PCB links.

Yet another factor affecting the total losses is the dielectric thickness – it defines the trace width for the target impedance. Traces will have more losses with thinner dielectrics. The dependence of COM on the total loss at the Nyquist frequency is shown in Fig. 6.7. The picture shows 8 graphs – 4 graphs at the top for the shorter 12mm package and 4 graphs at the bottom for the longer 31mm package. Each graph is for a different thickness of the dielectric. We can observe that the thinner dielectrics help to reduce failure for the shortest PCB links, while the thickest dielectric helps to transmit the signal over longer PCB traces. This is an expected behaviour.



Fig. 6.6. COM vs link total loss for two reference package lengths (12mm and 31mm), different dielectrics (Dk&LT) and PCB link length (from blue 0.5in to red 12in).



Fig. 6.7. COM vs link total loss for two reference package lengths (12mm and 31mm), different dielectric thickness (H) and PCB link length (from blue 0.5in to red 12in).

In addition to the electrical and mechanical parameters of the dielectric, our numerical experiment included typical PCB production impedance variations. Graphs for COM vs the total losses at the Nyquist frequency are shown in Fig. 6.8 for two reference package lengths. The left graph shows additional information about the impedance variations coded with colors from blue for $85\Omega$ to red for $105\Omega$. The graph on the right shows the ERL metric in colors. We can conclude that the lower impedance values were beneficial for the link performance.



Fig. 6.8. COM vs link total loss at Nyquist frequency for two reference package lengths (12mm and 31mm, both graphs). Left graph shows PCB trace impedance in color from blue (85 $\Omega$ ) to red (105 $\Omega$ ). Right graph shows ERL coded in colors.

The design space can now be systematically explored using the proposed method. We illustrate the same process for more realistic links in the next sections.

# 7. Realistic 112 Gb link with PCB crosstalk and discontinuities

The realistic link is shown in Fig. 4.2. It contains two identical reference package models (Rx and Tx) and a PCB link with a via-hole transition from BGA to stripline, a via-hole transition from stripline to microstrip, a model of the AC coupling capacitors, a via-hole transition from microstrip back to stripline and a via-hole transition from stripline to BGA. Models for the discontinuities are shown in Fig. 5.3 - Fig. 5.5. Only the best-case discontinuities (blue lines on the curves in Fig. 5.3 - Fig. 5.5) are used in this case study (typical case after optimization). All features and values are shown in Table 7.1. The total number of links is 25920.

|    | Feature                   |       |       |       |       |     |    |
|----|---------------------------|-------|-------|-------|-------|-----|----|
| 1  | Tx_PCB_TL_S (S/H)         | 1     | 2     | 3     |       |     |    |
| 2  | Tx_PCB_TL_L (Length) [in] | 0.5   | 1     | 3     | 6     | 9   | 12 |
| 3a | PCB_Dk                    | 2.8   | 3.2   | 3.5   | 3.8   |     |    |
| 3b | PCB_LT                    | 0.001 | 0.002 | 0.004 | 0.009 |     |    |
| 4  | PCB_TL_H (Ha, Hb) [mil]   | 3     | 5     | 8     | 10    |     |    |
| 5  | PCB_Imp [Ω]               | 85    | 90    | 95    | 100   | 105 |    |
| 6  | Tx_PCB_TL_DS (Spp) [mil]  | 3     | 5     | 10    |       |     |    |
| 7  | SR [um]                   | 0.075 | 0     |       |       |     |    |
| 8  | Pkg_len_TX [mm]           | 5     | 12    | 31    |       |     |    |

Table 7.1. DOE table for link with crosstalk and discontinuities.
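As a quick consistency check, the stated link count follows from multiplying the number of levels of each feature in Table 7.1, in the order listed, treating PCB\_Dk and PCB\_LT as a single paired material choice as in the simple-link DOE (an assumption about how the table is enumerated):

$$3 \times 6 \times 4 \times 4 \times 5 \times 3 \times 2 \times 3 = 25920$$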

The AC caps are located at 0.8 of the total link length. As in the case of the simple link, the reference model was used to simulate the package with lengths 5, 12 and 31 mm (Pkg\_len\_TX feature). Tx and Rx packages are assumed identical. To include the effect of crosstalk, an additional feature is required, Tx\_PCB\_TL\_DS, defining the separation between differential pairs (pair to pair separation Spp in Fig. 5.6). To investigate the effect of conductor roughness, we introduced a feature for conductor roughness SR with just 2 values – zero for smooth conductor (the best-case scenario) and 0.075 um as in the simplest case (RF=24.5). All other features are defined exactly as in the case of the simple link.

The range of the insertion and reflection losses for a selection of links is illustrated in Fig. 7.1. It includes cases with the lowest and highest losses. The range of crosstalk is illustrated in Fig. 7.2. It includes cases with the lowest and highest crosstalk defined by the link parameters in Table 7.1.



Fig. 7.1. Differential insertion loss (top figure) and reflection loss (bottom figure) for a selection of realistic link cases.



Fig. 7.2. Crosstalk for a selection of realistic link cases.

First, we investigated the realistic link with 1 DFE tap as in the case of the simple link. The EVA ranking of the features is shown in Table 7.2. Features for the most relevant single ranges are listed in Table 7.3. We can see that the package length is no longer the most important single-range feature, as it was in the simple link case. The pair to pair separation (crosstalk, Tx\_PCB\_DS) and dielectric thickness (PCB\_H) are ranked higher than the package length. Impedance (PCB\_Imp), PCB link length (Tx\_PCB\_L) and dielectric properties (PCB\_Dk/LT) are ranked as less important.

The single range identified by EVA for the differential pair to pair separation is shown in Table 7.4. It basically states that the ratio of pair to pair separation to dielectric thickness (see Table 7.1) greater than or equal to 10 covers 1619 cases with COM>3dB (Positive In), which is almost all of them. Only 23 such cases are out of this range (Positive Out). The range also has a relatively small number of cases with COM<3dB (Negative In), compared to all cases with Tx\_PCB\_DS<10 (Negative Out). The Range Lift of 2.96 also indicates the importance of the pair to pair separation (see the definition of Lift in Section 2). The effect of the separation Tx\_PCB\_DS is further illustrated in Fig. 7.3: we can see more cases with COM>3dB when the separation is equal to 10 (the lowest crosstalk). Note that the crosstalk cannot be mitigated by increasing the number of taps. It is a deterministic noise, but the location of the signal distortions cannot in general be predicted and mitigated.

Table 7.2. All feature ranking results for realistic link with 1 DFE tap.

|    | feature_1  | score  | selection            |
|----|------------|--------|----------------------|
| 53 | Tx_PCB_DS  | 1.0000 | correlation-coverage |
| 54 | PCB_H      | 0.6766 | correlation-coverage |
| 55 | PCB_Imp    | 0.1452 | correlation-coverage |
| 56 | Tx_PCB_L   | 0.1392 | correlation-coverage |
| 57 | PCB_Dk     | 0.0839 | correlation          |
| 58 | PCB_LT     | 0.0839 | coverage             |
| 59 | Pkg_len_TX | 0.0508 | coverage             |
| 60 | PCB_SR     | 0.0418 | coverage             |
| 61 | Tx_PCB_S   | 0.0025 | coverage             |

Table 7.3. Important single range features for realistic link with 1 DFE tap.

|   | feature    | max score | max lift |
|---|------------|-----------|----------|
| 0 | Tx_PCB_DS  | 1.0000    | 2.958892 |
| 1 | PCB_H      | 0.6497    | 2.368484 |
| 2 | Pkg_len_TX | 0.2549    | 1.648117 |
| 3 | Tx_PCB_L   | 0.2185    | 1.556218 |
| 4 | PCB_Imp    | 0.1579    | 1.243338 |
| 5 | PCB_Dk     | 0.0735    | 1.118105 |
| 6 | Tx_PCB_S   | 0.0034    | 1.004950 |

Table 7.4. Single range identified for differential pair to pair separation feature.

|   | Tx_PCB_DS<br>range | score | selection                   | Positive<br>Out | Negative<br>Out | Positive<br>In | Negative<br>In | Range Lift |
|---|--------------------|-------|-----------------------------|-----------------|-----------------|----------------|----------------|------------|
| 1 | 10:Inf             | 1.0   | correlation-coverage-target | 23              | 17251           | 1619           | 7014           | 2.958892   |
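The Range Lift value in Table 7.4 can be reproduced from the four case counts in the same row. The sketch below assumes that lift is the fraction of positive cases inside the range divided by the fraction of positive cases in the whole population; this assumed form reproduces the tabulated value, and the function name `range_lift` is hypothetical.

```python
def range_lift(pos_in, neg_in, pos_out, neg_out):
    """Lift of a range: P(positive | in range) / P(positive overall)."""
    total = pos_in + neg_in + pos_out + neg_out
    p_positive = (pos_in + pos_out) / total
    p_positive_in_range = pos_in / (pos_in + neg_in)
    return p_positive_in_range / p_positive

# Counts from Table 7.4 for the Tx_PCB_DS range 10:Inf.
lift = range_lift(pos_in=1619, neg_in=7014, pos_out=23, neg_out=17251)
print(round(lift, 4))  # 2.9589
```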



Fig. 7.3. COM vs link total loss at Nyquist frequency for 3 differential pair to pair separations (color-coded).

Two important ranges detected by EVA that substantially increase COM are shown in Table 7.5. Both ranges are associated with the dielectric thickness PCB\_H. The range with a thickness of 5 mil and below covers all positive outcome cases but also has a large number of negative outcomes (Negative In). To reduce the number of negative outcomes, the range should be further reduced to 3 mil and below. Fig. 7.4 further illustrates the effect of dielectric thickness with two complementary graphs. We can see that only the cases with 3 and 5 mil dielectrics have COM>3dB. All other cases are below 3dB.

|   | PCB_H<br>range_1 | score  | selection                   | Positive<br>Out | Negative<br>Out | Positive<br>In | Negative<br>In | Range Lift |
|---|------------------|--------|-----------------------------|-----------------|-----------------|----------------|----------------|------------|
| 1 | -Inf:3           | 0.4838 | correlation-coverage-target | 670             | 18762           | 972            | 5503           | 2.368484   |
| 2 | -Inf:5           | 0.6497 | correlation-coverage-target | 0               | 12957           | 1642           | 11308          | 2.000541   |

Table 7.5. Two possible single ranges identified for dielectric thickness feature.
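The Range Lift column in Tables 7.4 and 7.5 can be reproduced from the four outcome counts. The sketch below assumes the usual definition of lift (the fraction of positive cases inside the range divided by the fraction of positive cases in the whole data set) and a coverage measure equal to the share of all positive cases that fall inside the range. The function names `range_lift` and `range_coverage` are illustrative and are not part of EVA.

```python
def range_lift(pos_in, neg_in, pos_out, neg_out):
    """Positive rate inside the range divided by the overall positive rate."""
    total = pos_in + neg_in + pos_out + neg_out
    base_rate = (pos_in + pos_out) / total      # positives in the whole data set
    precision = pos_in / (pos_in + neg_in)      # positives among in-range cases
    return precision / base_rate


def range_coverage(pos_in, pos_out):
    """Share of all positive cases that fall inside the range."""
    return pos_in / (pos_in + pos_out)


# (label, Positive Out, Negative Out, Positive In, Negative In, reported Range Lift)
# Counts and reported lifts are copied from Tables 7.4 and 7.5.
rows = [
    ("Tx_PCB_DS 10:Inf", 23, 17251, 1619, 7014, 2.958892),
    ("PCB_H -Inf:3", 670, 18762, 972, 5503, 2.368484),
    ("PCB_H -Inf:5", 0, 12957, 1642, 11308, 2.000541),
]

for label, pos_out, neg_out, pos_in, neg_in, reported in rows:
    lift = range_lift(pos_in, neg_in, pos_out, neg_out)
    cov = range_coverage(pos_in, pos_out)
    print(f"{label}: lift={lift:.6f} (reported {reported}), coverage={cov:.3f}")
```

Under these assumptions the computed lifts agree with the tabulated values, and the coverage of 1.0 for the 5 mil range corresponds to its zero Positive Out count.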



Fig. 7.4. Box plots for COM distributions (left graph) and COM vs link total loss at Nyquist frequency (right graph) for 4 dielectric thicknesses.

Since the differential pair to pair separation and the dielectric thickness are the most important single range features, it is no surprise to see them together as the most important range pair, with the maximal lift, in the EVA results of Table 7.6. Three possible ranges for the separation and thickness are shown in Table 7.7, and Fig. 7.5 illustrates the outcome. Larger separations and thinner dielectrics produce more compliant links with COM>3dB. The second most important range pair of features is the pair to pair separation and the package length. The result for the package length is similar to the simple case: COM is better with the longer packages due to the resonances.

|   | feature1         | feature2          | max score | max lift |
|---|------------------|-------------------|-----------|----------|
| 0 | Tx_PCB_DS        | PCB_H             | 1.0000    | 6.941607 |
| 1 | Tx_PCB_DS        | Pkg_len_TX        | 0.5198    | 4.818835 |
| 2 | Tx_PCB_DS        | Tx_PCB_L          | 0.5969    | 4.667573 |
| 3 | PCB_H            | Pkg_len_TX        | 0.3208    | 3.990102 |
| 4 | Tx_PCB_DS        | PCB_Imp           | 0.5365    | 3.850784 |
| 5 | PCB_H            | Tx_PCB_L          | 0.4083    | 2.565957 |
| 6 | PCB_H            | PCB_Imp           | 0.3646    | 2.486837 |

Table 7.6. Important range-pair features for realistic link with 1 DFE tap.

Table 7.7. Three possible ranges identified for the first pair of features (differential pair to pair separation and dielectric thickness).

|   | Tx_PCB_DS<br>range_1 | PCB_H<br>range_2 | score  | selection                   | Positive<br>Out | Negative<br>Out | Positive<br>In | Negative<br>In | Pair Lift |
|---|----------------------|------------------|--------|-----------------------------|-----------------|-----------------|----------------|----------------|-----------|
| 1 | 10:Inf               | -Inf:3           | 0.7482 | correlation-coverage-target | 693             | 23057           | 949            | 1208           | 6.941607  |
| 2 | 10:Inf               | -Inf:5           | 1.0000 | correlation-coverage-target | 23              | 21570           | 1619           | 2695           | 5.921213  |
| 3 | 10:Inf               | 5:5              | 0.4388 | coverage-target             | 972             | 22778           | 670            | 1487           | 4.900819  |



Fig. 7.5. Box plots for COM distributions (left graph) and COM vs link total loss at Nyquist frequency (right graph) for 4 dielectric thicknesses (coded with colors) and 3 pair to pair separations.

EVA-selected range triplets are shown in Table 7.8. Again, dielectric thickness, differential pair to pair separation and package length form the most important range triplet. Two possible ranges for the first triplet are shown in Table 7.9. Links with longer packages produce better COM; however, we usually have no control over the package link length.

|   | feature1  | feature2   | feature3   | max score | max lift  |
|---|-----------|------------|------------|-----------|-----------|
| 0 | PCB_H     | Pkg_len_TX | Tx_PCB_DS  | 0.8467    | 11.460753 |
| 1 | PCB_H     | Tx_PCB_L   | Tx_PCB_DS  | 1.0000    | 10.386992 |
| 2 | PCB_H     | Tx_PCB_L   | Tx_PCB_DS  | 0.6920    | 9.830896  |
| 3 | PCB_H     | PCB_Imp    | Tx_PCB_DS  | 0.6835    | 7.706031  |
| 4 | PCB_H     | PCB_Imp    | Tx_PCB_DS  | 0.9046    | 7.325148  |
| 5 | PCB_H     | PCB_Dk     | Tx_PCB_DS  | 0.8152    | 6.400852  |
| 6 | Tx_PCB_DS | PCB_Imp    | Pkg_len_TX | 0.5141    | 6.364042  |
| 7 | PCB_H     | PCB_Imp    | Pkg_len_TX | 0.3846    | 4.384389  |
| 8 | PCB_Dk    | Tx_PCB_L   | Tx_PCB_DS  | 0.3914    | 4.239197  |

Table 7.8. Important range triplet features.

Table 7.9. Two possible ranges identified for the first triplet of features (dielectric thickness, package length and differential pair to pair separation).

|   | PCB_H<br>range_1 | Pkg_len_TX<br>range_2 | Tx_PCB_DS<br>range_3 | score  | selection                   | Positive<br>Out | Negative<br>Out | Positive<br>In | Negative<br>In | Triplet Lift |
|---|------------------|-----------------------|----------------------|--------|-----------------------------|-----------------|-----------------|----------------|----------------|--------------|
| 0 | -Inf:3           | 31:Inf                | 10:Inf               | 0.6719 | coverage-target             | 1119            | 24068           | 523            | 197            | 11.460753    |
| 1 | -Inf:5           | 31:Inf                | 10:Inf               | 0.8467 | correlation-coverage-target | 763             | 23706           | 879            | 559            | 9.644372     |

|   | PCB_H<br>range_1 | Tx_PCB_L<br>range_2 | Tx_PCB_DS<br>range_3 | score  | selection                   | Positive<br>Out | Negative<br>Out | Positive<br>In | Negative<br>In | Triplet Lift |
|---|------------------|---------------------|----------------------|--------|-----------------------------|-----------------|-----------------|----------------|----------------|--------------|
| 1 | -Inf:3           | 6:6                 | 10:Inf               | 0.3790 | target                      | 1405            | 24142           | 237            | 123            | 10.386992    |
| 2 | -Inf:3           | 6:9                 | 10:Inf               | 0.5402 | target                      | 1194            | 23994           | 448            | 271            | 9.830896     |
| 3 | -Inf:3           | 3:9                 | 10:Inf               | 0.6920 | coverage-target             | 981             | 23849           | 661            | 416            | 9.683441     |
| 4 | -Inf:5           | 6:6                 | 10:Inf               | 0.5031 | coverage                    | 1216            | 23971           | 426            | 294            | 9.335145     |
| 5 | -Inf:5           | 3:9                 | 10:Inf               | 0.9023 | correlation-coverage        | 512             | 23240           | 1130           | 1025           | 8.273231     |
| 6 | -Inf:5           | 3:Inf               | 10:Inf               | 1.0000 | correlation-coverage-target | 249             | 22784           | 1393           | 1481           | 7.647303     |
| 7 | -Inf:5           | -Inf:9              | 10:Inf               | 0.7863 | target                      | 286             | 22026           | 1356           | 2239           | 5.951203     |

Table 7.10. Possible ranges identified for the second triplet of features (dielectric thickness, PCB link length and differential pair to pair separation).

The next triplet in Table 7.8 has PCB link length in addition to dielectric thickness and differential pair to pair separation. These parameters are under the control of the PCB designer. Possible ranges are shown in Table 7.10 and further illustrated with the box plots in Fig. 7.6. In addition to the ranges already identified for the dielectric thickness and differential pair to pair separation, the optimal range for PCB link length is centered around 6 inches, spanning from 3 to 9 inches. Links shorter or longer than this should be excluded for better performance.
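The trade-off behind the choice of the 3 to 9 inch range can be made explicit from the counts in Table 7.10. The sketch below computes, for each tabulated row, the fraction of in-range cases that are positive (here called precision) and the fraction of all positive cases captured by the range (here called coverage). The `RangeRow` class and both property names are illustrative assumptions, not EVA output fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeRow:
    pcb_h: str       # PCB_H range from Table 7.10
    tx_pcb_l: str    # Tx_PCB_L range from Table 7.10
    pos_out: int     # Positive Out
    pos_in: int      # Positive In
    neg_in: int      # Negative In

    @property
    def precision(self) -> float:
        """Fraction of in-range cases that are positive."""
        return self.pos_in / (self.pos_in + self.neg_in)

    @property
    def coverage(self) -> float:
        """Fraction of all positive cases that fall inside the range."""
        return self.pos_in / (self.pos_in + self.pos_out)


# Counts copied from Table 7.10 (Tx_PCB_DS range is 10:Inf in every row).
TABLE_7_10 = [
    RangeRow("-Inf:3", "6:6", 1405, 237, 123),
    RangeRow("-Inf:3", "6:9", 1194, 448, 271),
    RangeRow("-Inf:3", "3:9", 981, 661, 416),
    RangeRow("-Inf:5", "6:6", 1216, 426, 294),
    RangeRow("-Inf:5", "3:9", 512, 1130, 1025),
    RangeRow("-Inf:5", "3:Inf", 249, 1393, 1481),
    RangeRow("-Inf:5", "-Inf:9", 286, 1356, 2239),
]

for row in TABLE_7_10:
    print(f"PCB_H {row.pcb_h:7s} Tx_PCB_L {row.tx_pcb_l:7s} "
          f"precision={row.precision:.3f} coverage={row.coverage:.3f}")
```

The narrowest row (3 mil and below, 6 inches only) has the highest precision but captures only a small share of the positive cases, while the wider length ranges capture far more of them at lower precision.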



Fig. 7.6. Box plots for COM distributions for 4 dielectric thicknesses (columns from 3 to 10), 3 pair to pair separations (rows 3,5,10) and 6 PCB lengths (color coded).

Finally, we investigate how the number of DFE taps can improve the link performance. The same realistic link (25920 cases) was simulated with 5 DFE taps in Rx, optimized in the reference COM script [11]. EVA results for single ranges and range triplets are shown in Tables 7.11 and 7.12. The tables on the left show the feature ranking for COM>4dB (good links) and the tables on the right for COM>3dB (just compliant links). Unsurprisingly, the importance of package length dropped significantly, because the DFE mitigated the reflections inside the Rx package. Consequently, the most important range triplets include dielectric thickness, differential pair to pair separation, PCB link length, as well as the impedance and dielectric properties (Dk/LT).

Table 7.11. Important single range features for realistic link with 5 DFE taps and target COM>4dB (left table) and COM>3dB (right table).

|   | feature    | max score | max lift |   | feature    | max score | max lift |
|---|------------|-----------|----------|---|------------|-----------|----------|
| 0 | Tx_PCB_DS  | 1.0000    | 2.642593 | 0 | Tx_PCB_DS  | 1.0000    | 2.451382 |
| 1 | PCB_H      | 0.7327    | 2.296551 | 1 | PCB_H      | 0.6196    | 2.126231 |
| 2 | PCB_Imp    | 0.1379    | 1.308534 | 2 | PCB_Imp    | 0.1331    | 1.259686 |
| 3 | Tx_PCB_L   | 0.0728    | 1.220842 | 3 | Tx_PCB_L   | 0.1209    | 1.234499 |
| 4 | Pkg_len_TX | 0.0808    | 1.189775 | 4 | PCB_Dk     | 0.0760    | 1.111982 |
| 5 | PCB_Dk     | 0.0642    | 1.062093 | 5 | Pkg_len_TX | 0.0494    | 1.104577 |

Table 7.12. Important range triplet features for realistic link with 5 DFE taps and target COM>4dB (left table) and for COM>3dB (right table).

|   | feature1  | feature2   | feature3  | max score | max lift |   | feature1  | feature2 | feature3  | max score | max lift |
|---|-----------|------------|-----------|-----------|----------|---|-----------|----------|-----------|-----------|----------|
| 0 | PCB_H     | Tx_PCB_L   | Tx_PCB_DS | 0.5804    | 6.094331 | 0 | PCB_H     | Tx_PCB_L | Tx_PCB_DS | 0.3517    | 4.387299 |
| 1 | PCB_H     | Tx_PCB_L   | Tx_PCB_DS | 0.9700    | 6.094331 | 1 | PCB_H     | Tx_PCB_L | Tx_PCB_DS | 1.0000    | 4.387299 |
| 2 | PCB_H     | PCB_Imp    | Tx_PCB_DS | 1.0000    | 5.854508 | 2 | Tx_PCB_DS | Tx_PCB_L | PCB_H     | 0.7074    | 4.375112 |
| 3 | PCB_H     | PCB_Imp    | Tx_PCB_DS | 0.6229    | 5.674640 | 3 | PCB_H     | PCB_Imp  | Tx_PCB_DS | 0.3802    | 4.295897 |
| 4 | PCB_H     | Pkg_len_TX | Tx_PCB_DS | 0.5396    | 5.607631 | 4 | PCB_H     | PCB_Dk   | Tx_PCB_DS | 0.9187    | 4.192308 |
| 5 | PCB_H     | PCB_Dk     | Tx_PCB_DS | 0.9091    | 5.304325 | 5 | PCB_H     | PCB_Imp  | Tx_PCB_DS | 0.9643    | 4.174027 |
| 6 | Tx_PCB_DS | PCB_H      | PCB_SR    | 0.6364    | 5.211217 | 6 | Tx_PCB_DS | PCB_H    | PCB_SR    | 0.6603    | 4.167934 |
| 7 | Tx_PCB_DS | PCB_H      | Tx_PCB_S  | 0.7575    | 5.076493 | 7 | Tx_PCB_DS | PCB_Imp  | PCB_H     | 0.7940    | 3.600790 |
| 8 | PCB_H     | PCB_Dk     | Tx_PCB_DS | 0.8269    | 5.048513 | 8 | PCB_H     | PCB_Dk   | Tx_PCB_DS | 0.8462    | 3.377137 |
| 9 | Tx_PCB_DS | PCB_Imp    | PCB_H     | 0.4033    | 4.292130 | 9 | PCB_Imp   | Tx_PCB_L | PCB_H     | 0.3931    | 2.605951 |

Expertise-based analysis becomes complex when tens or hundreds of features are involved. In such cases, identifying important range combinations becomes practically impossible even for the best experts in the domain, and a manual analysis gives little confidence that no important range combination has been missed. Machine learning in general, and Range Analysis in particular, can be used by PCB or package designers who are unfamiliar with signal integrity or who are dealing with new technologies that have not yet been studied in depth. It is a formal process in which an initial set of features (length, thickness, dielectric, ...) is identified using some knowledge of which features contribute to signal degradation. The rest of the process is a completely automated design exploration with machine learning algorithms. The conclusion on the relevant feature ranges is therefore formal: it requires neither expertise nor tedious manual simulations.
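To illustrate what an automated search over ranges involves, the following brute-force sketch enumerates every closed interval of a single feature and keeps the one with the highest lift among those meeting a minimum coverage. It is not the EVA Range Analysis algorithm; the function `best_single_range`, the `min_coverage` parameter, and the toy data are all hypothetical and chosen only to show the lift versus coverage trade-off.

```python
def best_single_range(values, labels, min_coverage):
    """Return (lo, hi, lift, coverage) for the highest-lift interval lo <= x <= hi
    whose coverage of the positive cases is at least min_coverage."""
    n_pos = sum(labels)
    base_rate = n_pos / len(labels)
    levels = sorted(set(values))
    best = None
    for i, lo in enumerate(levels):
        for hi in levels[i:]:
            # Labels of the cases whose feature value lies inside [lo, hi].
            inside = [y for x, y in zip(values, labels) if lo <= x <= hi]
            pos_in = sum(inside)
            coverage = pos_in / n_pos
            if coverage < min_coverage:
                continue
            lift = (pos_in / len(inside)) / base_rate
            if best is None or lift > best[2]:
                best = (lo, hi, lift, coverage)
    return best


# Synthetic toy data (NOT from the paper): one feature with six levels,
# label 1 marks a passing case.
values = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0]

print(best_single_range(values, labels, min_coverage=0.75))
print(best_single_range(values, labels, min_coverage=1.0))
```

Relaxing the coverage requirement lets the search pick a narrower interval with higher lift, which mirrors the choice between the 5 mil and 3 mil thickness ranges in Table 7.5. An exhaustive search of this kind grows quickly with the number of features and levels, so it is only a conceptual illustration.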

# 8. Conclusions

Real-life design challenges require an enormous number of parameter value combinations to populate the design space, and simple sweeps of design parameters may not be suitable. In particular, a systematic approach is required to address SerDes design solution space coverage for multiple equalization mechanisms and various channel configurations affecting the system performance.

We demonstrate a practical application of ML-based methods that identify the parameters, and combinations thereof, having the greatest effect on the design/system output, account for failures to meet the specs/standards, and provide insight on how to optimize the design to meet a selected performance metric. This method is implemented on a 112Gb system case study.

This method allows a methodical, automated analysis of the solution space, yielding the desired insight on system behavior in a form comprehensible to engineers, and can be used as a decision support tool for design choices in the hands of the system architect, Si designer, Si Engineer, and more.

# **References:**

[1] Z. Khasidashvili, A. J. Norman. Range Analysis and Applications to Root Causing. In: 6th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2019.

[2] J. Fürnkranz, D. Gamberger, and N. Lavrač. Foundations of Rule Learning. Cognitive Technologies. Springer, 2012.

[3] P. Clark, T. Niblett. The CN2 induction algorithm, Machine Learning, vol. 3, 1989.

[4] S. Wrobel. An algorithm for multi-relational discovery of subgroups. PKDD 1997.

[5] I. Guyon, A. Elisseeff. An Introduction to Variable and Feature Selection. Journal of Machine Learning Research 3, 2003.

[6] N. Lavrač, B. Kavsek, P. Flach, L. Todorovski. Subgroup Discovery with CN2-SD. Journal of Machine Learning Research 5, 2004.

[7] M. Atzmueller, F. Lemmerich. Fast Subgroup Discovery for Continuous Target Concepts. Foundations of Intelligent Systems, LNCS 5722, 2009.

[8] C.W. Koay, A.J. Norman, Z. Khasidashvili. Analog Circuit Process Monitoring, IEEE Intl. Workshop on Defects, Adaptive Test, Yield and Data Analysis, 2017.

[9] A. Manukovsky, Y. Juniman, Z. Khasidashvili. A Novel Method of Precision Channel Modeling for High Speed Serial 56Gb Interfaces. DesignCon 2018.

[10] A. Manukovsky, Z. Khasidashvili, A.J. Norman, Y. Juniman, R. Bloch. Machine Learning Applications for Simulation and Modeling of 56 and 112 Gb SerDes Systems. DesignCon 2019.

[11] IEEE P802.3ck Task Force - Tools and Channels, http://www.ieee802.org/3/ck/public/tools/index.html

[12] M. Brown, M. Dudek, A. Healey, E. Kochuparambil, L. Ben-Artsi, R. Mellitz, C. Moore, A. Ran, P. Zivny, "The state of IEEE 802.3bj 100 Gb/s Backplane Ethernet", DesignCon 2014, January 2014, Santa Clara

[13] Measuring Channel Operating Margin – Anritsu app note, https://dl.cdn-anritsu.com/en-us/test-measurement/files/Technical-Notes/White-Paper/11410-00989A.pdf

[14] Y. Shlepnev, Decompositional Electromagnetic Analysis of Digital Interconnects, IEEE International Symposium on Electromagnetic Compatibility (EMC13), Denver, CO, 2013, p.563-568.

[15] A. Manukovsky, Y. Shlepnev, Measurement-assisted extraction of PCB interconnect model parameters with fabrication variations, 2019 IEEE 28th Conference on Electrical Performance of Electronic Packaging and Systems (EPEPS 2019), October 6-9, 2019, Montreal, Canada.

[16] M. Marin, Y. Shlepnev, Systematic approach to PCB interconnects analysis to measurement validation, 2018 IEEE Symposium on Electromagnetic Compatibility, Signal and Power Integrity, July 30- August 3, 2018, Long Beach Convention Center, Long Beach,