Health Monitoring of IGBTs with a Rule-Based Sub-Safety Recognition Model Using Neural Networks

IGBTs are widely used, yet they are the most fragile devices in power electronics. This paper develops a rule-based sub-safety recognition model using a neural network to evaluate the degradation degree of IGBTs and determine their health state. First, failure modes, mechanisms, and effects analysis (FMMEA) is applied to analyze and extract the precursor parameters of IGBTs. A standardized Euclidean distance is used in the self-organizing map neural network (SOMNN) in place of the Euclidean distance, and it also serves as a health indicator for degraded (aged) IGBTs. Second, a rule-based improved sub-safety recognition model using neural networks is developed to assess the degradation degree and health state. Finally, several groups of IGBT accelerated aging tests are completed to train and test the models. The recognition accuracy reaches 95%, which supports the validity of the algorithm.


Introduction
The safety and reliability of IGBTs are critical in transportation, aerospace, and similar fields. Because IGBTs experience harsh environmental and operational conditions for long periods of time, their reliability and safety must be adequately addressed [1]. Yang et al. [2] conducted a questionnaire survey collecting reliability information from the power electronics industry and showed the distribution of fragile components (see Fig. 1). As depicted in Fig. 1, "semiconductor power device" was selected by 31% of the respondents as the most fragile component of power converters, followed by "capacitors" and "gate drives." If the health of IGBTs is not identified properly and monitored in a timely manner, they will degrade further and enter the failure state, which can cause system breakdown or even catastrophic accidents. An approach to estimate their health state is therefore needed; it can yield cost benefits by avoiding unexpected system breakdown and unscheduled maintenance [3]. In general, health estimation approaches fall into three categories: reliability-based, model-based, and data-driven [4]. Data-driven approaches are currently the most widely used. They are designed to transform in-situ measurement data into meaningful information that describes the behavior of systems (or products and components). The degradation model is derived from data provided by the monitoring system, without considering analytical models or physical parameters. Data-driven approaches mainly use artificial intelligence techniques such as neural networks, Bayesian networks, Markov processes, or statistical methods to learn the degradation phenomenon [5] and to predict the future health state of the given system.
Yao et al. studied the thermal safety and lifetime consumption of IGBTs based on junction temperature monitoring [6]. Chen et al. analyzed failure of the IGBT solder layer based on transient thermal resistance [7]. Fu et al. applied a real-time rain-flow method to IGBT life prediction [8]. Alghassi et al. [9] used a time delay neural network to predict the remaining useful life (RUL) of IGBTs. Tamilselvan et al. [10] presented a multi-attribute classification fusion system for IGBT health diagnostics, whose high computational complexity limits its use in online applications. Sutrisno [11] classified IGBT health into healthy and faulty states using a k-nearest neighbor classifier.
As mentioned, most research has focused on anomaly detection and RUL prediction for IGBTs. Few papers have addressed sub-safety recognition of IGBTs, where sub-safety is defined as the early health state that precedes failure, even though it is important for keeping IGBTs in a safe state and ensuring system reliability. Therefore, this study focuses on sub-safety recognition of IGBTs. More specifically, failure modes, mechanisms, and effects analysis (FMMEA) is used to identify the parameters to be monitored for IGBT health monitoring. Further, the self-organizing map neural network (SOMNN) [12], a type of artificial neural network trained using unsupervised learning to produce a two-dimensional discretized representation of the input space of the training samples, is used as a black-box model to learn the IGBT's degradation behavior and segregate the in-situ measurement data into possible health conditions (i.e., safety, sub-safety, and failure). That is, the health states of the IGBTs are clustered into different parts of the competitive layer, and the degradation degree is obtained by calculating the minimum quantization error (MQE) using healthy and aging data. Since the state of the IGBTs is not directly recognized by the SOMNN, a rule-based model is proposed in this study to map the degradation degree to a health state.
The rest of the paper is organized as follows. Section II introduces the definition of sub-safety recognition. Section III introduces the parameters for monitoring IGBT health, as determined by FMMEA, and presents the proposed IGBT health monitoring approach. Section IV describes the IGBT health monitoring methodology in detail. Section V presents validation tests of the developed approach. Finally, Section VI presents conclusions, discussion, and possible directions for future research.

Definition of Sub-safety Recognition
In the field of medicine, sub-health is defined as a state between health and disease [13]. In the sub-health state, patients cannot reach health standards; they are not ill but appear to have decreased mental vitality and adaptability. If this state is not corrected, it could lead to psychosomatic illness. Based on this concept, in view of the complex integration of electrical and mechanical systems in high-speed trains, this study uses the "sub-safety" concept. As the name suggests, sub-safety is a degraded state of a system or a device that lies between safety and failure. If the health or degradation degree cannot be assessed and identified early enough, then, with the deterioration of external conditions, the given system will gradually fall into a failed state. This may lead to system breakdown and accidents, resulting in a significant loss of life and property.
The concept of sub-safety is illustrated in Fig. 2. The three classes Ω1, Ω2, and Ω3 represent the system in the safety state, sub-safety state, and failure state, respectively. Each class has a geometrical feature space, composed of a set of observations projected into a space of N features (Fig. 2) [14]. For IGBTs, the feature vector can consist of characteristic parameters such as the case temperature, the on-state collector-emitter voltage and current, the breakdown voltage, the threshold voltage, the switching on and off times, the thermal resistance, and so on. Some of these parameters do not have an obvious effect on the IGBT state. Others reflect the degradation of the IGBTs and can therefore serve as health indicators. As these parameters increase or decrease, the IGBTs degrade gradually and move from one state to another. For example, when the on-state collector-emitter voltage shifts slightly, the IGBTs may enter the sub-safety state; if it shifts substantially, they may enter the failure state. The transition from one state to another is determined by the precursor parameters of the IGBTs. FMMEA is performed in Section III.A to extract the precursor parameters used as the input of the self-organizing map neural network (SOMNN).

Failure Modes, Mechanisms and Effects Analysis for IGBTs' Sub-safety Recognition Scheme
In Section III.A, the IGBTs' main failure modes, mechanisms, and effects are analyzed first. Then the IGBTs' specific failures in the power cycling test are considered, and the main precursor parameters are extracted for use as the input of the SOMNN. In Section III.B, a sub-safety recognition scheme for the IGBTs is introduced. Additionally, the degradation degree is determined and used with a rule-based model for sub-safety recognition of the IGBTs.

FMMEA
An IGBT is a combination of a MOSFET and a bipolar junction transistor (BJT). The switching characteristics of IGBTs are similar to those of MOSFETs, while their high current and voltage capabilities are similar to those of BJTs. Failures are generally due to environmental conditions (e.g., high temperatures) and operating conditions (e.g., thermal and electrical stresses) [1]. There are six main failure areas in an IGBT power module: two are die-related and the other four are package-related [15]. The die-related failures are chip metallization damage and solder fatigue; the package-related failures are bond wire lift-off, bond wire fatigue, partial discharge, and direct copper bonding failure.
FMMEA is a physics-of-failure (PoF) based methodology for assessing the root causes of failure and the failure mechanisms of a given product, as illustrated in Fig. 3 [16]. For IGBTs, the potential failure modes, causes, and mechanisms need to be analyzed first. The precursor parameters can then be determined from these failure modes and mechanisms in the IGBTs' power cycling test. Since the IGBT module is very complicated, it is difficult to analyze all the failure mechanisms and evaluate the module's health condition. In this study, the analysis is therefore limited to die-level failures. The FMMEA results for the potential die-level failure causes and mechanisms are presented in Table 1 [3]. These mechanisms involve high electric fields; oxide breakdown involves high temperature in addition to high electric fields. The IGBTs are therefore subjected to electrical and thermal stresses so that they degrade, and the precursors are identified through the failure modes and mechanisms. The failure modes of the IGBTs can be short circuits, increased leakage current, or loss of gate control (inability to turn off). The failure causes can be environmental conditions such as high temperatures or operating conditions such as high voltages and currents. Critical transistor failure mechanisms include hot electrons and gate-oxide breakdown, which can result in excessive leakage current, leading to increased standby power and an increase in transistor response time. The effects of these failure mechanisms can be observed from the range of the precursor parameters (i.e., the collector-emitter voltage VCE, the collector-emitter current ICE, and the threshold voltage Vth) [17], as stated in Section I.

A sub-safety recognition scheme for IGBTs
The evaluation result for an IGBT is one of the following three degradation states: safety (no maintenance needed), sub-safety (the component can still work, but a warning should be provided), and failure (latch-up as a result of thermal overstress). The information available for developing the monitoring system consists of measurements of N precursor parameters performed on M different IGBTs at different levels of degradation. The N parameters measured from the m th IGBT (m = 1, 2, …, M) at time t form the vector of precursor parameters for that device.
SOMNN is trained using in-situ measurement data from healthy and aging IGBTs. The trained SOMNN provides a two-dimensional representation of the training data, which minimizes the influence of outliers and noisy data and learns the characteristic behaviors of the healthy components. Then a health indicator is proposed based on the minimum quantization error (MQE) and the degradation degree can be determined by the output of SOMNN. Finally, the degradation state will be further recognized by employing the rule-based model.

Health Evaluation and Sub-safety Recognition Approach
As illustrated in Fig. 5, the SOMNN of the two-dimensional grid combines an input layer with a competitive layer of processing neurons [18]. The SOMNN uses a neighborhood function to preserve the topological properties of the input space and determine the closest unit distance to the input vector, which will be used to construct class boundaries graphically on a two-dimensional map. The weight vectors of the best matching unit and its topological neighbors will move closer to the input vector space [19].

SOMNN: layer configuration
For a given network, the input vector x (i.e., x1, x2, …, xn) has a fixed dimension n, and each of its components is connected to every neuron in the array [18]. The competitive layer contains a two-dimensional array of neurons. When the SOMNN is used to cluster the IGBT degradation data, the dimensionality n of the input vector is three because three precursor parameters (i.e., VCE, ICE, and Vth) are used in this study.
After the SOMNN is trained, each input vector is mapped to a node of the competitive layer. In addition, because the network is trained on healthy samples, degradation and failure data that differ from the training data are clustered to other neuron nodes. The degradation degree d is the network's output normalized to lie between 0 and 1. There are no consistent standards for the failure thresholds of IGBTs; they differ between device types and applications. In this study, the relationship between the degradation degree and the health state is presented pictorially in Fig. 6 and is expressed as Eq. (1), whose output is the recognition result for the IGBTs.
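The rule-based mapping from the degradation degree d to a health state can be sketched as a pair of threshold comparisons. This is a minimal illustration only: the threshold values 0.3 and 0.7 and the names `HealthState` and `recognize_state` are hypothetical placeholders, not the values or notation of Eq. (1), and in practice the thresholds would be chosen per device type and application.

```python
from enum import Enum


class HealthState(Enum):
    SAFETY = "safety"
    SUB_SAFETY = "sub-safety"
    FAILURE = "failure"


# Hypothetical thresholds for illustration; they are NOT taken from Eq. (1).
SUB_SAFETY_THRESHOLD = 0.3
FAILURE_THRESHOLD = 0.7


def recognize_state(d,
                    sub_safety_threshold=SUB_SAFETY_THRESHOLD,
                    failure_threshold=FAILURE_THRESHOLD):
    """Map a degradation degree d in [0, 1] to one of three health states."""
    if not 0.0 <= d <= 1.0:
        raise ValueError("degradation degree must lie in [0, 1]")
    if not 0.0 < sub_safety_threshold < failure_threshold < 1.0:
        raise ValueError("thresholds must satisfy 0 < sub-safety < failure < 1")
    if d < sub_safety_threshold:
        return HealthState.SAFETY
    if d < failure_threshold:
        return HealthState.SUB_SAFETY
    return HealthState.FAILURE
```

For example, under these assumed thresholds `recognize_state(0.5)` returns `HealthState.SUB_SAFETY`.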

Training
The SOMNN's training steps [20] are as follows:
Step 1: The input vector is $X = [x_1, x_2, x_3]^T$, where x1, x2, and x3 are VCE, ICE, and Vth, respectively, so the dimensionality of each input is three. $w_{ij}(t)$ is the weight between the i th output neuron and the j th input neuron, where i = 1, 2, …, n and j = 1, 2, …, m, and t is the current time.
Step 2: Random values are used to initialize the weights, and then the input vector and the weight vectors are normalized as follows: $\hat{X} = X / \lVert X \rVert$, $\hat{W}_i = W_i / \lVert W_i \rVert$.
Step 3: As the input vector and weight vectors have been normalized, the minimum Euclidean distance can be worked out: $d_{\min} = \min_i \lVert \hat{X} - \hat{W}_i \rVert$. The neuron with the minimum Euclidean distance is the winning neuron.
Step 4: For the neurons in the topological neighborhood of the winning neuron, the Kohonen rule is applied to update the weights: $w_{ij}(t+1) = w_{ij}(t) + \eta(t)\,[\hat{x}_j - w_{ij}(t)]$.
Different distance functions can be used to determine the neighborhood. Commonly used ones are the Euclidean distance, Box distance, Link distance, and Manhattan distance. η is the variable learning rate (0 < η < 1); it decays gradually toward zero over the course of training as a function of the current iteration number t and the total number of iterations T.
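The four training steps can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the linear decay of the learning rate, the shrinking neighborhood radius, the Euclidean grid distance used for the neighborhood, and all function and parameter names (`train_som`, `eta0`, `radius0`, and so on) are assumptions introduced here.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit Euclidean length (Step 2)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v] if norm > 0 else list(v)


def euclidean(a, b):
    """Euclidean distance between two equally sized sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def best_matching_unit(x, weights):
    """Index of the neuron whose weight vector is closest to x (Step 3)."""
    return min(range(len(weights)), key=lambda i: euclidean(x, weights[i]))


def train_som(samples, rows=3, cols=3, total_iters=200,
              eta0=0.5, radius0=1.5, seed=0):
    """Train a small SOM; returns one weight vector per competitive-layer neuron."""
    rng = random.Random(seed)
    dim = len(samples[0])                      # Step 1: three features here
    data = [normalize(s) for s in samples]
    # Step 2: random initial weights, then normalization.
    weights = [normalize([rng.random() for _ in range(dim)])
               for _ in range(rows * cols)]
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    for t in range(total_iters):
        frac = t / total_iters
        eta = eta0 * (1.0 - frac)                  # assumed linear decay
        radius = max(radius0 * (1.0 - frac), 0.5)  # assumed shrinking neighborhood
        x = data[rng.randrange(len(data))]
        win = best_matching_unit(x, weights)       # Step 3: winning neuron
        for i, pos in enumerate(grid):
            # Neighborhood membership via Euclidean distance on the map grid.
            if euclidean(pos, grid[win]) <= radius:
                # Step 4 (Kohonen rule): w <- w + eta * (x - w)
                weights[i] = [w + eta * (xj - w)
                              for w, xj in zip(weights[i], x)]
    return weights
```

A call such as `train_som(healthy_samples)` with rows of (VCE, ICE, Vth) measurements returns nine weight vectors; replacing the grid distance with a Box, Link, or Manhattan distance only changes the neighborhood test inside the loop.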

Degradation degree
Three-dimensional IGBT aging feature data (VCE, ICE, and Vth) are first used as the input of the SOMNN to identify the degradation state of the test samples. Then, the distance between the input vector and the corresponding best matching unit (i.e., winning node) is computed. Finally, the MQE is used as a degradation indicator [9]. The MQE indicates how much the vector differs from the training samples of the SOMNN. Thus, the bigger the MQE, the more the IGBT is degraded. With the standardized Euclidean distance, $\mathrm{MQE} = \sqrt{\sum_{j=1}^{3} \left( \frac{x_j - w_{cj}}{s_j} \right)^2}$, where $w_{cj}$ is the j th weight of the best matching unit c, and s1, s2, and s3 correspond to the standard deviation of x1, x2, and x3 for the training data, respectively.
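The MQE with a standardized Euclidean distance can be sketched as below. The function names are hypothetical, and using the sample standard deviation (n − 1 denominator, as computed by `statistics.stdev`) is an assumption, since the text does not say which estimator is used.

```python
import math
import statistics


def feature_std(training_samples):
    """Per-feature standard deviation (s1, s2, s3) of the training data.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    return [statistics.stdev(column) for column in zip(*training_samples)]


def standardized_distance(x, w, s):
    """Euclidean distance with each feature difference divided by its std."""
    return math.sqrt(sum(((xj - wj) / sj) ** 2
                         for xj, wj, sj in zip(x, w, s)))


def mqe(x, weights, s):
    """Minimum quantization error: distance from x to its best matching unit."""
    return min(standardized_distance(x, w, s) for w in weights)
```

Dividing by s1, s2, and s3 keeps a feature with a large numeric range (for example a voltage in volts against a current in amperes) from dominating the distance, so a larger MQE reflects a larger deviation from healthy behavior across all three features.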

Experimental Results for IGBTs' Health Monitoring
In this study, power cycling aging tests were performed on discrete IGBTs (IRGB4045DPbF) manufactured by Infineon Technologies. The IGBT devices were packaged in a TO-220-3 package along with a soft recovery diode. The devices were rated for a collector-emitter voltage of 600 V and gate-emitter voltage of 20 V. The maximum junction temperature rating was 175 °C.
For IGBT sub-safety recognition, 33 IGBTs were used in the power cycling test: 11 healthy IGBTs served as the healthy reference (i.e., a training set for the SOMNN), and the remaining 22 IGBTs underwent the accelerated power cycling test. The mean temperature, Tmean, in the experiment was set to 175 °C (i.e., the maximum junction temperature). The 22 IGBTs were further divided into two groups according to the temperature range used in the accelerated power cycling test, as specified in Table 2. Group A ranges up to 200 ºC, while group B ranges from 125 ºC to 225 ºC. It is thus possible to compare and validate the developed health monitoring method under different conditions. Moreover, the lifetime of the IGBTs in group B is much shorter than in group A because of the higher temperature. The precursor parameters (i.e., VCE, ICE, and Vth) measured at the 15th hour were used to recognize the IGBTs' health state. As seen from Figs. 7 and 8, the 15th hour is a time boundary that can partition the IGBTs into safety, sub-safety, and failure.
The values of VCE, ICE, and Vth obtained at the 15th hour of the accelerated power cycling test for the 22 IGBTs are shown in Figs. 9 to 11, respectively. Figs. 9 to 11 show that the voltage-related parameters (i.e., VCE and Vth) increased compared with the parts before aging, whereas the current-related parameter (i.e., ICE) decreased. Additionally, all the parameters measured from the IGBTs in group B showed more significant deviations at the 15th hour of aging than those in group A.
To determine the number of neuron nodes in the competitive layer, several map sizes of the competitive layer were tested. Accuracy is defined as the proportion of correctly identified samples among the total test samples. As shown in Fig. 12, recognition accuracy improves when the map size increases. However, beyond a certain size no significant difference is observed, and the recognition accuracy decreases for still larger maps. Therefore, a map with 9 neurons is selected for the competitive layer.
Table 3 presents a summary of the health state of the IGBTs in group A, where the degradation degree corresponds to the output of the SOMNN. In Table 3, IGBT3 and IGBT5 are still in the safety state, while IGBT2, IGBT4, and IGBT8 are in the failure state. All the others are in the sub-safety state. The key failure mechanism during the accelerated power cycling aging test was the thermomechanical fatigue of the lead-free solder die attach. More specifically, voids in the die attach were a significant indication of its degradation. To find and validate the degradation, X-ray analysis was performed during the aging period in order to observe the change of the voids in the die attach, as depicted in Figs. 13 to 15. In Figs. 13 to 15, the larger die attach on the left side (see the red solid line) corresponds to the IGBT, and the smaller die attach on the right side corresponds to the free-wheeling diode. Fig. 13 is the X-ray image of the healthy IGBT5, which has very few voids and is still in the safety state at the 15th hour. The aged IGBT7 in Fig. 14 has a few more voids than the healthy IGBT5 and shows die-attach degradation at the 15th hour. The die attach of IGBT4 in Fig. 15 has many more voids and more damage than the other two and is clearly in the failure state.
As shown in Fig. 16, the degradation degrees d of IGBT4, IGBT5, and IGBT7 all range from 0 to 1. However, the degradation speed differs: IGBT4 degrades fastest, IGBT5 slowest, and IGBT7 lies in between. This is consistent with their lifetimes in Fig. 7. The degradation degrees of IGBT4, IGBT5, and IGBT7 in Table 3 are nearly the same as the results in Fig. 16.
The details of the recognition results of the IGBTs can be seen in Fig. 20, which corresponds with Table 3 and Table 4. To validate the robustness of the method, 220 aged IGBTs were tested, and the recognition results at the 15th hour are shown in Table 5. The recognition accuracy is calculated as follows: $\text{Accuracy} = \frac{27 + 47 + 135}{30 + 50 + 140} \times 100\% = 95\%$.
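The accuracy definition used here (correctly identified samples divided by total test samples) can be checked with a few lines of arithmetic. The counts 27, 47, 135 and 30, 50, 140 are the figures appearing in the paper's accuracy expression; treating each pair as the correct and total counts of one group of samples is an assumption, and the function name is hypothetical.

```python
def recognition_accuracy(correct, total):
    """Proportion of correctly identified samples among all test samples."""
    if total <= 0:
        raise ValueError("total number of test samples must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct count must lie between 0 and total")
    return correct / total


# Assumed pairing: (correct, total) per group, in the order given in the paper.
correct_counts = [27, 47, 135]
total_counts = [30, 50, 140]

accuracy = recognition_accuracy(sum(correct_counts), sum(total_counts))
print(f"{accuracy:.1%}")  # 209 / 220 -> prints 95.0%
```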

Conclusions
IGBTs are a primary source of failure in power converters, which are widely used in industrial applications. Safety control measures are therefore needed to prevent a system from entering a failure state, and IGBT health monitoring is critical because it allows degradation to be diagnosed before failure occurs. A rule-based sub-safety recognition model using the output of the SOMNN was developed in this study to recognize the state of the IGBTs. The IGBTs' aging data and X-ray images from the power cycling test were used to validate the model. The recognition results showed that the state of the IGBTs could be recognized accurately.
The main work can be summarized as follows:
1) The concept of "sub-safety" has been proposed for IGBTs for the first time.
2) Based on FMMEA, the precursor parameters of the IGBTs were extracted.
3) A standardized Euclidean distance is applied in the SOMNN in place of the Euclidean distance; it also serves as a health indicator for degraded (aged) IGBTs.
4) A rule-based sub-safety recognition model using the output of the SOMNN is used to evaluate the health state.
As a next step, other damage and failure modes of IGBTs, such as open circuits, short circuits, leakage current increases, and loss of control, also need to be studied. Furthermore, failure mechanisms including solder fatigue, wire bond and wire flexure fatigue, hot electrons, and time-dependent dielectric breakdown still need to be investigated under different environmental conditions such as thermal, electrical, and vibration stress and humidity. The state of health needs to be studied considering all the stresses and loading conditions. Feature extraction and information fusion methods can be applied to IGBT modules in order to derive a health indicator that considers all the precursor parameters. Artificial intelligence and machine learning methods are also promising research directions in prognostics and health management of IGBTs.

Acknowledgements
This work was funded by the Guangzhou science and technology plan general project 'IGBTs' PHM research', the Equipment pre-research project (41402050202), and the 2017 Intelligent manufacturing integrated standardization project. The members of the Prognostics & Health Management Consortium, CALCE, University of Maryland, College Park, USA, also supported this work.