Data-driven reliability and availability of electronic equipment

Introduction/purpose: Reliability and availability are important, especially for military, medical, and other professional equipment. Reliability and availability management and/or prognostic reliability calculations have always been data driven. This article focuses on the analysis of the data impact on reliability and availability. Methods: This research is based mostly on the articles published by the author of this work as well as on some other papers. Results: This paper results in a discussion on the definition of the data-driven concept, preceded by brief definitions of reliability and availability, and followed by the analysis of the main impacts of uncertain data on prognostic reliability calculations as well as by reliability and availability of data used in reliability calculations. Conclusions: Reliability and availability are still very important. Reliability and availability have always been data driven while valid and relevant data have always been the main problem. Without good data, prognostic reliability is useless in spite of a good reliability model.


Introduction
Reliability as a theory and practice appeared after the Second World War. It was first applied to hardware, then to software and humans. Reliability is still very important nowadays (Pokorni, 2014b). Availability is connected with reliability. Reliability and availability have the same meaning for unrepairable products, while availability is more important for products intended to be repaired.
We will use the term 'product' for a device, item, component, system, etc.
Reliability has always been data driven (Pokorni, 2021a, 2022). Without good data, prognostic reliability is useless in spite of a good reliability model. Therefore, input data in a reliability model and in a decision-making system in reliability management are of crucial importance. The same can be said for availability. How to obtain good data is a big problem. Regarding reliability, such data can rarely be obtained from producers of components or devices (electronic, mechanical) and/or software. Until the 1990s, data about failure rates of electronic elements were available from the well-known military handbook MIL-HDBK-217 (Military Handbook, 1986). However, the latest version of this handbook is from 1995, so these data are obsolete (Pokorni, 2014b). Since these data are statistical, reliability (i.e., probability) which is calculated based on these data is valid only with a certain probability. Some data can be used from other sources, but they are usually not up to date either.
Therefore, there is a question: to calculate reliability or not to calculate it? From this author's experience (almost 30 years of teaching, during which students were told that input data are the biggest problem in reliability calculations, and 40 years of practice in reliability calculations of electronic equipment), it is better to do the calculations even if they are based on obsolete data, especially at the beginning of designing a device or a system, because this can help to choose a better solution (alternative) from the point of view of reliability.
Of course, when it is necessary to decide whether such devices, software or systems satisfy certain reliability requirements, obsolete or uncertain data should not be relied upon.
In (Pokorni, 2021b), it is concluded that the problem is how to cope with large amounts of data on the one hand, and with very small amounts of data on the other hand. Both of these can be the case in reliability and maintenance, and more often there is a problem of not enough data or no data at all.
The paper first gives short definitions of reliability and availability, then defines the data-driven concept, and finally discusses the impact of uncertain data on prognostic reliability calculations as well as the reliability and availability of the data themselves. We thus speak about both the availability of a product and the availability of the data for this product.
Before analysing the impact of data on reliability and availability, we will give a brief definition of reliability and availability.

Data-driven reliability and availability of electronic equipment, pp. 769-782

Definition of reliability
It is enough for this paper to say that reliability is the probability that a device will meet the intended standards of performance and deliver the desired results within a specified period of time under specified (environmental) conditions (Pokorni, 2021a).
Reliability is very important not only in military and professional products. Reliability nowadays also plays a crucial role in safety and adoption of driverless cars.

Definition of availability
Availability is the probability that the product is ready to perform its function when it is required (Pokorni, 2014a). Availability can be defined in different ways. It is generally defined as (Pokorni, 2014a):

$$A = \frac{\text{operational time}}{\text{operational time} + \text{down time}}$$

This is usually called operational availability. If the operational time and the down time of a product are recorded, availability can be calculated. So, by carefully recording such data, it is possible to obtain data about availability. Data about reliability are not as easy to obtain.
Calculating availability is important. If there is a so-called Service Level Agreement (SLA), then it must be possible to calculate availability in order to see if that SLA is fulfilled.
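The SLA check can be illustrated with a minimal Python sketch. It assumes that operational availability is taken as the ratio of recorded operational time to total recorded time; the function names, the one-year log and the 99.9% target are hypothetical and serve only as an illustration.

```python
def operational_availability(uptime_hours, downtime_hours):
    """Fraction of the total recorded time during which the product was operational."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total recorded time must be positive")
    return uptime_hours / total


def meets_sla(uptime_hours, downtime_hours, sla_target):
    """True if the measured availability reaches the agreed SLA target."""
    return operational_availability(uptime_hours, downtime_hours) >= sla_target


# Hypothetical log: one year (8760 h) with 10 h of recorded down time.
availability = operational_availability(8750.0, 10.0)
print(f"availability = {availability:.5f}")
print("SLA 99.9% fulfilled:", meets_sla(8750.0, 10.0, 0.999))
```

The result is only as good as the recorded operational and down times, which is the point made above about careful recording.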

Definition of the term 'data driven'
Being data-driven means that all decisions and processes are based on data. This is most evident in the field of big data (Pokorni, 2021a;Rouse, 2018). It is in connection with data science, data mining, etc. The term data-driven is used in many fields, in reliability as well.
Being based on data means using data, and using data means at least collecting and analysing data. This implies using some kind of communication. To achieve this, technology products (different devices, networks, software, Internet of Things, etc.) are used, and any of these can fail. Of course, it is advisable to avoid failures and to resolve them if they happen, and this is the task of reliability.
Data-driven as a term describes a decision-making process which involves collecting data, extracting patterns and facts from these data, and utilizing these facts to draw inferences that influence decision making (Northeastern University, 2019).
Making decisions is a fundamental component of business and personal management. Good decisions lead to success while poor decisions lead to loss or failure. And this depends on data.
Every organisation today aims to be data driven. Data-driven decision making is the process of making organizational decisions based on actual data rather than on intuition or observations alone. This is the case in reliability as well (Northeastern University, 2019).

Impact of data on reliability and availability
As stated before, reliability has always been data driven (if being data driven means that all decisions and processes are based on data), while valid and relevant data have always been the main problem. It matters whether the data are historical (from past experience of failures) or gathered from the new devices for which we calculate reliability. Of course, data gathered from the new devices themselves are more valuable than data from past experience, because data from the past, such as those in handbooks like MIL-HDBK-217, come from different devices and older components. Data about failure rates of new components are rarely available from producers.
In Military Handbook 217E (1986), it is stated that "Considerable effort is required to generate sufficient data on a part class to report a statistically valid reliability figure for that class. Casual data gathering on a part class occasionally accumulates data more slowly than the advance of technology in that class; consequently, a valid level of data is never attained." In Military Handbook 217F (1991), it is stated that "The first limitation is that the failure rate models are point estimates which are based on available data." Obviously, gathering sufficient and good data is difficult, in spite of the effort invested. The problem is not only insufficient data, but also the accuracy of such data.
We will discuss the impact of data on reliability from several aspects: accuracy, availability, up-to-dateness of data, experience, culture in the organisation, etc. Data are also the basis for developing reliability tests.

Accuracy of data
Reliability calculation (or, better, estimation) is always predictive (prognostic), i.e., it predicts what will happen in the future, for example the probability that a device will not fail within a certain time of operation. First, let us see how an error in the input data of a simple reliability model (a reliability block diagram, RBD) can affect the results of prognostic reliability calculations. In the so-called Parts Count reliability calculation, which is implemented in MIL-HDBK-217, a serial configuration RBD model is used (Figure 1) (Pokorni, 2014a).
Figure 2 shows the reliability of the serial RBD model Rs as a function of the reliability of an element R (all elements have the same reliability) and the number of elements m. From Fig. 2, it can be seen that an error in the input data for the reliability of one element (if we consider a difference in R as an error in the input data) has a bigger impact on the system reliability when the system has more components, which is usually the case. For example, if a serial RBD has 5 elements and the reliability of each element is R=0.8, then an input error of ±12.5% (i.e., using R=0.9 or R=0.7 instead) produces an error in the system reliability Rs of +80% and -48.7%, respectively. It can also be seen from Fig. 2 that the errors depend on the element reliability R and the number of elements m, i.e., the errors are smaller if the number of elements is smaller.
Figure 3 shows a parallel RBD model of a system. Fig. 4 shows the reliability of the parallel RBD model as a function of the reliability of an element (all elements have the same reliability) and the number of elements n. From Fig. 4, it can be seen that an error in the input data for one element has a smaller impact on the system reliability when the system has more components, but such a system is more costly.
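The sensitivity of both RBD models to an input error can be reproduced with a short Python sketch. It assumes identical and independent elements, so that the serial model gives Rs = R^m and the parallel model gives Rs = 1 - (1 - R)^n; the function names are illustrative and not taken from the cited works.

```python
def series_reliability(r, m):
    """Reliability of m identical, independent elements in series."""
    return r ** m


def parallel_reliability(r, n):
    """Reliability of n identical, independent elements in parallel."""
    return 1.0 - (1.0 - r) ** n


def relative_error_percent(model, r_true, r_used, count):
    """Relative error (%) in system reliability when r_used replaces r_true."""
    true_value = model(r_true, count)
    return 100.0 * (model(r_used, count) - true_value) / true_value


# Five elements, true element reliability 0.8, erroneous inputs 0.9 and 0.7.
for used in (0.9, 0.7):
    ser = relative_error_percent(series_reliability, 0.8, used, 5)
    par = relative_error_percent(parallel_reliability, 0.8, used, 5)
    print(f"R used = {used}: series {ser:+.1f}%, parallel {par:+.2f}%")
```

For the serial model, the sketch returns the errors quoted in the example with 5 elements, while the same input error changes the parallel result by only a fraction of a percent.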
For identical elements with a constant failure rate, adding a second element in parallel increases the mean time to failure of the system (MTTFs) by 50% of the mean time to failure of a single element (MTTF), adding a third element adds a further 33%, and adding a fourth element a further 25%, as is obvious from the equation (Pokorni, 2014)

$$\mathrm{MTTF}_s = \mathrm{MTTF}\left(1 + \frac{1}{2} + \frac{1}{3} + \dots + \frac{1}{n}\right)$$

So, adding more elements in parallel to increase reliability increases cost more than reliability. Reliability and cost are mutually dependent. Higher reliability means higher cost, but the cost of maintenance will be lower.
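The diminishing return of parallel redundancy can be checked numerically. The sketch below assumes n identical, active-parallel elements with constant failure rates, for which the system MTTF is the single-element MTTF multiplied by the sum 1 + 1/2 + ... + 1/n; the value of 1000 h is a hypothetical example.

```python
def parallel_mttf(mttf_element, n):
    """MTTF of n identical active-parallel elements with constant failure rates."""
    return mttf_element * sum(1.0 / i for i in range(1, n + 1))


mttf = 1000.0  # hypothetical single-element MTTF in hours
for n in range(1, 5):
    system = parallel_mttf(mttf, n)
    # Gain contributed by the n-th element, relative to the single-element MTTF.
    gain = system - parallel_mttf(mttf, n - 1)
    print(f"n={n}: MTTFs = {system:.0f} h, "
          f"gain from last element = {100 * gain / mttf:.0f}% of MTTF")
```

Each additional element costs as much as the first one but contributes a smaller share of the single-element MTTF.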

Availability of data
Availability of good data is a very big problem, especially today, when technology changes very fast and some components are very reliable (although we do not know to what extent), so that we do not have timely and accurate data about their failure rates, i.e., reliability. As mentioned before, military handbooks (e.g., MIL-HDBK-217) can be used (but such data are obsolete), as well as commercial handbooks, data from producers (but these data are rarely available), or one's own data (which are not often available either).

Experience in the calculation of reliability
The experience of this author has shown that the prognostic mean time between failures (MTBF) of electronic equipment, calculated using MIL-HDBK-217, should be about twice the required MTBF, or more, in order for the operational (actual) MTBF to equal the required MTBF. This was applied as a rule whenever the Parts Count reliability calculation was made (Pokorni, 2014b).
The author is not the only one who had the problem of inadequate input data of electronic elements and inadequate estimation of these input data in the calculation of reliability (Pokorni, 2014b).
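The rule of thumb about a calculated MTBF of about twice the required value can be sketched in Python for a Parts Count type calculation, in which the equipment failure rate is the sum over part types of quantity times generic failure rate times quality factor. The parts list, its numerical values and the function names below are hypothetical placeholders and are not values taken from MIL-HDBK-217.

```python
# Hypothetical parts list: (quantity, generic failure rate per 1e6 h, quality factor).
# The numbers are placeholders, not values taken from MIL-HDBK-217.
PARTS = [
    (40, 0.010, 1.0),   # resistors
    (25, 0.020, 1.0),   # capacitors
    (6, 0.150, 2.0),    # integrated circuits
    (2, 0.500, 1.0),    # connectors
]


def parts_count_failure_rate(parts):
    """Sum of quantity * generic failure rate * quality factor (failures per 1e6 h)."""
    return sum(qty * lam * pi_q for qty, lam, pi_q in parts)


def predicted_mtbf_hours(parts):
    """MTBF of the serial model, assuming constant failure rates."""
    return 1e6 / parts_count_failure_rate(parts)


def passes_margin_rule(parts, required_mtbf_hours, margin=2.0):
    """Rule of thumb: the predicted MTBF should be about twice the required one."""
    return predicted_mtbf_hours(parts) >= margin * required_mtbf_hours


print(f"predicted MTBF = {predicted_mtbf_hours(PARTS):.0f} h")
print("margin rule met for 100000 h:", passes_margin_rule(PARTS, 100_000))
print("margin rule met for 150000 h:", passes_margin_rule(PARTS, 150_000))
```

The margin of 2.0 is kept as a parameter because it reflects one author's experience with one data source and may differ for other data.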
Another problem with data is that when only a small number of devices is produced, there are not enough data.
Devices with very high required reliability present additional challenges: in this case, real data are obtained only a long time after these devices come into use.
One solution to the problem of not having enough relevant data is seen in the so-called Physics of Failure, but it is applicable in the wear-out region of the failure rate curve. However, in the Physics of Failure, there is again a problem not only with relevant data but also with the knowledge of different processes in component materials (Pokorni, 2014a).
Besides input data in reliability calculation models, data from reliability analyses of systems or devices can offer more information. For example, in (Pandian et al, 2020), the Boeing 787 Dreamliner reliability was analysed using a data-driven approach. From various documents and trends, it was concluded that Boeing did not adopt an effective Reliability Program Plan where the best practice tasks are implemented to produce reliable products. Boeing opted to widen its supplier base and reduce costs by including manufacturers who were new to the aircraft development industry. The events that led to delays during manufacturing and failures during operation are a testament to Boeing's flawed practices.
Boeing's flawed practices that can serve as valuable lessons were as follows: a short development cycle and highly complex supply chains; lack of accurate and timely information sharing; lack of relevant data; lack of valid testing on innovative technologies; difficulty in fault detection; and lack of balance between autonomy and oversight. Furthermore, these deficiencies are seldom independent of each other and can have a compounding effect on product reliability.

Quality of data
Data quality is only one problem in reliability. In (Elearth & Pecht, 2012), it is stated that there is no standard method for creating hardware reliability predictions, so predictions vary widely in terms of methodological rigor, data quality, extent of analysis and uncertainty, while the documentation of the prediction process employed is often not presented. The IEEE thus created a standard, IEEE 1413 (Standard Framework for the Reliability Prediction of Hardware), in 2009.

Culture in an organization
Gathering good data is connected with individual and organizational culture. The most important part of the development of a reliability program in an organization is to have a culture of reliability. It is extremely important that everyone involved in the creation of products, from the top down, realizes that a good reliability program is necessary for the success of their organization (Pokorni, 2014b, 2016).
The reliability effort produces and uses different amounts of information and data.
One possible solution when there is not enough relevant data and an analytical reliability model cannot be derived is simulation (Pokorni & Janković, 2011). An example of a less complex problem is illustrated in (Pokorni & Ramović, 2003).
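How simulation can replace an analytical model may be sketched with a small Monte Carlo example in Python. The structure (element A in series with the parallel pair B and C), the exponential lifetimes and the failure rates are assumptions chosen for illustration and are not taken from the cited papers; the structure is deliberately simple so that a closed-form result is available as a cross-check.

```python
import math
import random


def simulate_reliability(mission_time, failure_rates, trials=100_000, seed=1):
    """Monte Carlo estimate of reliability at mission_time.

    Illustrative structure: element A in series with the parallel pair (B, C).
    Lifetimes are assumed exponential and independent.
    """
    rng = random.Random(seed)
    lam_a, lam_b, lam_c = failure_rates
    survived = 0
    for _ in range(trials):
        t_a = rng.expovariate(lam_a)
        t_b = rng.expovariate(lam_b)
        t_c = rng.expovariate(lam_c)
        # The system works while A works and at least one of B, C works.
        system_life = min(t_a, max(t_b, t_c))
        if system_life > mission_time:
            survived += 1
    return survived / trials


def analytic_reliability(mission_time, failure_rates):
    """Closed-form value for the same structure, used only as a cross-check."""
    r_a, r_b, r_c = (math.exp(-lam * mission_time) for lam in failure_rates)
    return r_a * (1.0 - (1.0 - r_b) * (1.0 - r_c))


rates = (1e-4, 5e-4, 5e-4)  # hypothetical failure rates per hour
print("simulated:", simulate_reliability(1000.0, rates))
print("analytic: ", analytic_reliability(1000.0, rates))
```

For structures without a closed form, only the line computing system_life would change, but the simulation still depends on the same input failure data.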
Reliability is connected with maintainability. Reliability and maintainability are important factors in the total cost of equipment. An increase in maintainability can lead to reduction in operating and support costs. For example, a more maintainable product lowers maintenance time and operating costs. Furthermore, more efficient maintenance means a faster return to operation or service, decreasing downtime (Brunton et al, 2021). Again, good and timely data are necessary.

Reliability and availability of data
Reliability of data and their availability can be discussed as well. In order to build trust in data, it is critical that they are reliable, i.e., complete and accurate.
Data reliability means that data are complete and accurate, and it is a crucial foundation for building data trust across any organization. Ensuring data reliability is one of the main objectives of data integrity initiatives, which are also used to maintain data security, data quality, and regulatory compliance (Talend, 2023;Pokorni, 2021a).
Reliability leaders need reliable data to make reliable decisions. Therefore, in data-driven reliability, data reliability is of crucial importance. Data reliability is not the same as data validity. Reliability of data is based on data validity, completeness, and uniqueness (Pokorni, 2021a).
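The three components named above (validity, completeness, and uniqueness) can be checked mechanically before data enter a reliability calculation. The Python sketch below is only an illustration: the record schema, the field names and the validity rule (a non-negative number of hours to failure) are hypothetical assumptions.

```python
REQUIRED_FIELDS = ("unit_id", "timestamp", "hours_to_failure")  # hypothetical schema


def assess_records(records):
    """Count records that are incomplete, invalid, or duplicated."""
    seen = set()
    report = {"total": len(records), "incomplete": 0, "invalid": 0, "duplicate": 0}
    for rec in records:
        # Completeness: every required field must be present and not None.
        if any(rec.get(field) is None for field in REQUIRED_FIELDS):
            report["incomplete"] += 1
            continue
        # Validity: time to failure must be a non-negative number.
        hours = rec["hours_to_failure"]
        if not isinstance(hours, (int, float)) or hours < 0:
            report["invalid"] += 1
            continue
        # Uniqueness: the same unit and timestamp may be reported only once.
        key = (rec["unit_id"], rec["timestamp"])
        if key in seen:
            report["duplicate"] += 1
            continue
        seen.add(key)
    report["usable"] = len(seen)
    return report


sample = [
    {"unit_id": "A1", "timestamp": "2023-01-05T10:00", "hours_to_failure": 1200.0},
    {"unit_id": "A1", "timestamp": "2023-01-05T10:00", "hours_to_failure": 1200.0},
    {"unit_id": "B7", "timestamp": "2023-02-11T08:30", "hours_to_failure": None},
    {"unit_id": "C3", "timestamp": "2023-03-02T14:45", "hours_to_failure": -5.0},
]
print(assess_records(sample))
```

Such a check detects only formal defects; it cannot tell whether a formally valid record describes what actually happened.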
Nowadays, in the reliability domain, there is not only a lack of good data, but often a lack of any data. People dealing with reliability calculations can see that clearly.
If the Internet of Things (IoT) is used to gather data, then, because of IoT unreliability, data can be missing, incomplete and/or corrupted (Pokorni, 2021a).
Data for maintainability are usually gathered from sensors or the IoT; if these are not reliable, data can also be unreliable. Unreliable IoT can produce unreliable data as an input in a decision-making system, so decisions can be wrong.
A similar situation can happen with artificial intelligence incorporated in a decision-making system. Can artificial intelligence recognize bad data? Or will we believe in a decision made in such a way?

Reliability equation
Because a data-driven reliability system includes hardware, software, sometimes humans, and data, we suggest assessing the reliability of a data-driven reliability system by changing the equation from (Pokorni, 2019, 2020, 2021a, 2021b) to the following one:

Conclusion
Reliability and availability continue to be very important. Reliability and availability have always been data driven while valid and relevant data have always been the main problem. Without good data, prognostic reliability is useless in spite of a good reliability model. This can be the case with maintainability as well.
In reliability calculations, there is usually a bigger problem with not enough relevant data than with large amounts of data. The problem is not only insufficient data, but also the (in)accuracy of data.
In reliability calculations, the data from MIL-HDBK-217 are usually used, and these are data from the past, obsolete in most cases. Up-to-date data on failure rates of elements are rarely available. Regarding availability, it is a matter of accurately recording data about the operational time and the down time of a product being used.