COMMUNICATION NETWORK RELIABILITY AND AVAILABILITY ESTIMATION BY THE SIMULATION METHOD

Summary: The paper presents an estimation of the two-terminal availability of a complex communication network (up to 50 nodes and 150 links) using a software package developed by the authors and based on the Monte Carlo simulation method and the Weibull distribution. Reliability can be estimated as well. The software package is written in the Visual Basic 6.0 programming language and was run on an MSI laptop with the Windows XP Professional operating system. The program execution time for over 500000 iterations is a few seconds (without graphical display). The Weibull distribution function, used for availability estimation, can be applied to electronic, electromechanical and mechanical components, as well as to components with different failure rates.

FIELD: Telecommunications
slavko.pokorni@its.edu.rs

Introduction
Communication networks are an essential part of current information and communication technology in civilian and military applications. In most applications, it is of great importance for connections between the communicating parties to remain without failure. Therefore, the reliability and availability of these networks are of great importance.
We have modeled a real network composed of communication centres (nodes) and links between them. We assume that a network fails if a communication centre fails and/or a communication link fails in such a way that there is no communication between two nodes. This paper gives an overview of the results of the application of the developed software package for reliability and two-terminal availability (availability between any two nodes) calculation [1] of a complex communication network, based on the Monte Carlo simulation method.
Papers [2] and [3] discuss the verification of the accuracy of the results of the simulation software package for availability calculation and for achieving a required level of availability, using relatively simple examples of communication networks with elements connected in a ring, for which an analytical formulation can be determined. The influence of a changing repair duration of an active node has also been presented for specific examples. The results obtained by the software package and by the analytical method for the same examples showed a high level of agreement. The earlier results have thus been confirmed, and they support further research on the development of a mathematical-physical model and an adequate software package that supports the applied model for determining the availability of a complex communication network.
This paper presents the results of applying the developed software package to calculate the availability of a complex communication network [4,5], based on failure data and repair rates of the hardware elements (nodes and links) of the network.
Today communication networks involve software, but software failures and external interferences have not been included in the analyses.

Why the simulation method
There are a number of methods for reliability and availability calculation, but few of them can be applied if the network is complex, especially if network components are repairable, which is usually the case. These methods are based on counting the states in which a system is operational and then adding the probabilities of the system being in these states. The number of states to examine can be very high. For example, a 10-component network (nodes and links) in which every component can be in two states (operational or failed) can be in 2^10 = 1024 states. If each component can be in three states, the network can be in 3^10 = 59049 states. Because a large number of nodes and links creates an extremely high number of system states, it is difficult, almost impossible, to determine an analytical solution for the reliability and availability of such complex communication networks. To determine the availability of repairable systems, Markov models are usually used, which requires solving differential equations [6,7]. This is why the simulation method can be a solution to this problem.
In practice, the reliability of system parts (nodes and links in the case of a communication system) can only be estimated (through experiments or historical statistics) and hence is not exact. The so-called exact network reliability calculated by exact methods is, therefore, an estimate at best. This is why the word estimation is used, and why simulation methods, which by their nature produce estimates, are justified and desirable.
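The growth in the number of states can be checked with a few lines of Python. This is only an illustrative sketch; the function name `count_states` is an assumption introduced here and is not part of the authors' software package.

```python
def count_states(n_components, states_per_component=2):
    """Number of distinct network states when every component
    independently takes one of `states_per_component` states."""
    return states_per_component ** n_components


print(count_states(10, 2))        # 1024
print(count_states(10, 3))        # 59049
# Largest network handled by the package: 50 nodes + 150 links, two states each.
print(count_states(50 + 150, 2))
```

For the largest network considered in the paper (200 two-state components), the count is on the order of 10^60, which is why exhaustive enumeration is not practical.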
One general simulation method which can be applied in different areas is the Monte Carlo simulation method, chosen here to solve the problem of estimation of reliability and availability of complex communication networks.
The software package for reliability and availability estimation was developed by the authors from the Technical and Test Center. Before the package was applied to complex communication networks, it was checked on simpler networks, where it gave results very close to those obtained by exact equations [2,3].

Definitions of reliability and availability
Reliability is expressed as a probability of success over a given duration of time, cycles, etc. For example, the reliability of a communication network might be stated as a 95% probability of no failure over a 1000-hour operating period while giving a certain level of service. Reliability is concerned with the probability and frequency of failures. A commonly used measure of reliability for repairable systems is the mean time between failures (MTBF). The equivalent measure for non-repairable items is the mean time to failure (MTTF). Usually, it is assumed that an exponential distribution of failures applies to the elements of the system, but this holds only for some electronic systems, not for all of them.
Availability is defined as the percentage of time when a system is available to perform its required function(s). It is measured in a variety of ways, but it is principally a function of downtime (time when the system is not operational). Availability can be used to describe a component or a system, but it is most useful when describing the nature of a system of components working together. Availability is most often written as a decimal, as in 0.99999, as a percentage, as in 99.999%, or spoken as "five nines", which is a very often stated availability for communication networks.
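As a hedged illustration of the 95% over 1000 hours example, the sketch below assumes a constant failure rate, i.e. the exponential model R(t) = exp(-t / MTBF), which, as noted above, does not apply to all systems. The function names are assumptions introduced for this example.

```python
import math


def reliability_exponential(t_hours, mtbf_hours):
    """R(t) = exp(-t / MTBF); valid only for a constant failure rate."""
    return math.exp(-t_hours / mtbf_hours)


def mtbf_for_target(reliability, t_hours):
    """MTBF needed so that R(t_hours) equals the target reliability."""
    return -t_hours / math.log(reliability)


# MTBF required for a 95% probability of no failure over 1000 hours.
required_mtbf = mtbf_for_target(0.95, 1000.0)
print(required_mtbf)
print(reliability_exponential(1000.0, required_mtbf))
```

Under this assumption, the required MTBF comes out at roughly 19,500 hours, far longer than the 1000-hour mission itself.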
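To make the "five nines" figure concrete, the following sketch converts an availability value into expected downtime per year. It assumes a 365-day year; the function name is introduced here for illustration only.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def downtime_minutes_per_year(availability):
    """Expected downtime per year, in minutes, for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR * 60.0


for a in (0.99, 0.999, 0.99999):
    print(a, downtime_minutes_per_year(a))
```

An availability of 0.99999 corresponds to a little over five minutes of downtime per year.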

The Example of a Communication Network
A communication network that consists of eight nodes, Fig. 1, has been taken as an example. There are parallel connections between nodes 1-2 and 7-8 (redundant links), but most of the nodes are connected by one link. The function of availability A(t) [4] is defined as the probability of the system being available to perform the given function at the time t, which implies the possibility of a quick repair or, in other words, strong logistic support (available tools, equipment, spare parts and trained staff).
The availability that refers to repairable systems, for a network element, is given by:
$$A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}$$
where MTBF is the mean time between failures and MTTR is the mean time to repair of the element.
In the example, Fig. 1, the assumed network consists of electronic elements for which the exponential distribution is adequate. The mean time between failures of the elements has been used as input data because equipment manufacturers are obliged to provide these data on demand. The data can also be obtained in other ways, such as from standardized reliability handbooks, e.g. the military handbook MIL-HDBK-217F.
The mean times between failures (MTBF) of the nodes for the network in Fig. 1 are presented in Table 1, and those of the links are presented in Table 2.
The software package that applies the Monte Carlo method, and the basics of its operation, are presented in [2] and [3].
The failure rate of the electronic elements is assumed to be constant in time. This implies an exponential distribution, which is a special case of the Weibull distribution (when the form parameter equals 1). Mechanical and electromechanical elements have failure rates that increase during operation, so their times between failures follow the Weibull distribution with a form parameter greater than 1. Since the example in Fig. 1 represents a network made of electronic elements, the form parameter equals 1.
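The role of the form (shape) parameter can be illustrated with a short sketch that draws Weibull-distributed times to failure with a prescribed mean. The relation scale = mean / Gamma(1 + 1/shape) is the standard one for the Weibull distribution; the function names and the numerical values are assumptions introduced here, not taken from the authors' package.

```python
import math
import random


def weibull_scale_from_mean(mean, shape):
    """Scale parameter that gives a Weibull distribution the requested mean."""
    return mean / math.gamma(1.0 + 1.0 / shape)


def failure_rate(t, mean, shape):
    """Weibull hazard (failure rate) at time t > 0."""
    scale = weibull_scale_from_mean(mean, shape)
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def sample_time_to_failure(mean, shape, rng):
    """One random time to failure; random.weibullvariate takes (scale, shape)."""
    return rng.weibullvariate(weibull_scale_from_mean(mean, shape), shape)


rng = random.Random(42)
# Form parameter 1: constant failure rate (exponential case, electronic elements).
print(failure_rate(10.0, 1000.0, 1.0), failure_rate(500.0, 1000.0, 1.0))
# Form parameter 2: failure rate grows with time (wear-out, mechanical elements).
print(failure_rate(10.0, 1000.0, 2.0), failure_rate(500.0, 1000.0, 2.0))
print(sample_time_to_failure(1000.0, 2.0, rng))
```

With a form parameter of 1 the two printed failure rates are equal, while with a form parameter of 2 the later one is larger, which matches the distinction drawn above between electronic and mechanical elements.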
The rest of the input data are:
1. the mean time to repair (MTTR) of the nodes is 5 hours,
2. the mean time to repair (MTTR) of the links is 3 hours,
3. the observed operating time of the network for which the availability is sought is $T_0 = 100$ hours,
4. the number of iterations is $N = 10^3$.
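For orientation, the steady-state availability of a single element can be computed from these repair times, assuming the common relation A = MTBF / (MTBF + MTTR). The MTBF of 1000 hours used below is a hypothetical value chosen only for illustration (the actual values are those in Tables 1 and 2), and the parallel combination assumes that the redundant links fail and are repaired independently.

```python
def steady_state_availability(mtbf_hours, mttr_hours):
    """A = MTBF / (MTBF + MTTR) for one repairable element."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def parallel(*availabilities):
    """Availability of redundant elements, assuming independence:
    the group is down only when every element is down."""
    unavailable = 1.0
    for a in availabilities:
        unavailable *= 1.0 - a
    return 1.0 - unavailable


HYPOTHETICAL_MTBF = 1000.0  # hours; illustrative only, not from Tables 1 and 2

node = steady_state_availability(HYPOTHETICAL_MTBF, 5.0)  # node MTTR = 5 h
link = steady_state_availability(HYPOTHETICAL_MTBF, 3.0)  # link MTTR = 3 h
double_link = parallel(link, link)                        # redundant link pair
print(node, link, double_link)
```

The redundant pair is noticeably more available than a single link, which is the motivation for the parallel connections between nodes 1-2 and 7-8.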
For determining the two-terminal availability between the nodes 1-8, Fig. 1, the starting node 1 and the ending node 8 are defined in the software. It is possible to define, show graphically, and calculate all the possible combinations of the two-terminal availability between any two nodes of the communication network in the same way.

The Results of the Availability Calculation
After performing, in our case, $N = 10^3$ trials of the Monte Carlo simulation method with the software package for the network in Fig. 1, the estimated point and interval availability of the communication network were obtained; they are shown in Fig. 2.
For the assumed input data, in this example, the two-terminal availability between the nodes 1-8 of the communication network in Fig. 1 is 0.9708. Considering the adopted assumptions, the value of availability obtained in the process of simulation represents the probability that the configuration of the hardware representing the paths between nodes 1 and 8 will be available at any given time. Since availability is valid only for specific working and environmental conditions, it is implied here that the valid conditions are the ones for which the input data are given.
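The general idea of such a simulation can be sketched in Python as follows. This is not the authors' Visual Basic implementation and does not reproduce the result above: the 4-node topology and all MTBF values are hypothetical, repair times are assumed to be exponentially distributed, and only the point availability at time T0 is estimated.

```python
import math
import random


def weibull_scale(mean, shape):
    """Scale parameter that gives a Weibull distribution the requested mean."""
    return mean / math.gamma(1.0 + 1.0 / shape)


def is_up_at(t0, mtbf, mttr, shape, rng):
    """Walk one component through failure/repair cycles; True if up at t0."""
    scale = weibull_scale(mtbf, shape)
    t = 0.0
    while True:
        t += rng.weibullvariate(scale, shape)  # time to next failure
        if t > t0:
            return True
        t += rng.expovariate(1.0 / mttr)       # repair time (assumed exponential)
        if t > t0:
            return False


def connected(src, dst, up_nodes, up_links):
    """Depth-first search over the components that are currently up."""
    if src not in up_nodes or dst not in up_nodes:
        return False
    seen, stack = {src}, [src]
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        for a, b in up_links:
            if a == u:
                v = b
            elif b == u:
                v = a
            else:
                continue
            if v in up_nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return False


def estimate_availability(nodes, links, src, dst, t0, n_iter,
                          node_mttr=5.0, link_mttr=3.0, shape=1.0, seed=1):
    """Fraction of trials in which src and dst can communicate at time t0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        up_nodes = {n for n, mtbf in nodes.items()
                    if is_up_at(t0, mtbf, node_mttr, shape, rng)}
        up_links = [(a, b) for a, b, mtbf in links
                    if is_up_at(t0, mtbf, link_mttr, shape, rng)]
        hits += connected(src, dst, up_nodes, up_links)
    return hits / n_iter


# Hypothetical 4-node network with a redundant (double) link between 1 and 2.
NODES = {1: 2000.0, 2: 2000.0, 3: 2000.0, 4: 2000.0}   # node: MTBF in hours
LINKS = [(1, 2, 1000.0), (1, 2, 1000.0), (2, 3, 1000.0),
         (2, 4, 1000.0), (3, 4, 1000.0)]                # (a, b, MTBF in hours)

if __name__ == "__main__":
    print(estimate_availability(NODES, LINKS, 1, 4, t0=100.0, n_iter=1000))
```

Each trial samples the state of every node and link at T0 and checks whether at least one path of working components joins the two terminal nodes; the estimate is the share of trials in which such a path exists. Increasing `n_iter` narrows the statistical spread of the estimate at the cost of run time.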

Conclusion
The developed software package provides a simple way to evaluate (estimate) the reliability and the two-terminal availability between any two nodes of a complex communication network, for which it is difficult, almost impossible, to determine an analytical expression for reliability and availability. Since the simulation model in the software package is based on the Weibull distribution, this simulation method is widely applicable: the software can be used to calculate (estimate) the availability of a complex communication network made of electronic components (as communication nodes) and electromechanical or mechanical components (as communication links). With this software, it is possible to define, show graphically, and estimate reliability, to allocate reliability, and to estimate all the possible combinations of the two-terminal availability between any two nodes of a communication network (up to 50 nodes and 150 links).