CUTTING TESTING COSTS BY THE POOLING DESIGN

Introduction/purpose: Group testing algorithms aim to make the use of resources more rational, so they are also expected to improve the efficiency of large-scale COVID-19 screening. Methods: Two variants of non-adaptive group testing approaches are presented: Hwang's generalized binary-splitting algorithm and the matrix strategy. Results: The advantages and disadvantages of both approaches are discussed, and estimates of the maximum number of tests are given. The matrix strategy is presented with a particular modification that reduces the corresponding estimate of the maximum number of tests without affecting the complexity of the procedure. This modification may be of interest from the viewpoint of applicability. Conclusion: Given the current situation, it makes sense to consider these methods in order to cut testing resources, thus making epidemiological measures more efficient than they are now.


Introduction
The COVID-19 pandemic, caused by the SARS-CoV-2 virus, has challenged many countries in detecting infected people, which is the basis for implementing appropriate epidemiological measures. The high basic reproduction number of the SARS-CoV-2 virus, $R_0 = 5.7$ (Sanche et al, 2020), contributed to an increased demand for tests (Pfeiffer et al, 2020), which, according to some media reports in Serbia (Blic, 2020), was also accompanied by distribution problems. In other words, testing can quickly become a bottleneck in the event of a massive disease outbreak.
Modern epidemiological measures are proactive. In order to control the spread of infection, it is important to detect:
• pre-symptomatic cases;
• infected persons who have many high-risk social contacts (e.g., medical staff, geriatric center workers, teachers, etc.).
Mass testing makes it possible to observe and track epidemiological patterns as the infection spreads. The situation described above, in which testing becomes a bottleneck, clearly calls for some optimization.
Among the tests for SARS-CoV-2 infection, the best-known methods are RT-(q)PCR and serological antibody tests. The RT-(q)PCR test is considered the most accurate (the most reliable). On the other hand, these tests are more expensive and more demanding in terms of equipment, time, trained personnel, and procedure.

Figure 1 - Total number of tested (green) and confirmed (red) cases per day, for the Republic of Srpska, until July 11th, 2020 (Faculty of Electrical Engineering in Banja Luka, 2020)
According to the reports of the official portal of the Government of Republic of Srpska (Government of Republic of Srpska, 2020), the daily percentage of confirmed cases (relative to the number of tests) in the Republic of Srpska has shown an upward trend. Figure 1 shows a time-series bar plot of the number of confirmed cases (red) and the number of conducted tests (green), until July 11th, 2020. Given the cost of the tests and the increased workload of laboratory technicians, it is reasonable to consider strategies that save both testing kits and time.
Figure 2 - Total number of tested (purple) and confirmed (orange) cases per day, for the Republic of Serbia, until July 25th, 2020 (Government of Republic of Serbia, 2020)
In Serbia, the percentage of confirmed cases is not as high as in the Republic of Srpska, but the growth in the numbers of tested, infected, and deceased persons is quite noticeable in the data provided by the Government of Republic of Serbia (Government of Republic of Serbia, 2020). Figure 2 shows the number of confirmed cases (orange) and the number of conducted tests (purple), until July 25th, 2020.
As previously mentioned, the rest of the world has not been spared from the testing kit shortage. Therefore, new protocols have been developed worldwide to reduce the scope of testing without affecting (at least not significantly) the quality and accuracy of the procedures (Eberhardt et al, 2020), (Mallapaty, 2020).
As a combinatorial procedure, group testing breaks down an identification task (identifying defective equipment, downtime in communication channels, infected persons, etc.) into tests on groups instead of individual checks. Dorfman (Dorfman, 1943) was the first to describe this problem. Since then, various test schemes have been designed, including schemes in which the tests of each phase are chosen on the basis of the results of the previous phases (the so-called adaptive procedures). The idea itself has applications in almost all areas of science: biology, computing, medicine, electrical engineering, data protection, etc. Interest in group testing was renewed after the launch of the famous Human Genome Project (Colbourn & Dinitz, 2007). Some prominent examples of group testing in medicine are the screening of donated blood from American Red Cross donors (Dodd et al, 2002), testing for chlamydia and gonorrhea (Gaydos, 2005), and monitoring of the mosquitoes that transmit the West Nile virus (White et al, 2001). Moreover, for SARS-CoV-2 infection testing, Gollier & Gossner (2020) have shown that even a very simplified approach, based on the bisection method, provides significant economic benefits. This paper presents two simpler variants of non-adaptive group testing based on swab-sample aggregation, also known as the pooling design. First, Hwang's generalized binary-splitting algorithm is presented. Afterward, the so-called matrix strategy is described. Finally, concluding remarks are given, with a review of the current situation.

Hwang's generalized binary-splitting algorithm
Here we present Hwang's generalized binary-splitting algorithm, whose pseudo-code is given in Algorithm 1. This approach can be used to cut resource costs in testing for SARS-CoV-2 infection (Ding-Zhu & Hwang, 1993). The inputs are a statistical prediction n of the number of people to be tested and a statistical prediction z of the number of confirmed cases. The expected result is a smaller number of tests needed to identify all infected individuals.
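The steps of generalized binary splitting, as commonly stated in the group-testing literature, can be sketched as a small Python simulation. This is a minimal sketch, not the authors' implementation. It assumes that z equals the true number of infected persons (or bounds it from above), ignores test errors, and answers each pooled test by looking up a hidden set `infected`. The names `hwang_gbs`, `infected`, and `samples` are hypothetical. The rules are: if at most 2z - 2 persons remain undetermined, test them individually; otherwise test a group of 2^s persons, with s the largest integer such that 2^s z ≤ m - z + 1 (m undetermined persons), and on a positive result locate one infected person by bisection.

```python
from collections import Counter


def hwang_gbs(n, infected, z):
    """Simulate generalized binary splitting on persons 0..n-1.

    infected: hidden set of infected persons, used only to answer the
        simulated pooled tests (hypothetical stand-in for the laboratory).
    z: predicted number of infected persons, assumed to equal the true
        number or to bound it from above.
    Returns (found, tests, samples); samples counts in how many tests
    each person's swab-sample took part.
    """
    unknown = list(range(n))      # persons whose status is still undetermined
    d = z                         # infected persons still expected among them
    found, samples = set(), Counter()
    tests = 0

    def pooled_test(group):
        nonlocal tests
        tests += 1
        samples.update(group)
        return any(p in infected for p in group)

    while unknown and d > 0:      # once d == 0, the rest is declared negative
        m = len(unknown)
        if m <= 2 * d - 2:        # too many expected positives: test individually
            found.update(p for p in unknown if pooled_test([p]))
            break
        s = 0
        while 2 ** (s + 1) * d <= m - d + 1:   # s = floor(log2((m - d + 1) / d))
            s += 1
        group = unknown[:2 ** s]
        resolved = set()
        if not pooled_test(group):
            resolved.update(group)             # the whole group is negative
        else:
            while len(group) > 1:              # bisection keeps a positive half
                half = len(group) // 2
                if pooled_test(group[:half]):
                    group = group[:half]       # the other half stays undetermined
                else:
                    resolved.update(group[:half])
                    group = group[half:]       # must contain an infected person
            found.add(group[0])
            resolved.add(group[0])
            d -= 1
        unknown = [p for p in unknown if p not in resolved]
    return found, tests, samples


found, tests, samples = hwang_gbs(100, {3, 41, 77}, z=3)
print(found, tests, max(samples.values()))
```

The value `max(samples.values())` is the largest number of samples a single person contributed in that run, which is the quantity behind the swab-count concern discussed below.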
According to (Ding-Zhu & Hwang, 1993), the maximum number of tests $M_t$ when Algorithm 1 is used is

$$
M_t = \begin{cases} n, & n \le 2z - 2, \\ (s + 2)z + p - 1, & n \ge 2z - 1, \end{cases} \tag{1}
$$

where $s = \lfloor \log_2 (m / z) \rfloor$ for $m = n - z + 1$, and $p < z$ is the unique nonnegative integer such that $m = 2^s z + 2^s p + \theta$, with $0 \le \theta < 2^s$.

Algorithm 1: Hwang's generalized binary-splitting algorithm
Data: n, z
Result: resource savings

Example 1. Suppose that a statistical forecast for the next day says that the laboratory will need to take n = 100 swab-samples. Consider two cases:
• (z = 20) according to Eq. (1), at most 79 tests will be needed, i.e., the laboratory will save at least 21 testing kits;
• (z = 2) according to Eq. (1), at most 14 tests will be needed, i.e., the laboratory will save at least 86 testing kits, much more than in the previous case.
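The bound used in Example 1 can be checked with a few lines of Python. The sketch below assumes the form of the Du-Hwang bound for generalized binary splitting: $M_t = n$ if $n \le 2z - 2$, and otherwise $M_t = (s+2)z + p - 1$, where $m = n - z + 1 = 2^s z + 2^s p + \theta$ with $0 \le \theta < 2^s$. The function name `hwang_max_tests` is hypothetical; s is computed with integer arithmetic to avoid floating-point rounding in the logarithm.

```python
def hwang_max_tests(n, z):
    """Upper bound M_t on the tests for n persons and z infected (z >= 1)."""
    if z < 1:
        raise ValueError("z must be at least 1")
    if n <= 2 * z - 2:
        return n                       # individual testing, no savings
    m = n - z + 1
    s = 0
    while 2 ** (s + 1) * z <= m:       # s = floor(log2(m / z)), integers only
        s += 1
    p = (m - 2 ** s * z) // 2 ** s     # m = 2^s z + 2^s p + theta, 0 <= theta < 2^s
    return (s + 2) * z + p - 1


for z in (20, 2):
    t = hwang_max_tests(100, z)
    print(f"z = {z}: at most {t} tests, at least {100 - t} kits saved")
# z = 20: at most 79 tests, at least 21 kits saved
# z = 2: at most 14 tests, at least 86 kits saved
```

For z = 20 this gives m = 81, s = 2, p = 0, and for z = 2 it gives m = 99, s = 5, p = 1, reproducing the two cases of Example 1.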
The downside of Hwang's generalized binary-splitting algorithm is that it may require more swabs from one individual. In the case z = 20 of Example 1, Algorithm 1 starts by combining the swab-samples of four people. Suppose, for simplicity, that only one person in the group is infected. The infected person is then identified by the bisection method, i.e., the tests proceed in the following order:
1. test the group of 4 samples;
2. test two groups of 2 samples;
3. in the positive group, test each sample separately.
As can be seen, three swabs will be needed from one person. The situation becomes even more complicated when the number of tested persons is very high and z is much lower than n, because the initial groups are then larger and the bisection takes more steps. Laboratories often cannot send technicians to take swabs several times, and it is not convenient for potentially infected people to come to the laboratory repeatedly, so Hwang's generalized binary-splitting algorithm can create certain operational problems. Moreover, taking three swabs from each person already pushes the bounds of practicality. For this reason, a matrix strategy has been devised, in which two swabs are taken from each individual.
Remark 1. In case the estimated percentage of confirmed cases exceeds 50%, it is not possible to make any resource cuts. Each swab-sample must be tested separately.
Remark 2. The presented Algorithm 1 does not take into account the accuracy of tests, i.e., the likelihood of obtaining false positive or false negative results is not considered.
Remark 3. The presented Algorithm 1 does not take into account whether there is a change in the accuracy of tests if a large number of samples are grouped.

Matrix strategy
Another way to reduce the total number of tests, presented here, is the so-called matrix (tabular) strategy. For simplicity, we assume that the number of people n is the square of some natural number k, i.e., $n = k^2$. Phatarfod & Sudbury (1994) were the first to propose this idea for high-throughput screening.
Basic matrix strategy (Algorithm 2). We take two swab-samples from each person and form a square $k \times k$ matrix M of samples. Each entry $m_{i,j}$ of the matrix M corresponds to one person, more precisely to the pair of swab-samples of that person. Rows and columns form groups of combined samples. This gives 2k groups for the first phase of testing, which is $k^2 / (2k) = k/2$ times fewer than the $k^2$ tests needed for individual checks. After the first phase (lines 1-2), we move on to the second phase (line 3).

Algorithm 2: Basic Matrix Strategy
Data: matrix of pairs of swab-samples
Result: identification of confirmed cases
1 test column-groups and label the positive ones
2 test row-groups and label the positive ones
3 apply Algorithm 3 on a set of marked rows and columns

If at least one result is positive, the following two situations are possible in the second phase: (1) if exactly one column-group and exactly one row-group are positive, the infected individual is uniquely determined by their intersection; (2) otherwise, the swab-samples of the individuals at the intersections of the positive columns and rows are retested. The second case is the reason why it is necessary to take two swab-samples from each person.
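To make the two phases concrete, here is a minimal Python simulation of the basic matrix strategy (Algorithm 2 followed by the intersection retests described above). It is a sketch, not the authors' code: the function name `basic_matrix_strategy` and the hidden set `infected`, which stands in for laboratory results, are hypothetical, indices are 0-based, and test errors are ignored.

```python
def basic_matrix_strategy(k, infected):
    """Simulate the basic matrix strategy on a k x k matrix of sample pairs.

    infected: set of 0-based (row, column) cells holding an infected person.
    Returns (identified cells, number of tests used).
    """
    # Phase 1: one pooled test per column-group and per row-group (2k tests).
    pos_rows = {i for i in range(k) if any((i, j) in infected for j in range(k))}
    pos_cols = {j for j in range(k) if any((i, j) in infected for i in range(k))}
    tests = 2 * k

    # Phase 2, case (1): a single positive row and column pinpoint one person.
    if len(pos_rows) == 1 and len(pos_cols) == 1:
        return {(next(iter(pos_rows)), next(iter(pos_cols)))}, tests

    # Phase 2, case (2): retest every sample at a positive row/column intersection.
    identified = set()
    for i in sorted(pos_rows):
        for j in sorted(pos_cols):
            tests += 1
            if (i, j) in infected:
                identified.add((i, j))
    return identified, tests


k = 10
print(basic_matrix_strategy(k, {(3, 7)}))                # one infected person
print(basic_matrix_strategy(k, {(0, 0), (1, 1)})[1])     # two on the diagonal
print(basic_matrix_strategy(k, {(i, i) for i in range(k)})[1])  # full diagonal
```

With one infected person the 2k pooled tests suffice, while a fully diagonal arrangement forces all $k^2$ retests, i.e., $k^2 + 2k$ tests in total.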
The obvious problem with this approach, unlike Hwang's generalized binary-splitting algorithm, is the possibility of a diagonal arrangement of the samples of infected individuals (the technician is not aware of it because he/she does not know who is infected). Suppose that everyone at the positions $m_{i,i}$, for $i \in \{1, \ldots, k\}$, is infected. The total number of infected is then k. Every column-group and row-group will test positive on the pooled samples. According to the basic matrix strategy described above, we then have to test all swab-samples at the intersections separately, which in this case means all matrix entries. In other words, all swab-samples have to be tested again, which in no way reduces the costs. Therefore, the maximum number of tests of the basic matrix strategy for $n = k^2$ persons is $k^2 + 2k$. The basic matrix strategy can be improved by taking into account the experience of epidemiologists. For example, based on epidemiological analyses and field experience, swab-samples are classified as likely positive and likely negative. The use of additional information of this type was first studied by McMahan et al (2012), who considered two approaches:
• spiral design;
• gradient design.
This paper presents only the first approach, the spiral design, in a simplified variant.
The probably positive sample pairs are clustered around one of the matrix corners, say the upper left one ($m_{1,1}$ in the above notation). The first pair of probably positive swab-samples is put at $m_{1,1}$. The next three are put at the positions $m_{2,1}$, $m_{2,2}$, $m_{1,2}$. Then, the next five are placed at the positions $m_{3,1}$, $m_{3,2}$, $m_{3,3}$, $m_{2,3}$, $m_{1,3}$. The procedure is repeated until the set of probably positive samples is exhausted.
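The placement order described above visits, for r = 1, 2, ..., the cells of row r from left to right and then column r upwards, so each "shell" adds 2r - 1 cells and the k shells cover all $k^2$ cells. A minimal Python sketch follows; the function names are hypothetical, and continuing along the same spiral with the probably negative pairs is an assumption (the text only requires that a full matrix is created).

```python
def spiral_positions(k):
    """1-based cells of a k x k matrix in the simplified spiral order:
    m11; m21, m22, m12; m31, m32, m33, m23, m13; ..."""
    order = []
    for r in range(1, k + 1):
        order += [(r, c) for c in range(1, r + 1)]      # row r, left to right
        order += [(i, r) for i in range(r - 1, 0, -1)]  # column r, upwards
    return order


def spiral_matrix(probably_positive, probably_negative, k):
    """Place probably positive pairs first, then the rest, along the spiral."""
    people = list(probably_positive) + list(probably_negative)
    if len(people) != k * k:
        raise ValueError("expected exactly k * k swab-sample pairs")
    M = [[None] * k for _ in range(k)]
    for (r, c), person in zip(spiral_positions(k), people):
        M[r - 1][c - 1] = person
    return M


for row in spiral_matrix("abcd", "efghi", 3):
    print(" ".join(row))
# a d i
# b c h
# e f g
```

Here the four probably positive pairs a, b, c, d end up in the upper left 2 x 2 corner, as the spiral design intends.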
As can be observed, this procedure makes the matrix more heterogeneous. Then, moving from right to left, the first column-groups of the matrix will most likely test negative, which allows us to exclude their samples from further consideration. We stop at the first column-group that tests positive. Then we continue with the rows, from bottom to top, using the same idea. In this way, a significant number of sample pairs can roughly be eliminated. The described idea is represented by Algorithm 4.

Algorithm 4: Elimination of Probably Negative Cases
Data: the set of $n = k^2$ swab-sample pairs
Result: elimination of probably negative swab-samples
1 P ← probably positive swab-sample pairs
2 N ← probably negative swab-sample pairs
3 cluster the swab-sample pairs from P around $m_{1,1}$ in spiral fashion
4 arrange the remaining swab-sample pairs (i.e., those from N) so that a matrix is created

Remark 4. In a gradient design, i.e., when the probably positive swab-sample pairs are clustered column by column, the elimination is performed only from right to left.
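The elimination step described above (column-groups from right to left, then row-groups from bottom to top, each time stopping at the first positive group) can be sketched in Python as follows. This is a hypothetical illustration, not the authors' Algorithm 4: the function name `eliminate_negative_groups` and the callable `is_infected`, which stands in for laboratory results, are assumptions, and each row test pools the whole row (the written-off columns are negative anyway, so the outcome is the same). The returned i and j are 1-based, so rows 1..i and columns 1..j remain.

```python
def eliminate_negative_groups(M, is_infected):
    """M: k x k list of lists of persons (e.g. in spiral arrangement).
    Returns (i, j, tests): rows 1..i and columns 1..j remain (1-based)."""
    k = len(M)
    tests = 0
    j = k
    while j > 0:                          # column-groups, right to left
        tests += 1
        if any(is_infected(M[r][j - 1]) for r in range(k)):
            break                         # first positive column-group: stop
        j -= 1                            # negative: write off column j
    if j == 0:                            # every column negative: nobody infected
        return 0, 0, tests
    i = k
    while i > 0:                          # row-groups, bottom to top
        tests += 1
        if any(is_infected(p) for p in M[i - 1]):
            break                         # first positive row-group: stop
        i -= 1
    return i, j, tests


M = [[4 * r + c for c in range(4)] for r in range(4)]
print(eliminate_negative_groups(M, lambda p: p == 5))
```

With a single infected person at $m_{2,2}$ of a 4 x 4 matrix, two columns and two rows are written off, leaving the 2 x 2 upper left corner after six pooled tests.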
The diagonal elimination algorithm is applied to the rest of the matrix obtained after the application of Algorithm 4. Before describing it, we note that Algorithm 4 does not necessarily leave a square matrix, i.e., the final values of i and j do not have to be equal. Again, for simplicity, we consider the $k' \times k'$ sub-matrix M' positioned in the upper left corner, where $k' = \min\{i, j\}$ and i and j are the values returned by Algorithm 4.

Algorithm 5: Diagonal Elimination
Data: set of remaining swab-sample pairs; i, j obtained by Algorithm 4
Result: identification of positive cases
1 k' ← min{i, j}
2 create a sub-matrix of samples M' with dimension k' × k'
3 foreach i ∈ {1, ..., k'} do
8 end
9 end
10 for each positive test result in the previous testing, label the corresponding columns and rows as positive
11 apply Algorithm 3 to the labeled rows and columns

In the first phase of diagonal elimination, we test the samples from one of the diagonals, say the main one. For each negative result at a position $m_{i,i}$, we test one swab-sample from the corresponding positions on the counterdiagonal, $m_{k'-i+1,i}$ and $m_{i,k'-i+1}$. For each positive result, the corresponding column and row are labeled as positive. Then the swab-samples at the intersections of all positively labeled rows and columns are tested (Algorithm 3). An overview of this procedure is given in Algorithm 5.
Here, as in the previous algorithm, the variable $t_{i,j}$ ($\forall i, j \in \{1, \ldots, k'\}$) has a binary domain: it takes the value 0 if the test shows no infection (negative result) and 1 if the test shows SARS-CoV-2 infection (positive result).
Remark 5. Although this procedure is designed to detect swab-samples of infected persons placed on a diagonal, we can observe that, if the statistically predicted percentage of infected persons is low enough, a random distribution of the samples of probably infected persons would rarely lead to such an undesirable situation. However, even a partial diagonal arrangement is unfavorable.
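The observation in Remark 5 can be explored with a small Monte Carlo experiment. The sketch below assumes that infected persons are placed uniformly at random in a k x k matrix and counts the tests of the basic matrix strategy (2k pooled tests plus the retests at the positive intersections), so it illustrates the unimproved strategy only. The function names `matrix_tests` and `random_placement_tests` are hypothetical, and no particular numerical result is claimed here.

```python
import random


def matrix_tests(k, cells):
    """Tests used by the basic matrix strategy when `cells` (0-based
    (row, col) pairs) hold the infected persons: 2k pooled tests + retests."""
    rows = {r for r, _ in cells}
    cols = {c for _, c in cells}
    single = len(rows) <= 1 and len(cols) <= 1   # zero or one infected person
    return 2 * k + (0 if single else len(rows) * len(cols))


def random_placement_tests(k, infected_count, trials=10_000, seed=1):
    """Average and worst observed number of tests over random placements."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(k) for c in range(k)]
    results = [matrix_tests(k, rng.sample(cells, infected_count))
               for _ in range(trials)]
    return sum(results) / trials, max(results)


k = 10
print("diagonal worst case:", matrix_tests(k, [(i, i) for i in range(k)]))
for count in (1, 2, 5, 10):
    avg, worst = random_placement_tests(k, count)
    print(f"{count} infected: average {avg:.1f}, worst observed {worst}")
```

Comparing the averages with the diagonal value $k^2 + 2k$ shows how the expected cost depends on the number of infected persons under random placement.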
From all remaining untested swab-sample pairs (including those that did not enter M'), a new matrix M' is formed, and the diagonal elimination process can be repeated. How many times this is done depends on the protocol of a particular laboratory and on the estimated cost cut, given the daily scope of testing. This also means that the elimination of diagonals can be omitted.
Remark 6. After rows and columns are eliminated in the first phase, the rest of the matrix is not necessarily square, so we choose the largest square sub-matrix.
Combining all of the above, we obtain Algorithm 6, the improved matrix strategy. However, the proposed improvement of the basic matrix strategy does not bring the maximum number of tests below $k^2$, although such a large number is not expected in practice, especially if the percentage of infected relative to the number of tested persons is low. In other words, Hwang's generalized binary-splitting algorithm is theoretically a better approach than the matrix strategy. In practice, however, the matrix approach is used because it requires fewer swab-samples from each person.
Remark 7. The matrix strategy can be generalized to a multiphase tensor strategy, but this could lead to the rather impractical taking of many swab-samples from each person.
Remark 8. The initial matrix does not have to be square; the square case is considered in the algorithm for simplicity. Otherwise, the natural choice of the matrix format is the one for which the sum of the numbers of rows and columns is the smallest possible.
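The format criterion of Remark 8 is easy to evaluate by brute force. The Python sketch below assumes that the matrix may have unused (empty) cells, so it looks for r ≤ c with r·c ≥ n that minimizes r + c, the number of first-phase pooled tests; breaking ties by the fewest empty cells is an additional assumption. The function name `best_format` is hypothetical.

```python
def best_format(n):
    """Return (rows, columns) with rows <= columns and rows * columns >= n
    minimizing rows + columns; ties go to the fewest empty cells."""
    if n < 1:
        raise ValueError("n must be positive")
    candidates = []
    for r in range(1, n + 1):
        c = -(-n // r)                    # ceil(n / r): enough columns for n persons
        if r <= c:
            candidates.append((r + c, r * c - n, r, c))
    _, _, r, c = min(candidates)          # smallest sum first, then fewest empty cells
    return r, c


for n in (100, 96, 7):
    print(n, best_format(n))
```

For n = 100 this returns the square 10 x 10 format, while for n = 96 an 8 x 12 matrix needs the same 20 first-phase tests without empty cells.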
Remark 9. Additional improvements may be considered, but are likely to affect applicability in practice.
Remark 10. The presented Algorithm 6 does not take into account the accuracy of the tests, i.e., the likelihood of obtaining false-positive or false-negative results.
Remark 11. The presented Algorithm 6 does not take into account whether there is a change in the accuracy of tests if a large number of samples are combined.

Concluding remarks
The algorithms above provide a way to cut the resource costs if the number of confirmed cases is low enough compared to the number of conducted tests. The lower the percentage of infected, the more resources can be saved. However, this does not mean that timely action will not be useful if the percentage is high. On the contrary, even in that situation, adequate actions to save the resources are of considerable benefit.
Theoretically, the most significant savings of resources are expected when using Hwang's generalized binary-splitting algorithm, but it is not very practical because it may require taking too many swabs. Its use depends on the assessment of the situation. A combination of both approaches could give satisfactory results, i.e., lead to considerable savings. In that case, we would have an adaptive approach.
Considering the current situation and the noticeable increase in the number of people infected with the SARS-CoV-2 virus, it makes sense to consider the presented type of resource rationalization to increase the effectiveness of epidemiological measures. It is also essential to note that these approaches are not exclusively related to RT-(q)PCR testing. They can be used equally successfully in mass serological testing.