THE DISCRIMINANT ANALYSIS APPLIED TO THE DIFFERENTIATION OF SOIL TYPES



EP 2017 (64) 4 (1513-1521)
Radovan Damnjanović, Snežana Krstić, Milena Knežević, Svetislav Stanković, Dejan Jeremić

Introduction
According to Kardaun et al. (1993), the theory of discriminant analysis is a well-developed branch of statistics and, at the same time, still a field of active research. Some of the algorithms are implemented in specialized or general statistical packages. One can approach discriminant analysis from a purely data-descriptive point of view or from a probabilistic point of view (both approaches, but most easily the latter, can be incorporated into a decision-theoretical framework). In the latter approach, a probabilistic model is used to describe the situation. The applicability of such a model in non-random situations may be questioned from a fundamental point of view (Breiman et al., 1984). However, such a probabilistic framework is almost indispensable if one wants to estimate the performance of procedures in future situations and to express uncertainties in various estimates.
Moreover, it often leads to procedures that are also sensible from a data-descriptive point of view (Chercassky, Mučier, 2007). Conversely, a specific procedure can often be viewed both as a data-descriptive one, with little further interpretation, and as a probabilistic one, with considerably more interpretation, the validity of which depends on the adequacy of the framework.
Sometimes a procedure developed in one probabilistic framework can also be interpreted in another probabilistic framework, which may be more relevant for the data at hand (Farlov, 1984; Forsyth, 1989; Gilad-Bachrach, 2006; Gilad-Bachrach, 2004). Thanh et al. (2017) note that there has been a great effort to transfer linear discriminant techniques that operate on vector data to high-order data, generally referred to as Multilinear Discriminant Analysis (MDA) techniques. Many existing works focus on maximizing the ratio of inter-class to intra-class variance defined on tensor data representations. However, there had not been any attempt to employ class-specific discrimination criteria for tensor data. They therefore propose a multilinear subspace learning technique suitable for applications requiring class-specific tensor models. The method maximizes the discrimination of each individual class in the feature space while retaining the spatial structure of the input.
Early on, Beauchamp et al. (1980) applied discriminant analysis to uranium exploration. Discriminant analysis methods can be used on hydrogeochemical data collected in the NURE Program to aid in formulating geochemical models that identify the anomalous areas used in resource estimation. The methods were applied to data from the Plainview, Texas Quadrangle, which has approximately 850 groundwater samples with more than 40 quantitative measurements per sample. Discriminant analysis topics involving estimation of misclassification probabilities, variable selection, and robust discrimination are applied (Hart, 1989; Haussler, 1989; Han & Camber, 2000; Kantardzic, 2011). A method using generalized distance measures is given that enables the assignment of samples to a background population or to a mineralized population whose parameters were estimated from separate studies (Milojević et al., 2013; Vukoje, 2013; Stanojević et al., 2017). Zhijin et al. (1994) used discriminant analysis from multivariate statistical theory to handle e/π/μ separation in BES. They describe the principle of the method, derive the unstandardized discriminant functions (responsible for particle separation), give the discriminant efficiency for e/π/μ, and compare the results with those obtained in a conventional way.

Data and Variables
Our data comprise 286 soil samples, of which 100 contained the organism Azotobacter and 186 did not. The following soil characteristics were studied:
X1 = pH
X2 = amount of readily available phosphate
X3 = total nitrogen content
The data were collected at the Iowa Agricultural Experiment Station, Cox and Martin (1). In our case, the values of X1, X2, and X3 were taken for 52 soil samples. Group A had 25 samples that contained Azotobacter, while Group B had 27 samples that did not contain Azotobacter.

Methods
In our case, we use discriminant analysis to evaluate differences between soil types. In other words, given the three characteristics X1, X2, and X3, discriminant analysis can indicate whether a soil sample contains or does not contain the organism Azotobacter. Azotobacter is known to have a positive effect on agricultural products, but that effect is outside the scope of our consideration. For the purposes of our research, we use stepwise discriminant analysis to determine which variable is decisive for the classification procedure (Kohavi, 1995; Quinlan, & Cameron-Jonas, 1995; Koteri & Lester, 2012), that is, for deciding whether the soil type contains or does not contain the bacterium Azotobacter. The first step in our analysis is the application of linear discriminant analysis.
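For orientation, the two-group linear rule can be written in its textbook form. This is a sketch under assumptions the text does not state: equal prior probabilities, equal misclassification costs, and a common covariance matrix for both groups. The symbols (group means \bar{x}_A and \bar{x}_B, group covariance matrices S_A and S_B, group sizes n_A and n_B) are introduced here for illustration only.

```latex
D(x) = (\bar{x}_A - \bar{x}_B)^{\top} S^{-1} x
       - \tfrac{1}{2}\,(\bar{x}_A - \bar{x}_B)^{\top} S^{-1} (\bar{x}_A + \bar{x}_B),
\qquad
S = \frac{(n_A - 1)\,S_A + (n_B - 1)\,S_B}{n_A + n_B - 2}
```

Under these assumptions a sample x = (X1, X2, X3) is assigned to Group A when D(x) > 0 and to Group B otherwise.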

Linear Discriminant Analysis (LDA) (Supervised Learning)
The first step in the classification process is the application of LDA as a data mining method for discovering hidden knowledge (Written & Frank, 2005), which presupposes learning on the sample. It produced the following results: We retain only the confusion matrix, which indicates a resubstitution error on the order of 12%. A detailed analysis of the results shows that some variables are not important for determining the presence of Azotobacter. The classification function would be as follows: The question arises as to whether the variables X2 and X3 should be rejected from the analysis as insignificant for our classification process.
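The step described above (fit a linear discriminant function, tabulate the confusion matrix, read off the resubstitution error) can be sketched as follows. This is a minimal illustration, not the software used in the study: the data are synthetic stand-ins generated at random, equal priors are assumed, and all function names are hypothetical.

```python
import random

def mean_vector(rows):
    """Column means of a list of equal-length rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def pooled_covariance(group_a, group_b):
    """Pooled within-group covariance matrix of the two groups."""
    p = len(group_a[0])
    s = [[0.0] * p for _ in range(p)]
    for rows in (group_a, group_b):
        m = mean_vector(rows)
        for r in rows:
            d = [r[j] - m[j] for j in range(p)]
            for i in range(p):
                for j in range(p):
                    s[i][j] += d[i] * d[j]
    dof = len(group_a) + len(group_b) - 2
    return [[v / dof for v in row] for row in s]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_lda(group_a, group_b):
    """Return (weights, constant); a positive score assigns a sample to A."""
    mean_a, mean_b = mean_vector(group_a), mean_vector(group_b)
    diff = [x - y for x, y in zip(mean_a, mean_b)]
    weights = solve(pooled_covariance(group_a, group_b), diff)
    midpoint = [(x + y) / 2 for x, y in zip(mean_a, mean_b)]
    constant = -sum(w * v for w, v in zip(weights, midpoint))
    return weights, constant

def predict(model, row):
    weights, constant = model
    score = sum(w * v for w, v in zip(weights, row)) + constant
    return "A" if score > 0 else "B"

def confusion_matrix(model, group_a, group_b):
    """Counts keyed by (true group, predicted group)."""
    counts = {("A", "A"): 0, ("A", "B"): 0, ("B", "A"): 0, ("B", "B"): 0}
    for true_label, rows in (("A", group_a), ("B", group_b)):
        for row in rows:
            counts[(true_label, predict(model, row))] += 1
    return counts

# Synthetic stand-in data (NOT the study's measurements): pH, phosphate, nitrogen.
rng = random.Random(0)
group_a = [[rng.gauss(7.5, 0.5), rng.gauss(50, 10), rng.gauss(20, 4)] for _ in range(25)]
group_b = [[rng.gauss(6.0, 0.5), rng.gauss(50, 10), rng.gauss(20, 4)] for _ in range(27)]

model = fit_lda(group_a, group_b)
cm = confusion_matrix(model, group_a, group_b)
resubstitution_error = (cm[("A", "B")] + cm[("B", "A")]) / (len(group_a) + len(group_b))
print(cm, resubstitution_error)
```

Because the same 52 samples are used for fitting and for evaluation, the error computed this way tends to be optimistic.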
Because the resubstitution error is optimistic, we apply a resampling method, the bootstrap, which gives a better assessment of the classification potential. We see that the actual error is higher than the initial resubstitution error.
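The idea of a bootstrap error estimate can be sketched as follows. The text does not say which bootstrap variant was used, so this example assumes a plain out-of-bag scheme: the classifier is refit on each resample and evaluated on the samples left out of it. For brevity the classifier is a one-variable, equal-prior LDA (a threshold halfway between the group means), the data are synthetic, and every name is hypothetical.

```python
import random

def fit_midpoint_lda(samples):
    """One-variable, equal-prior LDA: keep the two group means."""
    a = [x for x, y in samples if y == "A"]
    b = [x for x, y in samples if y == "B"]
    return sum(a) / len(a), sum(b) / len(b)

def predict(model, x):
    """Assign x to the group whose mean is closer."""
    mean_a, mean_b = model
    return "A" if abs(x - mean_a) < abs(x - mean_b) else "B"

def error_rate(model, samples):
    wrong = sum(1 for x, y in samples if predict(model, x) != y)
    return wrong / len(samples)

def bootstrap_error(samples, n_rounds=200, seed=0):
    """Average out-of-bag error over n_rounds bootstrap resamples."""
    rng = random.Random(seed)
    n = len(samples)
    errors = []
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]      # draw with replacement
        chosen = set(idx)
        train = [samples[i] for i in idx]
        held_out = [samples[i] for i in range(n) if i not in chosen]
        if len({y for _, y in train}) < 2 or not held_out:
            continue                                    # skip degenerate resamples
        errors.append(error_rate(fit_midpoint_lda(train), held_out))
    return sum(errors) / len(errors)

# Synthetic stand-in data (NOT the study's measurements): one pH-like variable.
data_rng = random.Random(1)
data = [(data_rng.gauss(7.0, 0.6), "A") for _ in range(25)]
data += [(data_rng.gauss(6.0, 0.6), "B") for _ in range(27)]

resub = error_rate(fit_midpoint_lda(data), data)
boot = bootstrap_error(data)
print(f"resubstitution error: {resub:.3f}, bootstrap error: {boot:.3f}")
```

The out-of-bag samples play the role of unseen data, which is why this estimate is usually less optimistic than the resubstitution error.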
The second step in the classification process is the application of stepwise discriminant analysis, with the results as follows: Using a forward strategy with an F statistic of 3.84, we found only one relevant variable, X1 = pH.
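The first step of a forward selection can be illustrated with a small sketch. It assumes that 3.84 acts as an F-to-enter threshold, which the text does not state explicitly, and it covers only the first step, where the F statistic of each candidate variable reduces to the one-way ANOVA F for two groups. Later steps would use partial F statistics conditioned on the variables already selected. The numbers are invented for illustration and the names are hypothetical.

```python
def f_statistic(values_a, values_b):
    """One-way ANOVA F statistic for a single variable and two groups."""
    n_a, n_b = len(values_a), len(values_b)
    mean_a = sum(values_a) / n_a
    mean_b = sum(values_b) / n_b
    grand = (sum(values_a) + sum(values_b)) / (n_a + n_b)
    between = n_a * (mean_a - grand) ** 2 + n_b * (mean_b - grand) ** 2
    within = sum((v - mean_a) ** 2 for v in values_a)
    within += sum((v - mean_b) ** 2 for v in values_b)
    # Between-group df is 1 for two groups; within-group df is n_a + n_b - 2.
    return between / (within / (n_a + n_b - 2))

def forward_first_step(columns, f_to_enter=3.84):
    """Return (name, F) of the best candidate, or None if none passes the threshold."""
    scored = {name: f_statistic(a, b) for name, (a, b) in columns.items()}
    name = max(scored, key=scored.get)
    return (name, scored[name]) if scored[name] >= f_to_enter else None

# Invented illustrative values (NOT the study's measurements).
columns = {
    "pH": ([7.1, 7.4, 6.9, 7.6, 7.2], [6.0, 6.3, 5.8, 6.1, 6.4]),
    "phosphate": ([40, 55, 48, 60, 52], [45, 58, 50, 47, 56]),
    "nitrogen": ([19, 22, 20, 23, 21], [20, 21, 22, 19, 23]),
}
selected = forward_first_step(columns)
print(selected)
```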
The third step in the analysis is the re-application of LDA, which in this case gives the same classification error, but with a discriminant function as follows: Again, using bootstrap validation, we obtained a similar error of 13%.
Application of the LDA and STEPDISC procedures indicates that the data mining method of so-called supervised learning arrived at a single crucial variable that, in combination with a constant, has a dominant effect on determining whether the soil type contains or does not contain Azotobacter.
The next section aims to determine how much "potential" influence the classification variables have, through the application of decision trees.

Application of Decision Trees
Learning a decision tree is the process of creating a discriminating function in the form of a decision tree (1,8), (2,995-1003), (18,404). The tree is created recursively, from the top (the root) to the leaves, so that each tree node represents a logical test of the value of an attribute from the description of the problem, and each leaf represents the class into which an example is classified. During construction, the attribute for each node is selected by heuristic methods, based on an assessment of how well it discriminates the (sub)set of training examples that remain to be discriminated at the observed node. Although a tree can perfectly classify all the cases in a training set, this does not guarantee high accuracy on new examples, because such trees are often overfitted to the training examples. The tree is therefore simplified, resulting in smaller trees that are both more accurate and more comprehensible. In our analysis, we used the well-known decision tree algorithm C4.5 (16,287), which is available within the WEKA (University of Waikato) system (19), for the purpose of selecting associated attributes. The main advantage of the decision tree is that it provides a meaningful way of presenting knowledge by extracting IF-THEN classification rules.
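The node test described above can be illustrated with a small sketch that picks a threshold on one numeric attribute by gain ratio, the criterion associated with C4.5, and prints the resulting IF-THEN rules. It is a simplified illustration rather than the WEKA implementation: it handles a single attribute and a single split, omits pruning and C4.5's other refinements, and uses invented values and hypothetical names.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, gain_ratio) for the best test 'value <= threshold'."""
    n = len(values)
    base = entropy(labels)
    distinct = sorted(set(values))
    best = None
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2          # candidate between adjacent values
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        split_info = entropy(["left"] * len(left) + ["right"] * len(right))
        ratio = gain / split_info
        if best is None or ratio > best[1]:
            best = (threshold, ratio)
    return best

def rules(name, values, labels):
    """Turn the best split into two IF-THEN rules using the majority class per branch."""
    threshold, _ = best_split(values, labels)
    left = Counter(y for x, y in zip(values, labels) if x <= threshold)
    right = Counter(y for x, y in zip(values, labels) if x > threshold)
    return [
        f"IF {name} <= {threshold:.2f} THEN Azotobacter = {left.most_common(1)[0][0]}",
        f"IF {name} > {threshold:.2f} THEN Azotobacter = {right.most_common(1)[0][0]}",
    ]

# Invented illustrative values (NOT the study's measurements).
ph = [5.8, 6.0, 6.2, 7.0, 7.2, 7.4]
present = ["no", "no", "no", "yes", "yes", "yes"]

split = best_split(ph, present)
tree_rules = rules("pH", ph, present)
for rule in tree_rules:
    print(rule)
```

A full tree learner would apply the same selection recursively to each branch and across all attributes, then prune the result.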

Attribute selection methods
The formation of an adequate model is based on knowledge of the problem and often reduces to the selection of an appropriate set of attributes. The presence of irrelevant and redundant (surplus) attributes in the problem model negatively influences the performance of most inductive learning methods, and such attributes are often removed from consideration by prior or embedded attribute selection (2). The optimal set of attributes contains all relevant attributes, while redundant and irrelevant attributes are usually excluded from consideration, although weakly relevant redundant attributes potentially contain information that can improve classification performance in practice (2), (4), (9). In what follows, some of the prior attribute selection methods built into the WEKA system (19) are used to further check the significance of individual attributes from the problem model.
Results of WEKA attribute selection. Using different methods of searching and evaluating attribute subsets, the best subset is found, which gives the most accurate rules (trees). Some of the methods also give numerical estimates for individual attributes.
The ReliefF method (which evaluates each attribute separately) gave the following results: The ReliefF method ranks the attributes hierarchically and estimates that the most important attribute is X1 = pH.
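How a Relief-type method scores each attribute separately can be sketched as follows. This is the basic two-class Relief with one nearest hit and one nearest miss per instance, used here as a simplified stand-in: ReliefF, as used in WEKA, additionally averages over k nearest neighbours and handles multiple classes and missing values. The data are invented and the function name is hypothetical.

```python
def relief_weights(rows, labels):
    """Basic two-class Relief weights, one per numeric attribute."""
    n, p = len(rows), len(rows[0])
    spans = []
    for j in range(p):
        column = [r[j] for r in rows]
        spans.append((max(column) - min(column)) or 1.0)  # avoid division by zero

    def diff(j, r1, r2):
        """Range-normalized difference of attribute j between two rows."""
        return abs(r1[j] - r2[j]) / spans[j]

    def distance(r1, r2):
        return sum(diff(j, r1, r2) for j in range(p))

    weights = [0.0] * p
    for i, row in enumerate(rows):
        hits = [r for k, r in enumerate(rows) if k != i and labels[k] == labels[i]]
        misses = [r for k, r in enumerate(rows) if labels[k] != labels[i]]
        hit = min(hits, key=lambda r: distance(row, r))     # nearest same-class row
        miss = min(misses, key=lambda r: distance(row, r))  # nearest other-class row
        for j in range(p):
            # Reward attributes that differ on the miss and agree on the hit.
            weights[j] += (diff(j, row, miss) - diff(j, row, hit)) / n
    return weights

# Invented illustrative values (NOT the study's measurements): (pH, phosphate).
rows = [(7.2, 50), (7.4, 46), (7.0, 55), (6.0, 48), (6.2, 53), (5.9, 51)]
labels = ["yes", "yes", "yes", "no", "no", "no"]

weights = relief_weights(rows, labels)
print(dict(zip(["pH", "phosphate"], weights)))
```

An attribute receives a high weight when it separates neighbouring samples of different classes while staying similar among neighbours of the same class.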

Conclusion
Agroeconomics is facing increasing challenges, especially in the domain of research not only on the quality of land, but also on other food resources as sources of organic food. Methods for discovering hidden knowledge have an advantage over classical methods because they classify more precisely and have higher predictive capacity.
The aim of this study is to examine the usefulness and accuracy of these methods for determining the presence or absence of Azotobacter in the soil (categories "yes" and "no") based on sample examination. Supervised linear discriminant analysis was used to identify the specific effect of the variables on the presence versus absence of Azotobacter, together with methods for validating the accuracy of the classification and identifying the key variable, which in this case is pH. In addition to this method, a decision tree was used, which gave more precise results regarding the degree of influence of the individual variables. The results obtained are accurate at the level of 90%. Unlike conventional multivariate research, this is a study in which the analysis of the influence of four variables on the presence of Azotobacter, three of which are not decisive for classification, is improved by means of supervised analysis. The objectives that the research set out were achieved, with a high degree of accuracy (at the level of 90%).
The ReliefF attribute selection method clearly established the dominance of the pH factor, while the influence of the remaining two attributes was relatively minor, at about 5% each. The use of this methodological tool would greatly help researchers in the field of agriculture, especially because it allows research to be carried out on sparse training sets, with a large number of attributes (characteristics of the subject of research, e.g. land, quality of agricultural products, fruits, vegetables, eggs, meat and many others) and very few examples (so-called sparse sets). The problem of sparsity is related to the assessment of task difficulty, which is dealt with in the data mining domain by reducing the number of attributes (variables). Such methodological approaches enable the discovery of hidden knowledge in agronomy and agroeconomics, primarily about the key variables, attributes, and factors that are decisive for solving research problems and formulating correct hypotheses, both in agronomy and in other fields of research.