Chemometric window to antimicrobial activity of biomolecules isolated from endophytic fungi

Drug resistance, especially bacterial antibiotic resistance, is recognized as a global phenomenon. The potential of endophytic fungi as producers of secondary metabolites with a wide spectrum of bioactivities has recently been introduced to the field of drug discovery. The production of these compounds is strongly influenced by a variety of factors related to the choice of plant host, climate conditions, nutrition, the presence of other microorganisms in the same surroundings, etc. Because a large number of different indices of the antibacterial activity of endophytic fungi are in use, a critical evaluation was performed with the assistance of data mining methods. The activity towards several pathogenic bacteria of endophytic species growing on different plant hosts (deciduous trees as well as herbaceous plants spread worldwide in different climatic zones) was taken into consideration together with the taxonomy of the endophytes. Principal component analysis was used to explore the resulting large data set and, by finding patterns in the data, to point to a limited number of suitable candidates for future pharmaceutical research. The antimicrobial character of Phomopsis species was highlighted and future perspectives for their therapeutic use were projected.


Introduction
Drug resistance, especially bacterial antibiotic resistance, is a global phenomenon. Bacteria are emerging with new mechanisms of resistance that are spreading around the world, compromising our ability to treat common infectious diseases. This can result in prolonged illness, disability, or even death. Many medical procedures such as organ transplantation, cancer chemotherapy, diabetes treatment, and major surgery also carry a high risk without effective antimicrobials for the prevention and treatment of infections. Although modern medicine was built on reliance on antibiotics, we are now heading towards a world without them (1-3). The need for new antimicrobial agents is high, and this research is therefore focused on finding new potential antimicrobial agents.
The pharmaceutical industry is in constant search of new substances. Among the available research strategies, there is evidence that natural selection is as effective as combinatorial chemistry in the discovery of new compounds (1). In that respect, new naturally occurring, versatile, and sustainable resources for drug discovery could be found in the exploitation of endophytic fungi. Endophytic fungi are fungal microorganisms that spend all or part of their life cycle inter- and/or intracellularly colonizing healthy tissues of their host plants. Typically, they do not cause any apparent disease symptoms in the hosts. On the contrary, endophytes and their hosts establish a special relationship in which endophytes protect the hosts from threats in return for nutrition. To sustain this symbiosis, endophytes often help host plants to overcome invasion by pathogenic microorganisms and/or insects by producing secondary metabolites (1-14). So far, these metabolites have been found to belong to diverse structural classes including alkaloids, peptides, steroids, terpenoids, phenols, quinones, and flavonoids (3). Although they have been investigated for many years, endophytic fungi are still not sufficiently understood. The field of drug discovery with endophytes, recognized as "biofactories" of novel bioactive substances, faces difficulties mainly related to the large amount of data recorded for a variety of endophytic species and for diverse combinations of endophytic species and herbaceous plant or tree hosts, including differences in plant parts, age, and habitats, as well as possible cohabitation with other strains of microorganisms. At the same time, it seems that climatic conditions and seasonal fluctuations also play an important role in the production of endophytes' bioactive molecules.
Furthermore, there is evidence in the literature that different microbial or cell cultures are used for detecting bioactivity and that different, usually non-standardized, protocols are applied in bio-profiling studies (1-17). Therefore, it is extremely important to make critical evaluations that enable the extraction of a limited number of suitable candidates or promising leads for drug discovery.
In order to properly manage and interpret a large amount of chemically derived data, the use of chemometrics, which combines different mathematical and statistical methods, is considered necessary. The development of computational technology and advanced algorithms to face the so-called data explosion in analytical chemistry has resulted in an increasing interest in data mining. Data mining refers to a technique used to explore large amounts of data in search of consistent patterns and/or relationships between variables and to express them in a form suitable for making predictions or estimations of the expected outcome. Data mining comprises several confirmatory and exploratory data analysis techniques which are presented in detail in the literature (18). Principal component analysis (PCA) is a multivariate modelling and analysis technique commonly used for identifying patterns in data. PCA can reconstruct and at the same time compress the original data set by finding a lower-dimensional linear projection of the high-dimensional original data. This is achieved through the generation of a limited number of new orthogonal axes or variables known as principal components (PCs). These PCs are uncorrelated with one another and express much of the total variability in the original data set. In that way, the number of dimensions is reduced in PCA without much loss of information (18-21).
On the basis of the above, the general intention of this work is to provide a better understanding of the antimicrobial activity potential of biomolecules isolated from different endophytic fungi with the assistance of selected chemometric tools. Among various chemometric methods, the literature survey pointed out that PCA has already been successfully applied in the bioactivity profiling of herbs (19).
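As an illustration of how a principal component is obtained, the following minimal sketch extracts the first PC of a small correlation matrix by power iteration. Power iteration is used here only because it needs no external library; it is not claimed to be the algorithm of any particular chemometric package or the one used in this work. The matrix `HYPOTHETICAL_CORR` and the function name are assumptions made for the example and are not data from this study.

```python
import math


def first_principal_component(corr, iterations=500):
    """Return (eigenvalue, unit eigenvector) of the dominant PC by power iteration."""
    n = len(corr)
    # Arbitrary start vector; it must not be orthogonal to the dominant axis.
    v = [1.0 / (i + 1) for i in range(n)]
    for _ in range(iterations):
        # Multiply by the correlation matrix, then renormalize to unit length.
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The Rayleigh quotient v^T R v is the variance captured along the PC.
    rv = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * rv[i] for i in range(n))
    return eigenvalue, v


# Hypothetical correlation matrix of three standardized variables (illustrative only).
HYPOTHETICAL_CORR = [
    [1.0, -0.3, 0.0],
    [-0.3, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

eigenvalue, direction = first_principal_component(HYPOTHETICAL_CORR)
# With standardized variables the total variance equals the number of variables.
share = 100.0 * eigenvalue / len(HYPOTHETICAL_CORR)
print(f"PC1 eigenvalue = {eigenvalue:.3f}, explained variance = {share:.1f}%")
print("PC1 direction:", [round(x, 3) for x in direction])
```

In practice a full eigen-decomposition routine from a numerical library would return all PCs at once; the sketch is only meant to make the idea of a variance-maximizing axis concrete.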

Experimental
The data set for PCA was based on experimentally obtained indices of biological activity established for different endophytic fungi growing on conifer needles collected from Pinaceae trees found in unpolluted regions of Slovenia. Due to their slow growth in the natural source, the endophytic fungi were isolated and placed in a nutrient medium to promote their growth (fungi were maintained on potato dextrose agar at room temperature for one month). From this material, secondary metabolites were extracted with dichloromethane and/or ethanol, and the prepared extracts were further used for the assessment of antibacterial activity. The sample processing procedure and microdilution assay were based on previous achievements of the Slovenian partners in the research project, gathered while focusing on the genetic properties of selected endophytic fungi (1, 15). This knowledge was successfully applied in further joint investigations and chemical structure characterizations of secondary metabolites isolated from endophytes growing on Pinaceae trees (22). A predominant distribution of endophytes with antimicrobial potential was noticed among the plant families Fabaceae, Lamiaceae, Asteraceae, and Araceae, but some researchers indicated that this was not an exclusive characteristic (2, 13). Therefore, the primary experimentally generated database comprising endophytes growing on Pinaceae trees was further complemented with literature records on bioassays for the same endophytic species growing on different plant hosts, deciduous trees as well as herbaceous plants spread worldwide in different climatic zones (1-17). In total, 18 families of plant hosts and 59 endophytic species were taken into consideration.
Endophytic species were arranged as observations, while their activities towards Staphylococcus aureus, Pseudomonas aeruginosa, and Escherichia coli, bacteria that are usually multidrug-resistant pathogens capable of escaping the action of antimicrobial agents, were denoted as quantitative variables in the data set. Due to the diversity of approaches used to evaluate antibacterial potential, the initial data were scaled accordingly and all bioactivity metrics were put on the same dimensionless scale (23) ranging from 0 to 100. Equation (1) was used for the calculation:

$$x_{\mathrm{scaled}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \times 100 \tag{1}$$

where $x_{\mathrm{scaled}}$ stands for the new scaled value of the measured bioactivity $x$ obtained within the assay method used, and $x_{\max}$ and $x_{\min}$ stand for the maximum value obtained for the reference antibiotic drug and the value obtained for the blank within the same assay method, respectively.
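The scaling step can be sketched as follows, assuming that equation (1) is the usual min-max rescaling between the blank and the reference antibiotic. The function name, the clamping to the 0-100 range, and the numerical readings are assumptions made for illustration and are not taken from the data set.

```python
def scale_bioactivity(x, x_min, x_max):
    """Map a raw bioactivity reading onto a dimensionless 0-100 scale.

    x     : reading for the fungal extract
    x_min : reading for the blank in the same assay
    x_max : reading for the reference antibiotic in the same assay
    """
    if x_max == x_min:
        raise ValueError("reference and blank readings must differ")
    scaled = 100.0 * (x - x_min) / (x_max - x_min)
    # Assumption: readings beyond the blank or the reference are clamped to the scale.
    return max(0.0, min(100.0, scaled))


# Hypothetical inhibition-zone readings in mm (illustrative values only).
print(scale_bioactivity(12.0, x_min=0.0, x_max=24.0))
# Hypothetical readings where a lower value means higher activity;
# the same formula applies because the blank and reference swap roles.
print(scale_bioactivity(55.0, x_min=100.0, x_max=10.0))
```

Because the blank and the reference come from the same assay as the extract, values obtained with different assay methods become comparable after this step.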
Regardless of the above-mentioned diversity in bioactivity metrics, they all led to the same qualification of antibacterial potential, expressed as low, moderate, or high bioactivity. Additionally, the taxonomic distribution of endophytes within the kingdom Fungi, phylum Ascomycota, was also taken into consideration as a qualitative variable (endophytic species belonging to 23 families) (1-17, 22). The selection of data for PCA was based on the availability of reliable experimental and/or literature records and is presented in Table I.

Results and discussion
During PCA, the correlations between the investigated variables were determined by the value of the Pearson correlation coefficient, which describes the strength of the linear relationship between two quantitative variables, at a confidence level of 0.95 (significance level α = 0.05), with correlations regarded as significant at p < 0.05. The correlation matrix is presented in Table II. Each PC is defined by a vector known as the eigenvector of the variance-covariance matrix, while the variance along the vector is known as the eigenvalue (Table III and Figure 1).
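For readers less familiar with the Pearson coefficient, a minimal sketch of how a correlation matrix of this kind can be assembled is given below. The three lists of scaled activities are hypothetical values chosen for illustration; they are not the entries of Table II, and the function names are assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Symmetric matrix of pairwise Pearson coefficients."""
    return [[pearson_r(a, b) for b in columns] for a in columns]


# Hypothetical scaled activities (0-100) of five endophytic species.
e_coli = [80, 60, 40, 20, 10]
p_aeruginosa = [10, 30, 35, 70, 90]
s_aureus = [50, 20, 70, 30, 60]

for row in correlation_matrix([e_coli, p_aeruginosa, s_aureus]):
    print([round(r, 3) for r in row])
```

The significance test (p-value) that accompanies each coefficient is omitted here; it would normally be taken from a statistics package.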
The first principal component (PC1) had the highest eigenvalue of 1.305 and accounted for the largest share of the variability in the data set, 43.510%. The second and third PCs (PC2 and PC3) had eigenvalues of 1.006 and 0.689 and accounted for 33.520% and 22.970% of the variance in the data, respectively. Only PCs with eigenvalues ≥ 1.0 were considered significant descriptors of data variance according to Kaiser's rule (18-20). The third PC had an eigenvalue below this threshold and was therefore not retained. For that reason, only the first two PCs, with a cumulative variability of 77.030%, were used for further study. Although this is a satisfactory result, the maps should still be interpreted with care, since some information might remain hidden in PC3.
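The arithmetic behind these figures can be checked with a short sketch. It uses the three eigenvalues quoted in the text; the function names are assumptions. Because the quoted eigenvalues are rounded to three decimals, the recomputed percentages may differ from the reported ones in the second decimal place.

```python
# Eigenvalues of PC1-PC3 as reported in the text.
EIGENVALUES = [1.305, 1.006, 0.689]


def explained_variance(eigenvalues):
    """Percentage of total variance carried by each principal component."""
    total = sum(eigenvalues)
    return [100.0 * ev / total for ev in eigenvalues]


def kaiser_retained(eigenvalues, threshold=1.0):
    """1-based indices of the PCs kept under Kaiser's rule (eigenvalue >= threshold)."""
    return [i + 1 for i, ev in enumerate(eigenvalues) if ev >= threshold]


shares = explained_variance(EIGENVALUES)
kept = kaiser_retained(EIGENVALUES)
cumulative = sum(shares[i - 1] for i in kept)

print("explained variance (%):", [round(s, 2) for s in shares])
print("retained PCs:", kept)
print("cumulative variance of retained PCs (%):", round(cumulative, 2))
```

For a PCA on the correlation matrix of three variables the eigenvalues sum to 3, which is why each percentage is simply the eigenvalue divided by 3.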

Figure 1. Major principal components with corresponding eigenvalues
The loadings corresponding to the PCs were calculated from the correlation matrix. Loading values represent the contributions of the variables to the total variability explained by the generated PCs (Table IV). Specific patterns of correlation between the tested variables can also be visualized by analyzing the loading plots constructed for the PCs explaining significant variability in the data set, PC1 and PC2 (Figures 2 and 3). The loading plot also enables the identification of important variables. The objective of a loading projection is to visualize the position of the variables relative to one another in two-dimensional space, together with their corresponding correlations. Variables close to one another and far from the plot origin are positively correlated (directly proportional), while variables opposite to one another on the plot are negatively correlated (inversely proportional). The length of the arrow belonging to a variable also reflects the strength of its relation with the PCs. The loading plot can also explain the relationship between two variables through the angle between them measured from the centre. The correlation coefficient between two variables is defined as the cosine of the angle between their respective vectors on the plot. The cosine of 180° is -1 and therefore indicates a negative correlation. By the same rule, uncorrelated variables are orthogonal and occur at right angles to one another, because the cosine of 90° equals 0. Similarly, the cosine of 0° is 1, which denotes a positive correlation between the variables. To confirm that a variable is well linked with an axis, squared cosines were analyzed. If the squared cosine of a given variable on an axis is close to zero, the risk of a false interpretation of the results in terms of trends on that axis is greater (Table IV).
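The geometric reading of the loading plot can be illustrated with a short sketch. The loading vectors below are hypothetical coordinates in the PC1-PC2 plane, not the values of Table IV, and the normalized definition of squared cosines (squared loading on an axis divided by the sum of squared loadings over all axes) is an assumption about the convention used.

```python
import math


def cosine_between(u, v):
    """Cosine of the angle between two loading vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def squared_cosines(loadings):
    """Share of a variable's representation carried by each axis (sums to 1)."""
    total = sum(l * l for l in loadings)
    return [l * l / total for l in loadings]


# Hypothetical loadings (PC1, PC2) of three variables, for illustration only.
var_a = (0.8, 0.1)
var_b = (-0.8, -0.1)   # points opposite to var_a (angle of 180 degrees)
var_c = (-0.1, 0.8)    # at a right angle to var_a (angle of 90 degrees)

print(round(cosine_between(var_a, var_b), 6))  # opposite vectors
print(round(cosine_between(var_a, var_c), 6))  # orthogonal vectors
print(round(cosine_between(var_a, var_a), 6))  # identical vectors

# Hypothetical loadings of one variable on PC1, PC2 and PC3.
print([round(c, 3) for c in squared_cosines((0.8, 0.1, 0.59))])
```

The cosine read from a two-dimensional plot approximates the correlation well only when the two plotted PCs carry most of the variance of both variables, which is what the squared cosines are used to check.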
The loadings on PC1, which explained the most variability in the data set, indicated high contributions from the scaled activity against E. coli (0.81) and P. aeruginosa (0.80). The scaled activity against E. coli exhibited a negative loading, whereas the scaled activity against P. aeruginosa had a positive loading, so the two variables contribute to the data variability along PC1 in opposite directions. PC2 showed a high positive loading for the scaled activity against S. aureus (0.987), while the contribution from the other variables was less pronounced. The scaled activity against S. aureus is not correlated with the other variables, while the scaled activities against E. coli and P. aeruginosa are in strong negative correlation. This outcome could be justified by the fact that E. coli and P. aeruginosa are Gram-negative bacteria, while S. aureus is a Gram-positive bacterium.
While the loading plot enabled the identification of important variables and their relationships, the score plot indicated whether observations were similar or dissimilar, typical, or outliers. The score plot of the PC1-PC2 comparison revealed three distinct groups of endophytes (Figure 2). It can be concluded that representatives of endophytic fungi belonging to families 7, 10, 15, and 17 (Helotiaceae, Meruliaceae, Ceratocystidaceae, and Coniochaetaceae, respectively) show similar bioactivity profiles, which were dissimilar from those of families 2 and 3 (Diatrypaceae and Botryosphaeriaceae, respectively). Endophytic fungi belonging to family 8 (Valsaceae, different species of the Phomopsis genus) could be found in each group in the score plot and are in strong correlation with all variables. This led to the suggestion that the observations reflect only a small influence of the plant hosts with which these endophytic fungi form symbiotic communities (investigated host plants from the following families: Pinaceae, Rhizophoraceae, Salicaceae, Apocynaceae, Anacynaceae, Anacardiaceae, Fabaceae, Asteraceae, Cistaceae, Caryophyllaceae, and Moraceae). This conclusion is in good accordance with the experimentally generated data for endophytes growing on needles of Pinaceae trees and with the indices of bioactivity found in literature records (1-17, 22). Several outliers can also be seen in the score plot due to their strong relation with the scaled activity towards P. aeruginosa. This could be explained by the total amount of variability explained by PC1 and PC2 (77.03%) as well as by the relatively similar amounts of variability explained by PC2 and PC3 (33.52% and 22.97%, respectively), meaning that some interpretations could remain hidden in PC3.
It should be mentioned that there are limited data for some representatives of the largest group in the score plot (different Phomopsis and Phoma species) pointing to activity against Bacillus subtilis, Enterococcus faecalis, Klebsiella pneumoniae, and Mycobacterium tuberculosis (1, 8, 16, 17, 24-30). Furthermore, there are representatives of the same group which demonstrated another type of bioactivity, namely antifungal activity against Candida albicans (29, 30). Therefore, along with a future expansion of the number of bacterial species against which antimicrobial activity is investigated, it would be worthwhile to search for new antifungals among the biomolecules produced by these species as well. This issue is as important as the bacterial resistance already stressed, since the world is nowadays also facing fungal resistance to known therapy. Isolation and closer inspection of the chemical structures of secondary metabolites originating from these endophytic fungi could reveal valuable lead compounds for future drug discovery.
According to their position in the corresponding plots, PCA highlighted Phomopsis species as the most promising resource for further pharmaceutical research. The literature survey demonstrated that the genus Phomopsis has already been recognized as a rich source of secondary metabolites of diverse structures such as phomopsichalasin, cytochalasin, convolvulanic acid, isobenzofuranones, oblongolide, phomopsolide, phomodiol, phomoxanthones and xanthone dimers, phomoenamide, phomonitroester, deacetylphomoxanthone B, dicerandrol A, (1S,2S,4S)-p-menthane-1,2,4-triol, uridine, ethyl 2,4-dihydroxy-5,6-dimethylbenzoate, phomopsilactone, and a range of polyketides. For a few of the listed compounds, antibacterial, antialgal, and antifungal activity has already been proven (1, 15, 28-34). Owing to the wide natural distribution of Phomopsis species, the already recognized indices of their bioactivity, and the assistance of findings from data mining reports, more productive harvesting of crude fungal material and preparation of extracts richer in secondary metabolites should be achievable. Although bacteria constantly change their mechanisms of action and their characteristics, these endophytic fungi, with the unique structures of their biomolecules, could hopefully become a worthy opponent in the battle against drug-resistant strains.

Conclusion
Natural products have enormous pharmaceutical potential for drug discovery. For years, this motivated the investigation of the therapeutic potential of plants. Nowadays, a new and equally precious natural source has been recognized among endophytic fungi. The production of bioactive secondary metabolites by endophytic fungi is influenced by a wide variety of factors related to the choice of plant host, climate conditions, nutrition, the presence of other microorganisms in the same surroundings, etc. This paper offers a focused presentation of the relevance of this variability as well as the resulting similarities and dissimilarities in the antimicrobial activity profiles of selected endophytic fungi. Although PCA was used exclusively for pattern recognition in the prepared data set, a closer analysis of the PCA outcomes (loading and score plots) revealed three major groups whose members follow similar trends in bioactivity. This observation could be of assistance in further investigations of endophytic fungi, since it provides prior knowledge on their antimicrobial potential. The very promising antibacterial character of Phomopsis species was highlighted according to their widespread position in the obtained score plots, and future perspectives for their therapeutic use as antifungal agents were projected as well. It was concluded that the production of their biomolecules is rich enough and relatively stable under the conditions investigated in this paper, and therefore fulfils the necessary prerequisite for further pharmaceutical research.