STUDYING THE SECONDARY STRUCTURE OF ACCESSION NUMBER USING CETD MATRIX

In this paper, we analyze the secondary structure of nucleotide sequences of rice. The data were collected from NCBI (National Center for Biotechnology Information) using the Nucleotide database. All programs were developed in the R programming language using the “seqinr” package. We use the CETD matrix method to study the prediction, and conclusions are drawn accordingly.


Introduction:
Proteomics, as the name suggests, is a very vast field of research. Here we study the secondary structure of accession numbers for rice (Molina et al., 2011) using the CETD matrix (Kuppuswami et al., 2015). We consider 24 accession numbers for rice taken from Cho et al. (2000), where these accession numbers had already been used for other studies.
The accession numbers for rice (Cho et al., 2000) are: D17586, M36469, X58877, Z11920, X07515, U12171, U33175, X64619, D78609, D78506, D30794, L10346, D63901, U40708, M29259, X65183, U08404, D14000, U31771, X53596, U37133, D16221, U49113, L37528.
The primary structure of a protein is its first level of structure, i.e., the sequence of amino acids drawn from the 20 standard amino acids. The primary structure folds locally into characteristic patterns that depend on hydrogen bonding; these patterns are known as the secondary structure of the protein. In other words, secondary structure is the second level of protein structure (Technical Brief 2009).
Here, only the secondary structure of the nucleotide sequences of rice is considered. Of the secondary structure features, we have selected six parts, viz. Helix Residue, Beta Sheet Residue, Beta Turn Residue, Helix Region, Beta Sheet Region and Beta Turn Region.
The objectives of the paper are:
1. To split the accession numbers into groups using the lottery method.
2. To find the most conservative accession number group using the CETD matrix.

Method and materials:
We used the NCBI nucleotide database (Pruitt et al., 2007 and Altschul et al., 1990) to find the sequences of rice. Then, using the Chou-Fasman algorithm and the "seqinr" package in the R programming language, the values of the six parts of the secondary structure of protein were found for the 24 accession numbers.
Here a_ij are the elements of the ATD (Average Time Dependent Data) matrix and e_ij are the elements of the RTD (Refined Time Dependent Data) matrix (Porchelvi, S. R. and Vanitha, R., 2011). The RTD matrix consists only of the elements -1, 0 and 1.
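The rule that maps a_ij to e_ij did not survive in this text. As a hedged reconstruction, the rule usually stated in the CETD literature is given below; it is assumed, not confirmed, that the same rule was used here. The symbols µ_j and σ_j denote the mean and standard deviation of column j of the ATD matrix, and α is the threshold parameter.

```latex
e_{ij} =
\begin{cases}
-1, & a_{ij} \le \mu_j - \alpha\,\sigma_j,\\
\phantom{-}0, & \mu_j - \alpha\,\sigma_j < a_{ij} < \mu_j + \alpha\,\sigma_j,\\
\phantom{-}1, & a_{ij} \ge \mu_j + \alpha\,\sigma_j,
\end{cases}
\qquad \alpha \in [0, 1].
```

Under this rule an entry is coded 1 when it lies clearly above its column mean, -1 when it lies clearly below, and 0 when it falls inside the band around the mean.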
Next, the entries in each row of the RTD matrix are added. The maximum row sum identifies the group of accession numbers that gives a better secondary prediction. Lastly, we combine all the RTD matrices to get the Combined Effective Time Dependent Data matrix (CETD matrix) (Porchelvi, S. R. and Vanitha, R., 2011; Narayanamoorthy et al., 2013 and Kuppuswami et al., 2015). The row sum is obtained for the CETD matrix and conclusions are drawn from it. Subsequently, all the matrices are represented using line diagrams.
The accession numbers are numbered so that they can be grouped without bias using the lottery method, a method used in simple random sampling. The groups are then formed according to this numbering.
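The lottery step can be illustrated with a short Python sketch. The paper's own programs were written in R and are not shown, so this is only an assumed equivalent: the function name `lottery_groups`, the seed, and the use of a shuffle to stand in for drawing lots are illustrative choices. The group size of 4 follows the intervals 1-4, 5-8, and so on used later in the paper.

```python
import random

# The 24 rice accession numbers listed in the Introduction.
ACCESSIONS = [
    "D17586", "M36469", "X58877", "Z11920", "X07515", "U12171",
    "U33175", "X64619", "D78609", "D78506", "D30794", "L10346",
    "D63901", "U40708", "M29259", "X65183", "U08404", "D14000",
    "U31771", "X53596", "U37133", "D16221", "U49113", "L37528",
]


def lottery_groups(items, group_size=4, seed=None):
    """Shuffle the items (drawing lots without replacement) and cut the
    shuffled list into consecutive groups of `group_size`."""
    rng = random.Random(seed)  # the seed is only for reproducibility
    shuffled = list(items)
    rng.shuffle(shuffled)
    return [shuffled[i:i + group_size]
            for i in range(0, len(shuffled), group_size)]


groups = lottery_groups(ACCESSIONS, group_size=4, seed=2024)
for k, group in enumerate(groups):
    start = 4 * k + 1
    print(f"{start}-{start + 3}: {', '.join(group)}")
```

The grouping depends on the random draw, so a different seed gives different groups; the sketch does not reproduce the grouping reported in the paper.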
We define:
X_1: Helix Residue
X_2: Beta Sheet Residue
X_3: Beta Turn Residue
X_4: Helix Region
X_5: Beta Sheet Region
X_6: Beta Turn Region
Here, the CETD matrix is used to check which group of accession numbers gives a conservative region. The raw data of the 24 accession numbers have been converted into matrix form, known as the Initial Raw Data matrix.
www.japmnt.com

Table 1: Initial Raw Data Matrix
Table 2: The ATD Matrix
Table 3: The Mean and Standard Deviation of the above ATD Matrix
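The step from the Initial Raw Data matrix to the ATD matrix and its column statistics can be sketched as follows. Two assumptions are made because the defining formula is not available in this text: each raw entry is divided by the width of its interval (4 accession numbers per group), and the population standard deviation is used. The raw values are made up for illustration and are not the paper's data.

```python
from statistics import mean, pstdev


def atd_matrix(raw, interval_width):
    """Assumed ATD rule: divide every raw entry by the interval width."""
    return [[value / interval_width for value in row] for row in raw]


def column_stats(matrix):
    """Return (means, standard deviations), one value per column.
    pstdev (population SD) is an assumption; the paper may use the sample SD."""
    columns = list(zip(*matrix))
    return [mean(c) for c in columns], [pstdev(c) for c in columns]


# Hypothetical raw data: 3 groups (rows) by 2 variables (columns).
raw = [[8, 4], [12, 16], [4, 4]]
atd = atd_matrix(raw, interval_width=4)
means, sds = column_stats(atd)

print(atd)
print(means)
print([round(s, 4) for s in sds])
```

If the sample standard deviation is intended, `pstdev` can be swapped for `statistics.stdev`; this changes the width of the threshold band but not the method.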
As already mentioned, the parameter α used in constructing the RTD matrix lies in the range [0, 1], so three arbitrary values of α have been selected, viz. 0.1, 0.15 and 0.3.

Table 4: Values of (µ_j - α σ_j) and (µ_j + α σ_j) for different values of α
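The quantities in Table 4 follow directly from the column means and standard deviations. A minimal Python sketch is given here; the α values are the three named in the text, while the means and standard deviations are hypothetical placeholders, since the table values are not available.

```python
ALPHAS = (0.1, 0.15, 0.3)  # the three values of alpha chosen in the text


def thresholds(means, sds, alpha):
    """Return the lower bounds (mu_j - alpha*sigma_j) and
    upper bounds (mu_j + alpha*sigma_j) for every column j."""
    lower = [m - alpha * s for m, s in zip(means, sds)]
    upper = [m + alpha * s for m, s in zip(means, sds)]
    return lower, upper


# Hypothetical column means and standard deviations (not the paper's data).
means = [2.0, 5.0]
sds = [1.0, 2.0]

for alpha in ALPHAS:
    lower, upper = thresholds(means, sds, alpha)
    print(alpha, [round(v, 3) for v in lower], [round(v, 3) for v in upper])
```

A larger α widens the band around each mean, so more entries are coded 0 in the corresponding RTD matrix.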
Let us create the RTD matrices for various values of α.
The RTD matrix for α = 0.1
The row sum matrix
The RTD matrix for α = 0.15
The row sum matrix
Adding the elements of the three RTD matrices, we get the CETD matrix, which is given below:
The CETD matrix
The row sum of CETD matrix
From Figure 4, the line diagram of the row sums of the CETD matrix, it can be seen that the highest peak is in the second group, i.e., the group 5-8, which implies that the region from 5 to 8 is conservative. This fulfills the second objective.
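The chain from ATD matrix to RTD matrices, CETD matrix and row sums can be sketched in Python as follows. The coding rule (at or below µ_j - α σ_j gives -1, at or above µ_j + α σ_j gives 1, otherwise 0) and the population standard deviation are assumptions taken from the usual CETD formulation. The ATD values and group labels are toy data, not the paper's matrices.

```python
from statistics import mean, pstdev


def rtd_matrix(atd, alpha):
    """Code each ATD entry as -1, 0 or 1 against its column's mean
    and standard deviation (assumed rule, population SD)."""
    columns = list(zip(*atd))
    mu = [mean(c) for c in columns]
    sigma = [pstdev(c) for c in columns]
    rtd = []
    for row in atd:
        coded = []
        for a, m, s in zip(row, mu, sigma):
            if a <= m - alpha * s:
                coded.append(-1)
            elif a >= m + alpha * s:
                coded.append(1)
            else:
                coded.append(0)
        rtd.append(coded)
    return rtd


def cetd_matrix(atd, alphas):
    """Add the RTD matrices for all alphas entry by entry."""
    rtds = [rtd_matrix(atd, alpha) for alpha in alphas]
    return [[sum(r[i][j] for r in rtds) for j in range(len(atd[0]))]
            for i in range(len(atd))]


# Toy ATD matrix: 3 groups (rows) by 2 variables (columns).
atd = [[2.0, 1.0], [3.0, 4.0], [1.0, 1.0]]
labels = ["1-4", "5-8", "9-12"]

cetd = cetd_matrix(atd, (0.1, 0.15, 0.3))
row_sums = [sum(row) for row in cetd]
best = labels[row_sums.index(max(row_sums))]

print(cetd)       # [[0, -3], [3, 3], [-3, -3]]
print(row_sums)   # [-3, 6, -6]
print(best)       # 5-8
```

The group with the largest CETD row sum is read as the most conservative one. The toy data were picked only so that the arithmetic is easy to follow by hand.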

Conclusion:
From the CETD matrix, it is seen that the row sum is highest in the interval 5-8, which means that the group of accession numbers X58877, Z11920, D14000 and L37528 gives a better secondary prediction than the other groups. This means that the region is conservative and the sequences of these accession numbers are identical. It is also seen that the lowest (most negative) row sum is in the interval 17-20, which means that the group of accession numbers U40708, M29259, X65183 and U08404 gives the poorest secondary prediction among the groups. We conclude that the sequences in these accession numbers are the least alike.