BIOMETRIC SYSTEMS BASED ON ECG USING ENSEMBLE EMPIRICAL MODE DECOMPOSITION AND VARIATIONAL MODE DECOMPOSITION

Electrocardiogram (ECG)-based biometrics is a challenging approach under development for high-security access. An ECG biometric system is more difficult to falsify than conventional biometric systems. Previously proposed studies still leave a gap for improving the accuracy of such systems. Therefore, this study proposes a new protocol to improve the performance of the ECG biometric system compared with previously reported studies. The ECG signals are decomposed using two methods that build on empirical mode decomposition (EMD): Variational Mode Decomposition (VMD) and Ensemble Empirical Mode Decomposition (EEMD). Both methods were developed to overcome one main problem of EMD, namely that it can generate oscillations with the same time scale that are stored in different decomposition levels. A private dataset of one-lead ECG signals recorded from 11 subjects is used in this study. The ECG signal of each person is segmented into ten windows, which serve as training and test data. VMD and EEMD decompose the ECG signals into five sub-signals. Feature extraction based on statistical calculations is applied at each decomposition level to characterize the ECG signal: mean, variance, skewness, kurtosis, and entropy are evaluated as predictors. Support vector machines (SVM) with 10-fold cross-validation are used to validate the performance of the proposed method. Our simulations demonstrate that the proposed method outperforms several previous studies and achieves an accuracy of up to 98.2%.


INTRODUCTION
Biometrics has been widely used in identification, authentication, and security system applications. Biometric modalities can be physiological or behavioral. Physical biometric systems, such as fingerprint- and iris-based systems, are the most commonly applied. However, both tend to be easy to falsify, which creates a potential for misuse [1], for example in criminal acts. Therefore, many studies have begun searching for new biometric approaches. One modality that has recently received attention is the electrocardiogram (ECG), based on the hypothesis that each individual has a unique ECG waveform [2]. Biometric research with ECG modalities has been carried out in previous studies, both in simulation and in implementations tested offline or in real time. Biel et al. [3] conducted an ECG-based biometric simulation using the standard 12-lead ECG and showed that features could be extracted from only one lead. Similarly, Vessela [4] simulated a 12-lead ECG-based biometric system. Jekova [5] simulated personal identification using the ECG and also analyzed the influence of personal health status on accuracy. However, these studies use a 12-lead ECG system, which tends to be complex. Many previous studies have proposed feature extraction methods, including analysis in the time, frequency, time-frequency, and wavelet domains. Wavelet-based analysis for ECG biometric simulation has been proposed in [6][7][8][9][10][11]. Another feature extraction approach for ECG biometric applications is frequency-domain analysis based on the Fourier transform, as conducted in [12][13]. Other researchers have proposed time-frequency-domain methods for feature extraction, as reported in [14][15].
A method based on template analysis was proposed in [16], achieving a true acceptance rate (TAR) below 90% in simulation. More recently, a deep neural network approach was proposed for ECG biometric systems [17]. However, it requires large memory resources to process the data, which may make it difficult to implement on low-cost computers. Each of these studies achieves high accuracy, which strengthens the hypothesis that the ECG could become a biometric modality in the future. Although these studies show good performance, they cannot be compared directly with one another, because they differ in the number of ECG leads, the ECG datasets, and the devices used. Nevertheless, their results corroborate one another, and the proposed methods can serve as a reference for developing ECG-based biometric systems in the near future. In our previous studies, sample entropy (SampEn) and Hjorth descriptors were proposed for feature extraction, with an accuracy of 93.8% [18]. Another study measured the entropy of wavelet-decomposed signals and achieved an accuracy of 71.8% [19]. Empirical mode decomposition (EMD) with statistical measurements has also been proposed [20], with an accuracy of 93.6%, which did not outperform [18]. These results leave a gap and an opportunity to improve accuracy, particularly because EMD has a drawback: oscillations with the same time scale can be stored in different decomposition levels. This study therefore proposes an ECG biometric method that combines Ensemble Empirical Mode Decomposition (EEMD) and Variational Mode Decomposition (VMD) with statistical analysis, aiming to improve on the performance of our previous studies [18][19][20]. The ECG signals are decomposed into five levels using EEMD and VMD, and statistical analysis is used to characterize each decomposed signal.
The proposed method is tested and evaluated on a private dataset with a larger number of subjects, consisting of 11 subjects. Each subject has 10 ECG signals, so 110 ECG signals are simulated as training and test data. Validation and performance tests are carried out using a Support Vector Machine (SVM) with 10-fold cross-validation, and several SVM kernels are compared to find the best performance. The simulations show that the proposed method outperforms several previous studies and achieves a maximum accuracy of 98.2% using linear and quadratic SVM. The rest of the paper is organized as follows: Section 2 explains the data collection and the proposed method, including feature extraction and classification. Section 3 presents the results and analysis of the study, followed by a discussion. Section 4 presents the conclusions.

Data collection
ECG signals are recorded using a one-lead digital ECG device that we developed [21]. This study uses the same dataset as our previous research [19], [20], which consists of 11 subjects. The ECG lead placement follows the Einthoven triangle. The digital ECG has a resolution of 10 bits and a sampling frequency of 100 Hz, the sampling frequency recommended in [22] for heart rate analysis. During recording, each subject sits relaxed on a chair. The total recording duration for each person is one minute. The ECG data stream is then stored in .txt format for analysis and simulation. Figure 1 shows an example of a raw ECG signal from one of the subjects.

This section also describes the proposed method for the ECG biometric system, whose general process is shown in Figure 2. In the pre-processing stage, the raw ECG signal is filtered to reject most of the noise. EEMD and VMD then decompose the noise-free signals into five levels. Next, features are extracted by calculating statistical parameters for each decomposed signal. The performance of the proposed method is tested using SVM and cross-validation, and the success criterion of this research is the accuracy of the system in identifying persons.
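As a quick check on the data dimensions: one minute at 100 Hz gives 60 × 100 = 6000 samples per subject. Assuming the ten windows mentioned in the abstract are equal-length and non-overlapping (the text does not state this), each window holds 600 samples (6 s), and 11 subjects yield the 110 signals used later. The helper `segment_windows` below is a hypothetical sketch of that split, run on synthetic data rather than the recorded ECG.

```python
import numpy as np

FS = 100          # sampling frequency in Hz (from the text)
DURATION_S = 60   # one-minute recording per subject (from the text)
N_WINDOWS = 10    # windows per subject (from the abstract)


def segment_windows(signal, n_windows=N_WINDOWS):
    """Split a 1-D recording into equal, non-overlapping windows.

    Assumption: windows are equal-length and non-overlapping; trailing samples
    that do not fill a whole window are discarded.
    """
    signal = np.asarray(signal, dtype=float)
    win_len = len(signal) // n_windows
    return signal[: win_len * n_windows].reshape(n_windows, win_len)


# Synthetic stand-in for one subject's recording (not real ECG data).
recording = np.random.default_rng(0).standard_normal(FS * DURATION_S)
windows = segment_windows(recording)
print(windows.shape)  # (10, 600)
```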

Signal pre-processing
Pre-processing is intended to reject low- and high-frequency noise in the form of baseline wander and muscle noise. Signal normalization is also carried out in this stage to avoid large deviations that may reduce system performance. Pre-processing uses a high-pass filter (HPF) with a cut-off frequency of 0.5 Hz and a low-pass filter (LPF) with a cut-off frequency of 50 Hz [23]. The pre-processed, noise-free signal is shown in Figure 3.
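The filtering stage can be sketched with SciPy as below. The Butterworth design, the filter order of 4, zero-phase filtering with `sosfiltfilt`, and z-score normalization are all assumptions, since the text gives only the cut-off frequencies. Note that at a 100 Hz sampling rate the Nyquist frequency is exactly 50 Hz, so a 50 Hz low-pass cut-off sits at the Nyquist limit, where `scipy.signal.butter` does not accept it; the sketch therefore applies the low-pass stage only when its cut-off lies below fs/2.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def preprocess(ecg, fs=100.0, hp_cutoff=0.5, lp_cutoff=50.0, order=4):
    """Band-limit and normalize one ECG window (a sketch, not the authors' filter)."""
    x = np.asarray(ecg, dtype=float)
    nyquist = fs / 2.0
    # High-pass at 0.5 Hz to suppress baseline wander (zero-phase, forward-backward).
    sos_hp = butter(order, hp_cutoff, btype="highpass", fs=fs, output="sos")
    x = sosfiltfilt(sos_hp, x)
    # Low-pass against muscle noise; a cut-off must lie strictly below Nyquist.
    if lp_cutoff < nyquist:
        sos_lp = butter(order, lp_cutoff, btype="lowpass", fs=fs, output="sos")
        x = sosfiltfilt(sos_lp, x)
    # Assumed z-score normalization (the text does not specify the method).
    return (x - x.mean()) / (x.std() + 1e-12)


fs = 100.0
t = np.arange(600) / fs
raw = np.sin(2 * np.pi * 10 * t) + 2.0  # synthetic 10 Hz tone on a DC offset
clean = preprocess(raw, fs=fs)
```

With `lp_cutoff=50.0` and `fs=100.0` only the high-pass stage runs; passing a lower cut-off (for example 30 Hz) enables the low-pass stage as well.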

Signal decomposition
Signal decomposition aims to obtain essential information from the observed signal so that the characteristics of each person's ECG signal can be calculated in more detail. In this study, two signal decomposition methods, EEMD and VMD, are simulated. The analysis uses the intrinsic mode functions IMF-1 to IMF-5. The decomposition can yield up to IMF-7 for each observed ECG waveform, but at those higher levels the decomposition results tend to be monotonous. The EEMD and VMD methods are described below.

Ensemble Empirical Mode Decomposition (EEMD)
EEMD is an extension of empirical mode decomposition (EMD), which was first introduced by Huang et al. in 1998 [25] as part of the Hilbert-Huang transform. EEMD is claimed to overcome the weaknesses of EMD [26]. The main weakness of EMD is that oscillations with the same time scale can be stored in different decomposition levels, or oscillations with different time scales in the same level [27]. This makes the decomposition ineffective, and the intrinsic information in the signal becomes more difficult to determine. This weakness is our reason for revisiting our previous study [20], which used the EMD method. EEMD separates time scales better, as demonstrated in trials in which white noise is added to the signal [28], and the decomposition levels it produces do not show a relationship with one another. EEMD also compensates for noise better, because one of its parameters is the amplitude ($A$) of the added noise. EEMD can be applied according to the following algorithm:
1. Add white noise to the original signal according to the following equation:
$$Y_n(t) = x(t) + A\,w_n(t) \tag{1}$$
where $x(t)$ is the original signal and $w_n(t)$ is the white noise added in the $n$-th trial.
2. Decompose $Y_n(t)$ into several IMFs and a residue:
$$Y_n(t) = \sum_{m=1}^{M-1} c_m^{(n)}(t) + r_M^{(n)}(t) \tag{2}$$
where $M-1$ is the total number of IMFs decomposed from $Y_n(t)$, $c_m^{(n)}(t)$ is the $m$-th level (mode), and $r_M^{(n)}(t)$ is the remainder obtained in the $n$-th trial.
3. Repeat the first and second steps, each time adding a different white-noise realization to the original signal.
4. Calculate the average over $N$ ensembles using the following equation:
$$\bar{c}_m(t) = \frac{1}{N}\sum_{n=1}^{N} c_m^{(n)}(t) \tag{3}$$
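The four EEMD steps above can be sketched as follows. This is a simplified, from-scratch illustration rather than the implementation used in the study: the cubic-spline envelopes with end points pinned to the signal, the sifting stop threshold `sd_tol=0.2`, the noise amplitude A expressed as 0.2 times the signal's standard deviation, and the 50-trial ensemble are assumptions, since the text does not report these settings. Maintained libraries such as PyEMD (distributed as `EMD-signal`) provide tuned implementations.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema


def _envelope_mean(x):
    """Mean of the upper and lower cubic-spline envelopes, or None if x has too few extrema."""
    n = len(x)
    maxima = argrelextrema(x, np.greater)[0]
    minima = argrelextrema(x, np.less)[0]
    if len(maxima) < 2 or len(minima) < 2:
        return None
    # Crude end condition (assumption): pin both envelopes to the first and last samples.
    maxima = np.r_[0, maxima, n - 1]
    minima = np.r_[0, minima, n - 1]
    t = np.arange(n)
    upper = CubicSpline(maxima, x[maxima])(t)
    lower = CubicSpline(minima, x[minima])(t)
    return (upper + lower) / 2.0


def emd(x, max_imfs=5, max_sift=50, sd_tol=0.2):
    """Plain EMD by sifting; returns (imfs, residue) with imfs.sum(0) + residue == x."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        if _envelope_mean(residue) is None:
            break  # residue is (close to) monotonic: no further IMF can be extracted
        h = residue.copy()
        for _ in range(max_sift):
            m = _envelope_mean(h)
            if m is None:
                break
            h_new = h - m
            # Standard-deviation (Cauchy-type) stopping criterion, ratio-of-sums variant.
            sd = np.sum((h - h_new) ** 2) / (np.sum(h ** 2) + 1e-12)
            h = h_new
            if sd < sd_tol:
                break
        imfs.append(h)
        residue = residue - h
    return np.array(imfs), residue


def eemd(x, n_imfs=5, ensembles=50, noise_amp=0.2, seed=0):
    """Ensemble-averaged IMFs (shape n_imfs x len(x)); noise_amp is A relative to std(x)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    acc = np.zeros((n_imfs, len(x)))
    # Step 3: repeat steps 1 and 2 with a different noise realization each time.
    for _ in range(ensembles):
        # Step 1: add white noise to the original signal.
        y = x + noise_amp * np.std(x) * rng.standard_normal(len(x))
        # Step 2: decompose the noisy copy into IMFs (the residue is not needed here).
        imfs, _ = emd(y, max_imfs=n_imfs)
        for k, imf in enumerate(imfs):
            acc[k] += imf
    # Step 4: average each IMF level over the ensembles.
    return acc / ensembles


fs = 100
t = np.arange(600) / fs  # one 6-s window at 100 Hz
x = np.sin(2 * np.pi * 1.2 * t) + 0.5 * np.sin(2 * np.pi * 8 * t)
imfs = eemd(x, n_imfs=5, ensembles=20)
print(imfs.shape)  # (5, 600)
```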

Variational Mode Decomposition (VMD)
Variational mode decomposition (VMD) is a decomposition method for non-stationary signals. It decomposes the signal non-recursively into several intrinsic mode components [29]. VMD was developed to overcome deficiencies of EMD, including the lack of backward error correction, sensitivity to noise, and the selection of predefined filter-bank boundaries. VMD adaptively determines the relevant frequency bands and is robust to noise, so it can decompose the input signal more efficiently. VMD has become a popular tool, including in biomedical signal processing. Simply put, VMD decomposes the original signal into an ensemble of band-limited modes [30]. The bandwidth of each mode $u_k$ of a one-dimensional signal $f$ is assessed in the following steps:
• Compute the analytic signal associated with $u_k$ by means of the Hilbert transform, to obtain a unilateral frequency spectrum.
• Shift the frequency spectrum of the mode to baseband by mixing it with an exponential tuned to the respective estimated center frequency.
• Estimate the bandwidth through the $H^1$ Gaussian smoothness (the squared $L^2$ norm of the gradient) of the demodulated signal.
The constrained variational problem of VMD is defined as [29]:
$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{subject to} \quad \sum_{k=1}^{K} u_k = f \tag{4}$$
Here, $f$ is the input signal and $u_k$ are the discrete sub-components, or modes, which have specific sparsity properties while reproducing the input. Each sub-signal is compact around a center pulsation and is called a band-limited intrinsic mode function (BLIMF). The number of BLIMFs is defined as $K$, and $\{u_k\}$ and $\{\omega_k\}$ are short-hand notations for the set of modes and their center frequencies, respectively. The decomposition results from both EEMD and VMD are then analyzed statistically to form the signal feature set. The statistical parameters are mean, variance, skewness, kurtosis, and entropy. The definitions of these five parameters are given in our previous study [20].
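The VMD problem above is commonly solved with the alternating direction method of multipliers (ADMM): each mode is updated in the frequency domain by a Wiener-like filter centered on its current center frequency, each center frequency moves to the power-weighted mean frequency of its mode, and a Lagrange multiplier enforces reconstruction. The sketch below is a simplified version that works on the one-sided FFT without the boundary mirroring of the reference algorithm; the penalty `alpha=2000`, `tau=0`, the tolerance, and the initialization are assumptions, not the study's settings. MATLAB's Signal Processing Toolbox offers a full `vmd` function.

```python
import numpy as np


def vmd(f, K=5, alpha=2000.0, tau=0.0, tol=1e-7, max_iter=500):
    """Simplified VMD via ADMM on the one-sided spectrum.

    Returns (modes, center_freqs): modes has shape (K, len(f)); center
    frequencies are in cycles per sample (0 to 0.5), sorted ascending.
    """
    f = np.asarray(f, dtype=float)
    n = len(f)
    f_hat = np.fft.rfft(f)                      # one-sided spectrum of the input
    freqs = np.fft.rfftfreq(n)                  # normalized frequencies, 0 ... 0.5
    u_hat = np.zeros((K, len(freqs)), dtype=complex)
    omega = 0.5 / K * np.arange(K)              # uniformly spread initial center frequencies
    lam = np.zeros(len(freqs), dtype=complex)   # Lagrange multiplier (dual variable)
    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # Wiener-filter update of mode k around its current center frequency.
            others = u_hat.sum(axis=0) - u_hat[k]
            u_hat[k] = (f_hat - others + lam / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # New center frequency = power-weighted mean frequency of mode k.
            power = np.abs(u_hat[k]) ** 2
            omega[k] = np.sum(freqs * power) / (np.sum(power) + 1e-12)
        # Dual ascent towards sum_k u_k = f (tau = 0 relaxes this for noisy inputs).
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))
        change = np.sum(np.abs(u_hat - u_prev) ** 2) / (np.sum(np.abs(u_prev) ** 2) + 1e-12)
        if change < tol:
            break
    order = np.argsort(omega)
    modes = np.fft.irfft(u_hat[order], n=n, axis=1)
    return modes, omega[order]


fs = 100
t = np.arange(600) / fs
x = np.sin(2 * np.pi * 5 * t) + np.sin(2 * np.pi * 20 * t)
modes, centers = vmd(x, K=2)
print(centers * fs)  # center frequencies in Hz, close to 5 and 20
```

For the study's setting, `K=5` would match the five sub-signals per ECG window.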

Support Vector Machine (SVM)
Support Vector Machine (SVM) is a discriminative classifier developed by Cortes and Vapnik in 1995 [31]. Fundamentally, SVM solves two-class classification tasks [32]. Its main idea is to map the input vectors into a very high-dimensional feature space and then construct a separating plane, called a hyperplane, that divides the data into two groups. SVM is an effective tool for classification and pattern recognition problems and is applied in many fields, including image processing [33], [34] and bio-signal analysis such as the electrocardiogram [35], [36] and the electroencephalogram [37][38][39]. Two types of SVM are used in this study.

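1. Linear Support Vector Machine (SVM)
An optimization algorithm is needed to maximize the distance between the hyperplane and the nearest data points. Consider a set of training data $X$, as expressed in Equation 5.
$$X = \left\{(\mathbf{x}_i, b_i) \mid \mathbf{x}_i \in \mathbb{R}^{p},\ b_i \in \{-1, 1\}\right\}_{i=1}^{n} \tag{5}$$
Here, the value of $b_i$ is $-1$ or $1$, $n$ denotes the number of training data, and $p$ is the number of features. The separating hyperplane is expressed in Equation 6.
$$\mathbf{w}^{\top}\mathbf{x} + c = 0 \tag{6}$$
Here, $\mathbf{w}$ denotes the vector normal to the hyperplane and $c$ is its offset. If the data can be separated linearly, the two hyperplanes bounding the classes are expressed in Equations 7 and 8.
$$\mathbf{w}^{\top}\mathbf{x}_i + c \ge 1 \quad \text{for } b_i = 1 \tag{7}$$
$$\mathbf{w}^{\top}\mathbf{x}_i + c \le -1 \quad \text{for } b_i = -1 \tag{8}$$
On the other hand, if the data cannot be separated linearly, the calculation of the hyperplane needs to be adjusted by using Equation 9.
$$\min_{\mathbf{w},\, c,\, \boldsymbol{\varepsilon}} \ \frac{1}{2}\lVert\mathbf{w}\rVert^{2} + T\sum_{i=1}^{n}\varepsilon_i \quad \text{subject to} \quad b_i\left(\mathbf{w}^{\top}\mathbf{x}_i + c\right) \ge 1 - \varepsilon_i,\ \ \varepsilon_i \ge 0 \tag{9}$$
Here, $T$ is the trade-off parameter between the training error and the separation of the classes, and $\boldsymbol{\varepsilon}$ is the set of slack variables $\varepsilon_i$. The main purpose is to maximize the distance between the two hyperplanes $\mathbf{w}^{\top}\mathbf{x} + c = \pm 1$, which equals $2/\lVert\mathbf{w}\rVert$, by minimizing $\lVert\mathbf{w}\rVert$.
2. Nonlinear Support Vector Machine (SVM)
Another approach for the SVM classifier is to use the kernel trick to extend it to non-linear decision surfaces, as proposed by Boser et al. [40]. The data in the original space are mapped to a higher-dimensional space by non-linear functions; common choices are the polynomial and Gaussian (radial basis function) kernels. This study uses polynomial kernels, namely the quadratic and cubic functions. The kernel function is expressed in Equation 10.
$$K(\mathbf{x}_i, \mathbf{x}_j) = \left(\mathbf{x}_i^{\top}\mathbf{x}_j + 1\right)^{d} \tag{10}$$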
Here, $d = 2$ for the quadratic kernel and $d = 3$ for the cubic kernel.

Figure 4 and Figure 5 show examples of the decomposition results (IMF-1 to IMF-5) for two different subjects. The decomposed signals of the two observed subjects differ visibly, for both EEMD and VMD. Statistical features are then calculated for all decomposed signals, producing a 25-element feature vector (five statistics for each of five IMFs) per ECG signal. Figure 6 shows the average value of each statistical parameter for each subject.

The significance of the differences in each feature of each IMF was also tested using one-way ANOVA. Differences between individual ECGs are considered statistically significant if the p-value is < 0.05. The results of the significance tests are presented in Table 1. Every IMF has features with significant differences; in IMF-1, IMF-2, and IMF-4, almost all features differ significantly. In IMF-5, a significant difference is found in only one feature. This can occur because high-level IMFs are relatively monotonous signals [41]. Although not all features differ significantly, all features are used as predictors in the validation process, in the expectation that this yields high accuracy.

The average feature values shown in Figure 6 indicate that the subjects' ECG signals have different characteristics from one another. Consistent with the significance tests in Table 1, Figure 6 also shows that some parameters, such as mean, variance, skewness, and entropy, have close values across subjects. This could be taken into account in a feature selection strategy to reduce the number of features used.

The next stage is the validation of the proposed method, for which 10-fold cross-validation and SVM are used.
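The feature extraction and the per-feature one-way ANOVA described above can be sketched as follows. The study defines entropy as in [20]; this sketch substitutes a Shannon entropy of a 32-bin amplitude histogram as an assumption, and kurtosis is SciPy's excess (Fisher) kurtosis. The data are synthetic, with IMF amplitude scaled per subject, so only the variance features are expected to differ significantly.

```python
import numpy as np
from scipy.stats import f_oneway, kurtosis, skew


def shannon_entropy(x, bins=32):
    """Shannon entropy (bits) of the amplitude histogram (assumed definition)."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def imf_features(imfs):
    """Mean, variance, skewness, kurtosis, entropy per IMF: 5 IMFs x 5 = 25 features."""
    feats = []
    for imf in imfs:
        feats += [np.mean(imf), np.var(imf), skew(imf), kurtosis(imf), shannon_entropy(imf)]
    return np.array(feats)


def anova_pvalues(features, labels):
    """One-way ANOVA p-value of each feature column across subjects."""
    groups = [features[labels == s] for s in np.unique(labels)]
    return np.array([f_oneway(*[g[:, j] for g in groups]).pvalue
                     for j in range(features.shape[1])])


# Synthetic stand-in (not the study's data): 11 subjects x 10 windows, 5 IMFs of 600
# samples each, with the IMF amplitude scaled per subject so that variances differ.
rng = np.random.default_rng(0)
X, y = [], []
for subject in range(11):
    for _ in range(10):
        imfs = rng.standard_normal((5, 600)) * (1 + 0.1 * subject)
        X.append(imf_features(imfs))
        y.append(subject)
X, y = np.array(X), np.array(y)
p = anova_pvalues(X, y)
significant = p < 0.05  # significance threshold used in the text
print(X.shape, significant.reshape(5, 5).sum(axis=1))  # features with p < 0.05 per IMF
```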
Linear, quadratic, and cubic SVM are simulated to observe which kernel produces the highest accuracy and to assess how well the performance of the proposed method generalizes. No feature selection is performed, so all attributes are used as predictors. The results of each simulation are shown in Table 2. The highest accuracy is 98.2% for both EEMD and VMD, achieved by the linear and quadratic SVM. In this case, the linear and quadratic kernels produce an accuracy about 0.9 to 1.8 percentage points higher than the cubic kernel. Given the feature distribution pattern shown in Figure 7, linear or quadratic decision boundaries are the most suitable, and they are also computationally more efficient than the cubic kernel. Table 3 confirms the results of the validation test in confusion matrix format. It shows that the proposed methods are competitive in accuracy and that both produce the same misidentifications, in subject-1 and subject-3. This study outperforms our previous studies [18][19][20]; a summary of each study is shown in Table 4. We specifically highlight study [20], in which signal decomposition and statistical feature extraction were also applied. We conclude that EEMD and VMD perform better than the empirical mode decomposition used in [20]. Theoretically, EEMD and VMD are able to decompose the signal without storing oscillations of the same time scale in different levels. This is in line with [42], which found that in certain cases the VMD method produces better performance than EMD.
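The kernel comparison can be reproduced in outline with scikit-learn, used here as a stand-in for whatever toolbox the study used. `SVC(kernel="poly", degree=d, gamma=1, coef0=1)` computes the (x_i · x_j + 1)^d kernel of Equation 10, `C` plays the role of the trade-off parameter T, and multi-class identification over 11 subjects is handled by scikit-learn's one-vs-one scheme. Per-fold standardization, `C=1`, and the shuffled stratified folds are assumptions, and the feature matrix is synthetic rather than the study's data.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_kernels(X, y, C=1.0, seed=0):
    """Mean 10-fold cross-validated accuracy of linear, quadratic and cubic SVMs."""
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    models = {
        "linear": SVC(kernel="linear", C=C),
        # gamma=1, coef0=1 gives the kernel (x_i . x_j + 1)^d of Equation 10.
        "quadratic": SVC(kernel="poly", degree=2, gamma=1.0, coef0=1.0, C=C),
        "cubic": SVC(kernel="poly", degree=3, gamma=1.0, coef0=1.0, C=C),
    }
    scores = {}
    for name, svm in models.items():
        # Standardize inside each fold so the scaler never sees the test fold (assumption).
        pipe = make_pipeline(StandardScaler(), svm)
        scores[name] = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy").mean()
    return scores


# Synthetic, well-separated stand-in for the 110 x 25 feature matrix (not the study's data).
rng = np.random.default_rng(0)
centers = 3.0 * rng.standard_normal((11, 25))
y = np.repeat(np.arange(11), 10)
X = centers[y] + 0.5 * rng.standard_normal((110, 25))
print(compare_kernels(X, y))
```

With real EEMD or VMD feature matrices in place of the synthetic `X`, `compare_kernels` returns one mean accuracy per kernel, in the same form as the comparison in Table 2.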

CONCLUSIONS
A personal identification method using a new biometric approach based on single-lead ECG signals is proposed. This study improves on previous research to obtain higher accuracy, and it uses a larger population. A total of 110 ECG signals taken from 11 participants were simulated in this study. Multilevel signal decomposition combined with statistical calculation is used for feature extraction.
In the feature extraction stage, Ensemble Empirical Mode Decomposition (EEMD) and Variational Mode Decomposition (VMD) decompose the signal into five levels. Mean, variance, skewness, kurtosis, and entropy are calculated for each decomposition level and together form the feature set. The proposed methods are validated with 10-fold cross-validation and SVM, and linear, quadratic, and cubic kernels were tested to find the best performance. EEMD and VMD perform equally well, both reaching 98.2% accuracy. This is an increase of 4.4 percentage points over the previous study on the same dataset, which achieved an accuracy of 93.8%. The increased accuracy also supports the hypothesis that ECG signals have characteristics unique to each person. In addition, the simulation provides evidence that EEMD and VMD perform better than EMD, because they do not generate the same oscillation in different decomposition levels. This is important for avoiding bias in the feature extraction stage, which would otherwise affect identification accuracy. However, the proposed biometric system still has drawbacks: the simulations were carried out on a small population and offline, and the true acceptance rate and false acceptance rate have not been tested. Applying this new biometric approach in a real-world implementation therefore remains a challenge.

Data availability
The data that support this research are available from the corresponding author upon request. Principal contact: su-gondo@telkomuniversity.ac.id.

Conflicts of interest
We declare that we have no conflicts of interest in the authorship or publication of this contribution.