Analysis of Amino Acid Mixtures by Voltammetric Electronic Tongues and Artificial Neural Networks

A new voltammetric electronic tongue with graphite-epoxy composite electrodes modified with metal-oxide nanoparticles is presented for the quantification of tryptophan, tyrosine and cysteine in amino acid mixtures. The signals were obtained by cyclic voltammetry, and the data was processed using two different chemometric techniques, artificial neural networks and partial least squares regression, for comparison purposes. Before artificial neural network modelling, the data was compressed by fast Fourier transform or discrete wavelet transform. The best results were attained using artificial neural networks with prior fast Fourier transform compression of the data, giving a normalized root-mean-square error of 0.032 (n=15) for the external test subset. The present method shows results comparable to other similar approaches, but with a much easier sampling process for the training set and new electrode modifiers to form the voltammetric sensors.


Introduction
The analysis of amino acids (AAs) is usually performed using techniques such as gas [1,2] and liquid [3,4] chromatography or capillary electrophoresis [5][6][7]. However, the great demand for AA determinations in many important fields, such as clinical analysis or the food and pharmaceutical industries [8,9], creates a need for simpler and faster procedures, such as sensor-based methods. Electroanalytical approaches have already been shown to be good methods for the analysis of specific AAs thanks to the electroactivity of aromatic and thiol groups [8].
Materials traditionally used in the construction of sensors for this kind of application are carbon, platinum and gold. These materials are usually modified in order to increase the sensitivity and to introduce new features due to electrocatalytic activity, which leads to an increase in the range of detectable AAs [10].
The use of metal oxide-modified sensors (MOS) has increased in recent years owing to their great potential in the analysis of many electroactive substances [11,12] and even in detecting the presence of some bacteria in complex systems [13]. Moreover, the recent availability of these materials in the form of nanopowder has greatly facilitated their use as electrode modifiers. The scope of MOS materials has widened further with the use of semiconducting MOS arrays as gas sensors forming electronic noses (ENs). ENs have gained recognition in many fields, such as food, aroma or medical diagnosis [14]. The large amount of data provided by ENs increases the sensitivity and selectivity of the method through multivariate data processing alone, which avoids tedious derivatization or optimisation of the sensors for each analyte [15,16]. However, MOS are only one possible option for developing electronic tongues (ETs) [17].
ETs have been used for more than a decade, and their use has grown considerably as they have a wide range of applications, including environmental analysis, the food industry and agriculture [18][19][20]. Given the good performance of voltammetric techniques in analysing certain AAs, voltammetric ETs have already been presented as a good approach for their analysis, in this case for those with oxidizable groups, such as cysteine (CYS), tyrosine (TYR) and tryptophan (TRP) [21][22][23].
An important weakness of voltammetric ETs lies in the complexity of the response obtained, as each voltammogram from each sensor needs to be processed. However, the huge amounts of data derived from voltammetric analysis have usually been modelled successfully by chemometric techniques, which interpret and extract meaningful data, offset any matrix effect and may allow interfering agents and drifts to be resolved. The most widely used multivariate data analysis techniques for voltammetric ETs are principal component analysis (PCA), partial least squares regression (PLS) and artificial neural networks (ANNs) [24,25].
PCA is commonly used for the visualization and compression of data by transforming it onto new orthogonal coordinate axes (called principal components), defined so as to describe almost all the variance contained in the data. The orthogonality of the new directions facilitates the use of linear regression models, such as principal components regression [24][25][26][27].
PLS, like PCA, represents the data in new coordinate axes, but in this case it maximises the covariance instead of the variance, and takes into account not just the input data matrix, but also the output matrix [25,28].
Both PCA and PLS are especially suited to linear systems. Although some modifications can be introduced in order to work with non-linear systems, they still do not perform as well as other chemometric techniques, such as ANNs, which excel in the modelling of non-linear systems.
ANNs [21,25,29] use an iterative modelling scheme that imitates the learning process of the neuronal system of the brain, and can model both linear and nonlinear data. As in the brain, ANNs comprise neurons, units arranged in layers. The input data is sent to each neuron, where each input is multiplied by a weight, a bias value is added, and the result is passed through a transfer function. The error between the output and the target output values is reduced iteratively by changing the weight values until the desired error is reached. An ANN may be optimised by changing its specific configuration (number of neurons, number of layers, transfer functions, error…), which leads to a large number of possible models for a single study case. Fortunately, many algorithms and optimisation methods have been described in the literature to simplify the process [24].
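The neuron computation and the iterative error reduction described above can be sketched as follows. This is a minimal illustration only: the function names, the toy data, the learning rate and the delta-rule update are assumptions made for the example, and the networks in this work were trained with a different algorithm.

```python
import math

def neuron(inputs, weights, bias, transfer=math.tanh):
    """Weighted sum of the inputs plus a bias, passed through a transfer function."""
    return transfer(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train_linear_neuron(samples, targets, lr=0.2, epochs=200, tol=1e-10):
    """Delta-rule training of one linear neuron (illustrative, hypothetical helper).

    Weights and bias are adjusted sample by sample; training stops when the
    mean squared error falls below `tol` or after `epochs` passes.
    """
    weights, bias = [0.0] * len(samples[0]), 0.0
    mse = float("inf")
    for _ in range(epochs):
        squared_error = 0.0
        for x, target in zip(samples, targets):
            output = neuron(x, weights, bias, transfer=lambda z: z)
            err = target - output
            weights = [w + lr * err * xi for w, xi in zip(weights, x)]
            bias += lr * err
            squared_error += err * err
        mse = squared_error / len(samples)
        if mse < tol:
            break
    return weights, bias, mse

# Toy data generated from y = 2*x1 - x2 + 0.5 (not data from this work).
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
T = [0.5, 2.5, -0.5, 1.5]
w, b, final_mse = train_linear_neuron(X, T)
```

After training, `w` and `b` should approach the coefficients used to generate the toy data, which illustrates how repeated weight updates reduce the output error.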
In this work we present a novel voltammetric ET consisting of graphite-epoxy composite electrodes modified with metal-oxide nanoparticles, aimed at the simultaneous determination of CYS, TYR and TRP in mixtures. Overlapping voltammetric responses have been modelled by both ANNs (which have been demonstrated to be one of the most powerful data processing strategies) and PLS, for comparison purposes. Given the multidimensionality and complexity of the generated data (3D data matrix current x sensors x potential), the data has been compressed by either discrete wavelet transform (DWT) or fast Fourier transform (FFT) before ANN modelling.
Cyclic voltammograms were obtained with a PGSTAT 30 Autolab potentiostat (EcoChemie, The Netherlands), working in multichannel configuration using the GPES Multichannel 4.7 software package (EcoChemie). A combination electrode formed by metallic Pt plus a Ag/AgCl reference electrode (Crison 5261, Barcelona, Spain) was used in the voltammetric measurements. A Crison micro pH 2002 pH-meter (Crison Instruments, Barcelona, Spain) was used for the pH adjustments.
The train set consisted of 27 synthetic samples defined by a full factorial design for the three AAs at three different concentration levels (0 M, 5·10⁻⁵ M and 3·10⁻⁴ M). To estimate the performance of the developed model, a test set was used, formed by 15 synthetic samples with the three AAs of interest in random concentrations (with the values rounded in order to obtain easily attainable micropipette addition volumes) within the train range (from 0 M to 3·10⁻⁴ M). All the samples used for model building and validation are summarised in Table 1.
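The full factorial design described above (three AAs at three levels, giving 3³ = 27 samples) can be enumerated with a short sketch. The variable and function names are illustrative assumptions; the actual sample compositions are those listed in Table 1.

```python
from itertools import product

# Concentration levels stated in the text, in mol/L.
LEVELS = (0.0, 5e-5, 3e-4)
AMINO_ACIDS = ("TRP", "TYR", "CYS")

def full_factorial(levels, n_factors):
    """Return every combination of `levels` over `n_factors` factors."""
    return list(product(levels, repeat=n_factors))

train_design = full_factorial(LEVELS, len(AMINO_ACIDS))
print(len(train_design))  # 27
```

Each tuple in `train_design` gives the concentrations of TRP, TYR and CYS for one synthetic sample.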

Electronic tongue
Measurements with the voltammetric ET were taken using a multichannel voltammetric cell formed by the reference electrode, the auxiliary electrode and five working electrodes, in this case modified epoxy-graphite composite sensors. One of these was a bare epoxy-graphite sensor constructed following the conventional methodology established in our research group [30]. The other four were bulk-modified epoxy-graphite sensors with 4% (w/w) of metal oxide nanopowder added to the mixture before curing (metal oxides described in Section 2.1).

Voltammetric procedure
Cyclic voltammetry was carried out at room temperature in a measurement cell built around the voltammetric ET described in Section 2.3. The measurements were performed over a potential range from -1 V to +1.2 V at a scan rate of 0.1 V·s⁻¹. Before the whole experiment, the sensors were regenerated by polishing and cleaning, and 3-5 stabilisation cycles in buffer solution (electrolyte described in Section 2.2) were run in order to stabilise the voltammetric response. Between measurements, a cleaning stage consisting of three full cycles in Millipore water was performed. Samples were measured in random order and only the last scan of each sample was used.
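As a rough check of the time scale of these measurements, the sketch below computes the duration of one cycle and builds the corresponding potential ramp. It assumes a simple triangular waveform that starts and ends at the lower vertex; the 0.1 V sampling step and the function names are hypothetical and are not settings reported in this work.

```python
def cycle_duration(e_low, e_high, scan_rate):
    """Seconds needed for one triangular sweep e_low -> e_high -> e_low."""
    return 2.0 * (e_high - e_low) / scan_rate

def triangular_sweep(e_low, e_high, step):
    """Potential values (V) for one cycle, sampled every `step` volts."""
    n_steps = round((e_high - e_low) / step)
    forward = [e_low + i * step for i in range(n_steps + 1)]
    # The reverse scan retraces the forward scan without repeating the vertex.
    return forward + forward[-2::-1]

duration = cycle_duration(-1.0, 1.2, 0.1)   # about 44 s per cycle
sweep = triangular_sweep(-1.0, 1.2, 0.1)    # hypothetical 0.1 V sampling step
```

Under these assumptions, the three cleaning cycles between samples would add roughly two minutes to each measurement.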

Data processing details
The voltammograms from the train set were used to build the ANN model. Several parameters defining the ANN configuration were taken into account (Figure 1), so that a total of 5760 different models were built and compared.
Given the high dimensionality of the data, it was first compressed using DWT or FFT, and subsequently unfolded into a single vector to form the input to the ANN model. The training algorithm (Bayesian regularization) and the DWT (Daubechies) and FFT parameters (decomposition levels and number of coefficients) are given in Figure 1; these were chosen based on previous experience with similar conditions [31]. For the ANN training, convergence errors were chosen to be at least 5 orders of magnitude lower than the highest concentration of AA tested, and the maximum number of training epochs was set to 200.
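The compress-then-unfold step can be illustrated with a minimal sketch. It makes several assumptions that are not taken from this work: a naive discrete Fourier transform stands in for the FFT routine, the magnitudes of the first low-frequency coefficients are kept, each sensor contributes 32 coefficients (which would be compatible with the 160 inputs reported for the best model with five sensors), and the signals are synthetic sine waves rather than measured voltammograms.

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform, O(n^2); adequate for short signals."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]

def compress(signal, n_coeffs):
    """Keep the magnitudes of the first `n_coeffs` (low-frequency) coefficients."""
    return [abs(c) for c in dft(signal)[:n_coeffs]]

def unfold(voltammograms, n_coeffs):
    """Compress each sensor's signal and concatenate into one input vector."""
    vector = []
    for signal in voltammograms:
        vector.extend(compress(signal, n_coeffs))
    return vector

# Synthetic stand-in for 5 sensors x 128 current readings (not measured data).
sensors = [
    [math.sin(2 * math.pi * (s + 1) * i / 128) for i in range(128)]
    for s in range(5)
]
ann_input = unfold(sensors, 32)
print(len(ann_input))  # 160
```

Keeping only the leading coefficients reduces each 128-point signal to 32 values, so the unfolded vector for one sample is short enough to serve as an ANN input layer.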
For comparison purposes, a PLS2 model was also evaluated. Unlike other linear chemometric techniques, such as PLS1 or PCR, PLS2 permits the modelling of several response variables together [32]. Thus, the linear model obtained by PLS2 is the counterpart of the multiple-output ANN model. The optimisation of the PLS2 model consisted of choosing the number of latent variables as a compromise between this number and the percentage of predictive significance, in order to avoid overfitting.
To select the best model (for both ANN and PLS2), the predicted concentrations of the three AAs were compared with the expected ones for both the train and the test set. The best regression parameters (slope (m) and correlation value (R) close to 1 and intercept (b) close to 0) and the lowest root-mean-square error (RMSE) and normalized root-mean-square error (NRMSE) were taken as indicators of the optimum.
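The selection indicators can be computed as in the sketch below. The normalisation of the RMSE by the range of the expected concentrations is an assumption, since the normalisation used in this work is not stated here, and the function names and example values are hypothetical.

```python
import math

def rmse(expected, predicted):
    """Root-mean-square error between expected and predicted values."""
    n = len(expected)
    return math.sqrt(sum((p - e) ** 2 for e, p in zip(expected, predicted)) / n)

def nrmse(expected, predicted):
    """RMSE divided by the range of the expected values (assumed normalisation)."""
    return rmse(expected, predicted) / (max(expected) - min(expected))

def linear_fit(expected, predicted):
    """Least-squares fit predicted = m * expected + b; returns (m, b, r)."""
    n = len(expected)
    mean_x = sum(expected) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in expected)
    syy = sum((y - mean_y) ** 2 for y in predicted)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, predicted))
    m = sxy / sxx
    b = mean_y - m * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return m, b, r

# Hypothetical concentrations in mol/L, for illustration only.
expected = [0.0, 5e-5, 1e-4, 2e-4, 3e-4]
predicted = [1e-5, 6e-5, 1.1e-4, 2.1e-4, 3.1e-4]  # constant offset of 1e-5 M
m, b, r = linear_fit(expected, predicted)
```

In this toy case the slope and correlation are essentially 1 while the intercept and the RMSE both reflect the constant offset, which shows why the regression parameters and the error measures are inspected together.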
Further details of the chemometric routines used by our research group can be found in some of our related works [21,25,30,33,34].

Voltammetric responses
When constructing an ET, it is very important to use an array of sensors with cross-response, namely poorly selective sensors with partial specificity to the different target compounds in a solution (just as happens in the taste buds of some animals). The low selectivity and specificity are easily compensated for by the subsequent chemometric treatment of the data [35,36].
As can be seen in Figure 2A, the signals given by the different sensors for a specific AA have different shapes. Hence, the information generated by the different sensors is not redundant.
Moreover, as shown in Figure 2B, every sensor provides a slightly different signal for each AA. This suggests that the chosen sensor array presents the cross-response required for the intended ET application.
The higher signals provided by the Bi MOS can be related to the capacity of Bi to form complexes with some AAs [37,38]. Therefore, a properly established cleaning step between measurements is mandatory.

ANN
The best ANN model was selected following the criteria described in Section 2.5. It was achieved without any autoscaling of the data, using FFT compression, 160 input neurons, a single hidden layer formed by 7 neurons, the purelin hidden layer transfer function and the satlins output transfer function.
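The reported topology can be illustrated with a short forward-pass sketch. Here purelin is the linear (identity) transfer function and satlins the symmetric saturating linear function, which clips its argument to the interval [-1, 1]. The three output neurons (one per AA), the random weights and the helper names are assumptions made for the example; they are not the trained parameters of the model.

```python
import random

def purelin(x):
    """Linear transfer function: returns its input unchanged."""
    return x

def satlins(x):
    """Symmetric saturating linear transfer function: clips to [-1, 1]."""
    return max(-1.0, min(1.0, x))

def layer(inputs, weights, biases, transfer):
    """One fully connected layer: transfer(W . inputs + b) for each neuron."""
    return [
        transfer(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def n_parameters(sizes):
    """Number of weights plus biases for the given layer sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

SIZES = (160, 7, 3)  # inputs, hidden neurons, assumed outputs (one per AA)
rng = random.Random(0)  # arbitrary seed; the weights are placeholders
W1 = [[rng.uniform(-0.1, 0.1) for _ in range(SIZES[0])] for _ in range(SIZES[1])]
B1 = [0.0] * SIZES[1]
W2 = [[rng.uniform(-0.1, 0.1) for _ in range(SIZES[1])] for _ in range(SIZES[2])]
B2 = [0.0] * SIZES[2]

x = [rng.uniform(0.0, 1.0) for _ in range(SIZES[0])]
outputs = layer(layer(x, W1, B1, purelin), W2, B2, satlins)
print(n_parameters(SIZES))  # 1151
```

With these layer sizes the network has 1151 adjustable parameters, far fewer than a network fed with the uncompressed voltammograms would need.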
Figure 3 shows the comparison of the obtained values vs. the expected results (the known concentrations) for both the training model (blue dashed line and squares) and the external test subset (red dotted line and circles) for the three AAs. The ideal line (r=1, slope=1, intercept=0) is also shown (black solid line). The comparison figures clearly show that the PLS variant displays a higher dispersion of the data than the ANN case.

PLS
The optimisation of the PLS model was performed following the protocol described in Section 2.5, and the best performance was achieved with 10 latent variables. In this case, no compression was performed at all. Figure 4 shows the plots comparing the expected results vs. the obtained concentration values for both the training model (blue dashed line and squares) and the test subset (red dotted line and circles) for the three AAs. The ideal line (r=1, slope=1, intercept=0) is also shown (black solid line). Table 3 presents the correlation, slope and intercept for every set and AA (confidence level 95%), together with the RMSE and NRMSE values for the training model and the test subset.
If we compare the values in Tables 2 and 3, we can see that the slope and intercept values for the train set from the PLS model are better (closer to one and zero, respectively) than those obtained with the ANN model, whereas the opposite occurs with the test set. The correlation values obtained with the ANN model are also closer to one than those achieved with the PLS model, and the RMSE and NRMSE values are lower. Taking these results into account, we can affirm that the ANN model predicts the concentrations of TRP, TYR and CYS in AA mixtures better than the PLS model.

Conclusions
In this work we have presented a novel voltammetric ET consisting of a simple five-sensor array based on metal oxide nanopowder modifiers, capable of predicting the concentrations of TRP, TYR and CYS in mixtures in solution. The complex voltammetric data has been successfully processed by coupling FFT and ANN.
Thanks to the different redox catalytic properties of the modifiers, slightly different oxidation and reduction peaks were obtained for each sensor and AA (cross-response), making the ET application possible. The processing of the raw data by FFT and ANN allowed the removal of noise, matrix effects and redundant information. Comparison with the PLS2 model showed that the ANN model gives better results with the samples studied in this work. Although the computing times involved are long, the protocols for the optimisation of ANN models are becoming increasingly efficient, and they are needed only during the optimisation and calibration step. The results obtained, although perhaps not matching the expected ones perfectly, are counterbalanced by the rather simple procedure used for training. With fixed factorial experimental designs, richness or variability is precluded, and with their limited extension only a quick screening methodology is finally derived. Enlarging the training data set, or using another experimental design with higher variability, should improve the results obtained.
Future work will try to quantify the three oxidizable AAs in real samples using the methodology described. The method presented would be especially useful in the analysis of samples in which the speed of obtaining the results is more critical than the accuracy of the analysis, thus avoiding the use of more expensive, tedious and time-consuming techniques, such as chromatography.

Figure 3
Comparison of the expected vs. the predicted concentrations for the optimized FFT-ANN model.

Figure 4
Comparison of the expected vs. the predicted concentrations for the optimized PLS2 model.

Table 1
Composition of the amino acid mixture standards used to train the ANN model and evaluate its performance.

Table 2
Linear regression parameters obtained comparing the expected and predicted concentration values by the ANN model.

Table 3
Linear regression parameters obtained comparing the expected and predicted concentration values by the PLS2 model.