Improved Statistically Based Retrievals via Spatial-Spectral Data Compression for IASI Data

In this paper, we analyze the effect of spatial and spectral compression on the performance of statistically based retrieval. Although the quality of the information is not completely preserved during the coding process, experiments reveal that a certain amount of compression may have a positive impact on the accuracy of retrievals. We identify two strategies, both with interesting benefits: either to apply a very high compression, which still maintains the same retrieval performance as that obtained for uncompressed data; or to apply a moderate to high compression, which improves the performance. As a second contribution of this paper, we focus on the origins of these benefits. On the one hand, we show that a certain amount of noise is removed during the compression stage, which benefits the retrieval performance. On the other hand, we analyze the effect of compression on spectral/spatial regularization (smoothing). We quantify the amount of information shared among the spatial neighbors for the different methods and compression ratios. We also propose a simple strategy to specifically exploit spectral and spatial relations and find that, when these relations are taken into account beforehand, the benefits of compression are reduced. These experiments suggest that compression can be understood as an indirect way to regularize the data and exploit information from spatial neighbors, which improves the performance of pixelwise statistics-based retrieval algorithms.


I. INTRODUCTION
In recent decades, advances in remote sensing technology have made it possible to collect information from the electromagnetic spectrum with unprecedented accuracy and resolution [1]. Infrared sounder instruments usually produce large volumes of data, which are costly to manage in an operational context, i.e., for transmission, processing, and storage. An effective strategy to alleviate the problems derived from the data size is to compress the data according to the specific needs of the final users. However, achieving high compression ratios typically requires a lossy compression stage. Lossy compression entails losing some information in the reconstructed data, which may compromise the quality of the products in later processing stages. Therefore, a careful evaluation of the impact of the compression process applied to the data is needed to determine the quality of the information and of the derived products.
Although lossy compression implies going through a distortion process, the quality of the recovered data can still be adequate for the intended specific use. If the amount of inherent data acquisition noise is large, the signal removed during the coding process is mostly noise for certain compression ratios [2], [3]. Therefore, the amount of useful signal lost is usually low and the quality of the recovered products may still be appropriate, or even better suited, to feed a subsequent information extraction stage. Of course, if the compression ratio is very high, a lot of useful information is removed and the quality of the reconstructed data is seriously compromised.
The impact of lossy compression on information extraction systems has been investigated in several application areas. Ryan and Arnold discussed the lossy compression of remotely sensed images using vector quantization in the context of image classification in [4]. The impact of compression on the maximum likelihood classification of the recovered data was analyzed, concluding that the loss in classification accuracy was not significant (less than 8%). Sánchez and Perronnin evaluated two lossy compression techniques, one based on product quantizers and another based on dimensionality reduction, for large-scale image classification in [5]. The results showed that compression ratios between 64:1 and 128:1 produced little loss in classification performance. Similar results were obtained in [6], [7], [8], [9], [10], [11], [12], where lossy compression did not markedly reduce classification accuracy.
In addition, several studies have analyzed the effect of near-lossless and lossy compression techniques based on the JPEG 2000 standard within the framework of feature extraction, classification, and anomaly detection tasks. Pal et al. analyzed the performance of supervised, unsupervised, and hybrid classification processes in [13]. The results showed that classification accuracy was still reliable even at low bit-rates (high compression ratios). A compression scheme based on Principal Component Analysis (PCA) was proposed in [14], producing competitive results in terms of information preservation in anomaly detection tasks. Supervised and unsupervised classification of reconstructed data was evaluated in [15]. The experimental results showed competitive classification performance after the compression stage. Analogous results were reported in [16], where the proposed compression algorithm for hyperspectral images did not significantly reduce the performance of hard classification, linear spectral unmixing, and anomaly detection.
An interesting observation was reported in [17], where the impact of lossy compression of hyperspectral data was analyzed on multiclass classification, classification of mixed pixels via spectral unmixing, binary hard classification, and anomaly detection. Experimental results revealed that, for multiclass classification and spectral unmixing, the Signal-to-Noise ratio (SNR) was a reasonable indicator of classification performance; for binary hard classification, the performance depended only weakly on the SNR; and for anomaly detection, the compression algorithms that produced the best rate-distortion performance were not the best choice. In addition, the effect of lossy compression on supervised classification and spectral unmixing using support vector machines was quantified in [2]. The assessment reported that lossy compression can produce accurate results even at high compression ratios (above 16:1).
The estimation of atmospheric parameters from remote sensing data is an inverse problem where we retrieve the physical parameters given a set of observations. Standard approaches invert the physical (radiative transfer) model through look-up tables or optimal estimation models (OEM) [18], but they lead to much higher computational costs than those required for statistical approaches [19]. In the last decade, statistical model inversion based on machine learning has provided excellent performance in terms of accuracy and efficiency. Statistically based atmospheric parameter retrieval from IASI and AIRS data was first conducted using vanilla neural networks trained with backpropagation [20], [21], [22]. In a set of previous comparisons [19], [23], [24], [25], [26], we used other forms of nonlinear and nonparametric machine learning regression, such as more advanced neural nets, kernel ridge regression, and Gaussian processes. Alternatives to plain regression introduce a dimensionality reduction step before the regression. Smart approaches consider spatially and noise-aware transforms, as in [25], [27], or more advanced nonlinear dimensionality reduction approaches based on kernel machines [28].
The functional sliced inverse regression (FSIR) in [29] is a dimensionality reduction method that generalizes PCA by extracting spectral information driven by the parameters to be retrieved. FSIR is related to linear discriminant analysis (LDA) and partial least squares (PLS) approaches, being a kind of supervised dimensionality reduction approach, and provides noise-aware feature components (hence being related to minimum noise fraction, MNF, methods). Nevertheless, the standard PCA approach is still the most widely adopted and studied for dimensionality reduction in atmospheric parameter retrieval. In [30], the impact of spectral compression on statistically based retrievals was analyzed, investigating the use of PCA-based methods for compressing high-resolution infrared measurements before performing linear regression. It was shown that a compression ratio of 15:1 can be applied with low degradation of the temperature and water vapour retrievals. In [31], [32], we observed that data that had gone through a near-lossless or lossy compression process enabled, for certain compression ratios, improved statistically based retrieval performance compared to the retrieval on the original products. It was suggested that the spectral/spatial regularization and the noise filtering produced by the compression stage benefit the performance of the statistical retrieval algorithms. However, these counterintuitive observations were scarcely studied.
In this paper, we follow up on the observations reported in [31], [32], investigate the actual impact of compression on the retrievals, and supply results and discussion to understand why lossy compression may give rise to improved retrievals. The main contributions of this paper are:
1) While experiments in [31], [32] were carried out in an ideal scenario, where the samples used to define the training, validation, and testing subsets were acquired under the same conditions, here we assume a more realistic scenario, where the samples used to define the training, validation, and testing subsets all come from disjoint orbits. Therefore, conclusions are not biased by a potential risk of model overfitting.
2) Three different multi-component transforms have been employed in the compression process, all bringing forth improved performance.
3) The causes of the benefits in retrieval performance due to a compression stage are thoroughly investigated and discussed, and reasons for the improvement in statistically based retrieval are provided.
In this paper, a widespread lossy compression technique and two different statistical retrieval algorithms are evaluated on Infrared Atmospheric Sounding Interferometer (IASI) L1C data [33]. For lossy compression, the JPEG 2000 standard [34] is paired with three different spectral transforms: the Discrete Wavelet Transform (DWT) [35], the Pairwise Orthogonal Transform (POT) [36], and the Multilevel Clustering Karhunen-Loève Transform (Multilevel Clustering KLT) [37]. For the prediction of atmospheric variables, a linear and a nonlinear statistically based retrieval algorithm, i.e., Linear Regression (LR) and Kernel Ridge Regression (KRR) [38], respectively, are employed to retrieve physical information (temperature and dew point temperature profiles) from the reconstructed data.
To investigate the origin of the improved retrieval performance produced by lossy compression, two approaches are followed. On the one hand, we analyze the noise level remaining in the recovered spectra. On the other hand, we compare the performance of the retrieval algorithms when compressed (and reconstructed) data are employed with that of a simple method that explicitly takes into account information from the spatial and spectral neighbours. Experiments reveal that the improvement in retrieval accuracy is explained by: 1) lossy compression performing some noise filtering, which typically improves regression and function approximation results; and 2) compression being an indirect way to exploit spatial feature relations, which generally helps pixel-wise retrieval algorithms.
The remainder of the paper is organized as follows. Section II introduces the proposed sequential approach, detailing the compression scheme and the statistically based retrieval methods used in the experiments. Section III describes the data collection used in the experiments. Section IV reports and analyzes the experimental results. Section V provides an extensive discussion. Finally, Section VI draws some conclusions.

II. METHODOLOGY
This section describes the methods and techniques used in the experiments. For the compression stage, the JPEG 2000 standard is paired with three spectral transforms to exploit the high spectral redundancy present in IASI L1C data. For atmospheric prediction, two different statistically based retrieval algorithms, which have provided competitive performance in atmospheric parameter retrieval [39], are employed: a standard least squares LR and a kernel ridge regression (KRR) method.
Figure 1 illustrates the proposed sequential scheme. In a first stage, the data are lossy compressed. At the receiver side, the data are decompressed to produce the reconstructed data. Finally, the statistical parameter retrieval is carried out on the reconstructed data.

A. Data Compression
Compression of remote sensing products is an efficient strategy to reduce the large size of the collected data. In recent years, various compression strategies have been adopted in several standards and have been implemented both on board satellites and in ground processing stations [40], [41], [42], [43], [44].
Three compression approaches can be considered, namely, lossless, lossy, and near-lossless. Lossless compression is mandatory when the original signal must be fully preserved; however, the achievable compression ratios are rather low (below 5:1). Lossy compression can be a desirable approach in scenarios where high compression ratios are required and losing some information in the signal is admissible. Near-lossless compression is also a lossy compression approach, where some information is removed from the original data, but the compression ratios achieved are closer to those of lossless compression than to those of typical lossy compression. Near-lossless compression is an adequate approach when higher compression ratios than those of lossless compression are demanded and, at the same time, a specific fidelity criterion (usually the peak absolute error) must be met in the recovered data.
Here we focus on lossy compression because it is a widely accepted strategy when large volumes of remote sensing data need to be compressed, as witnessed by the pipeline currently in use for IASI data [45], [46] (recall that we use IASI data for the experiments).
1) Lossy Compression: Lossy compression makes it possible to achieve high compression ratios at the expense of recovering reconstructed data that are not identical to the original. The reconstructed data are expected to preserve enough information to be used for the intended specific purpose, while alleviating problems derived from the transmission, handling, and storage of large volumes of information.
The international standard JPEG 2000 [34] is used to carry out the lossy compression stage. JPEG 2000 was published by the Joint Photographic Experts Group (JPEG) in 2000 as the successor of the classical JPEG coding standard, and provides a wide diversity of features and functionalities in a single compressed code-stream: among others, lossless and lossy compression, progressive lossy-to-lossless coding, robustness to the presence of errors, region-of-interest coding, and progressive transmission. JPEG 2000 is employed in a wide range of applications, e.g., remote sensing, medical imagery, mobile applications, digital libraries, and digital photography.
To achieve competitive compression performance in data formed by thousands of spectral components, like IASI L1C products, it is of paramount importance to exploit the redundancy present in the spectral dimension. Three spectral transforms are paired with JPEG 2000: the Discrete Wavelet Transform (DWT), the Pairwise Orthogonal Transform (POT), and the Multilevel Clustering Karhunen-Loève Transform (Multilevel Clustering KLT).
DWT decomposes the processed signal into different subbands, which decorrelates the data to be encoded. When the transform is applied, the signal is decomposed into two sets of coefficients: the low and high frequency subbands. The low frequency subbands (L) have a coarse frequency resolution, while the high frequency subbands (H) represent fine details of the data. The high frequency subbands contain coefficients that can be discarded or quantized to achieve efficient lossy compression. Often, competitive data quality can be achieved from a small number of transform coefficients, which makes DWT a suitable transform for lossy compression [47]. DWT can be applied to the transform coefficients (low frequency subbands) in successive levels, producing further decorrelation, although the optimal number of transform levels depends on the particularities of the data. DWT can also be used to transform two-dimensional signals by applying a one-dimensional transform in the vertical direction and a one-dimensional transform in the horizontal direction, which usually improves the coding results.
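The subband decomposition described above can be illustrated with a minimal sketch. It assumes the orthonormal Haar wavelet and a plain uniform quantizer for simplicity, which is not the 9/7 DWT or the JPEG 2000 coder used in the experiments; the function names, the synthetic signal, and the quantization step are hypothetical.

```python
import numpy as np

def haar_forward(signal):
    """One level of the orthonormal Haar DWT for an even-length 1-D signal."""
    x = np.asarray(signal, dtype=float)
    low = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # low frequency subband (L)
    high = (x[0::2] - x[1::2]) / np.sqrt(2.0)  # high frequency subband (H)
    return low, high

def haar_inverse(low, high):
    """Invert haar_forward, interleaving the reconstructed samples."""
    x = np.empty(2 * low.size)
    x[0::2] = (low + high) / np.sqrt(2.0)
    x[1::2] = (low - high) / np.sqrt(2.0)
    return x

def lossy_round_trip(signal, levels, step):
    """Multilevel Haar DWT, uniform quantization of each H subband, inverse."""
    low = np.asarray(signal, dtype=float)
    highs = []
    for _ in range(levels):
        low, high = haar_forward(low)                # re-transform the L subband
        highs.append(np.round(high / step) * step)   # quantize the details
    for high in reversed(highs):
        low = haar_inverse(low, high)
    return low

# Synthetic example (not IASI data): a smooth signal plus white noise.
rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0.0, 4.0 * np.pi, 256))
noisy = clean + 0.05 * rng.standard_normal(256)
reconstructed = lossy_round_trip(noisy, levels=3, step=0.5)
```

Because only the H subbands are quantized here, small detail coefficients are rounded to zero while the coarse L subband is kept untouched, which is the mechanism the paragraph describes in its simplest form.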
Both POT and Multilevel Clustering KLT are affordable approximations of the computationally demanding Karhunen-Loève Transform (KLT) [48], which is optimal for decorrelating Gaussian sources. A critical consideration when applying a spectral transform to data volumes with a large spectral dimension is the computational complexity [49]. Some spectral transforms like the KLT may prove unusable for large dimensions such as those of IASI data. POT and Multilevel Clustering KLT rely on a divide-and-conquer implementation of the KLT. While a classic KLT decorrelates all components with each other regardless of how much energy they share, a divide-and-conquer strategy decorrelates only groups of spectral components with high shared energy and ignores the other components. The resulting transform is the composition of smaller KLT transforms, where each of them is processed as a classic KLT. If a multilevel mode is used, most of the data energy flows across the composition of transforms up to the last level, because most of the signal energy is grouped in the first few decorrelated components, which are further decorrelated in upper levels. POT uses a two-component KLT for every pair of consecutive components and works in a multilevel mode. In the case of Multilevel Clustering KLT, each smaller KLT can be formed by a number of spectral components between two and all the spectral components. Figure 2 illustrates the structure of a classical KLT, of a POT, and of a Multilevel Clustering KLT.
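As an illustration of the divide-and-conquer idea, the following sketch applies a two-component KLT to consecutive pairs and forwards the high-energy outputs to the next level. It is a simplified reading of the description above, not the POT software [64]: it assumes the number of components is a power of two, and the function names and synthetic data are hypothetical.

```python
import numpy as np

def pairwise_klt_level(data):
    """Apply a two-component KLT to each pair of consecutive components.

    data: array of shape (n_components, n_samples) with n_components even.
    Returns (principal, residual), each of shape (n_components // 2, n_samples).
    """
    centered = data - data.mean(axis=1, keepdims=True)
    principal, residual = [], []
    for i in range(0, centered.shape[0], 2):
        pair = centered[i:i + 2]                  # shape (2, n_samples)
        cov = pair @ pair.T / pair.shape[1]       # 2x2 covariance matrix
        _, vecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
        rotated = vecs.T @ pair                   # decorrelated pair
        residual.append(rotated[0])               # low-energy output
        principal.append(rotated[1])              # high-energy output
    return np.array(principal), np.array(residual)

def multilevel_pairwise_klt(data):
    """Keep transforming the high-energy outputs until a single one remains."""
    residuals = []
    current = data
    while current.shape[0] > 1:
        current, res = pairwise_klt_level(current)
        residuals.append(res)
    return current, residuals

# Synthetic example: eight strongly correlated components, 500 samples each.
rng = np.random.default_rng(1)
base = rng.standard_normal(500)
data = np.stack([(k + 1) * base + 0.1 * rng.standard_normal(500) for k in range(8)])
top, residuals = multilevel_pairwise_klt(data)
```

Each 2x2 rotation is orthogonal, so the total energy of the centered data is preserved while most of it accumulates in `top`, mirroring the energy flow towards the last level mentioned in the text.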

B. Statistically Based Parameter Retrieval
In the last decades, statistically based retrieval has proved to be very useful for solving inverse problems using remote sensing data. Different algorithms have been considered, mainly restricted to neural networks [20], [21], [22], [25] and kernel machines and Gaussian processes [19], [23], [24], [26]. In particular, kernel methods are well suited due to their ability to deal with high dimensional data [50]. Kernel methods generalize linear algorithms while still relying on linear algebra. This is why we focus on two approaches for (first guess) parameter estimation: least squares (regularized) linear regression (LR) and its generalization using kernels, which we use to implement the step of retrieving physical parameters from the data. For the latter, we use the Kernel Ridge Regression (KRR) algorithm [38], [51], which has shown very good prediction performance on IASI L1 data for different problems [39], [52], [53]. KRR was the method employed in [31], [32] to analyze the effect of compression on physical parameter retrieval at different compression ratios. KRR has the advantage of generalizing least squares linear regression to the nonlinear regression case.
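1) Statistically Based Retrieval Methods: Here we define the notation and review the formulation of KRR starting from the LR solution. We denote $\mathbf{x}_i \in \mathbb{R}^{D_x}$ as the inputs and $\mathbf{y}_i \in \mathbb{R}^{D_y}$ as the desired outputs. In matrix notation, we denote the training input samples as $\mathbf{X} \in \mathbb{R}^{N \times D_x}$, the training desired outputs as $\mathbf{Y} \in \mathbb{R}^{N \times D_y}$, the test input samples as $\mathbf{X}_* \in \mathbb{R}^{M \times D_x}$, and the test desired outputs as $\mathbf{Y}_* \in \mathbb{R}^{M \times D_y}$.
Using the matrix notation, the application of the linear model is:
$$\hat{\mathbf{Y}}_* = \mathbf{X}_* \mathbf{W}_L,$$
where we have discarded the bias term and $\mathbf{W}_L \in \mathbb{R}^{D_x \times D_y}$ are the regression weights. We fit the weights using the classical least squares solution, which depends on the inversion of $\mathbf{X}^\top \mathbf{X}$. In order to ensure invertibility, we use the classical Tikhonov regularized solution:
$$\mathbf{W}_L = (\mathbf{X}^\top \mathbf{X} + \lambda \mathbf{I})^{-1} \mathbf{X}^\top \mathbf{Y}.$$
Kernel methods are based on defining a mapping function $\phi(\mathbf{x})$ of the input samples to a Hilbert space, $\mathcal{H}$, of very high (possibly infinite) dimensionality $D_{\mathcal{H}}$. KRR can be defined following the same procedure as in the LR case, as a linear least squares regression in the Hilbert space. If we map the samples using the mapping function, we have $\boldsymbol{\Phi} = \phi(\mathbf{X}) \in \mathbb{R}^{N \times D_{\mathcal{H}}}$ and $\boldsymbol{\Phi}_* = \phi(\mathbf{X}_*) \in \mathbb{R}^{M \times D_{\mathcal{H}}}$. Therefore, the prediction model is given by:
$$\hat{\mathbf{Y}}_* = \boldsymbol{\Phi}_* \mathbf{W}_K.$$
As in the LR case, we can apply the Tikhonov regularized solution to find the weights:
$$\mathbf{W}_K = (\boldsymbol{\Phi}^\top \boldsymbol{\Phi} + \lambda \mathbf{I})^{-1} \boldsymbol{\Phi}^\top \mathbf{Y}.$$
Note that this problem is not solvable as stated, since the inverse runs on the matrix $\boldsymbol{\Phi}^\top \boldsymbol{\Phi}$, which is of size $D_{\mathcal{H}} \times D_{\mathcal{H}}$, and $\boldsymbol{\Phi}$ is in principle unknown. However, if we use a $\lambda$ that ensures that the matrix can be inverted, the solution is equivalent to:
$$\mathbf{W}_K = \boldsymbol{\Phi}^\top (\boldsymbol{\Phi} \boldsymbol{\Phi}^\top + \lambda \mathbf{I})^{-1} \mathbf{Y}.$$
If we summarize the right part of the equation by $\boldsymbol{\alpha} = (\boldsymbol{\Phi} \boldsymbol{\Phi}^\top + \lambda \mathbf{I})^{-1} \mathbf{Y}$, the prediction for the test input samples is:
$$\hat{\mathbf{Y}}_* = \boldsymbol{\Phi}_* \boldsymbol{\Phi}^\top \boldsymbol{\alpha}.$$
Note that even though the mapping $\boldsymbol{\Phi}$ is unknown, one can replace this inner product matrix with a similarity matrix between samples, which is known as the kernel matrix $\mathbf{K}$. In this case, we can write the Gram matrix as $\mathbf{K} = \boldsymbol{\Phi} \boldsymbol{\Phi}^\top$, and equivalently $\mathbf{K}_* = \boldsymbol{\Phi}_* \boldsymbol{\Phi}^\top$. Therefore, the predictions can be computed by:
$$\hat{\mathbf{Y}}_* = \mathbf{K}_* \boldsymbol{\alpha} = \mathbf{K}_* (\mathbf{K} + \lambda \mathbf{I})^{-1} \mathbf{Y}.$$
In KRR we only need a kernel function, $k(\mathbf{x}_i, \mathbf{x}_j)$, that satisfies Mercer's theorem [54]. The kernel methods literature is full of examples of proper kernel functions. Here we use the most standard one, the Gaussian (Radial Basis Function, RBF) kernel $k(\mathbf{x}_i, \mathbf{x}_j) = \exp(-\|\mathbf{x}_i - \mathbf{x}_j\|^2 / (2\sigma^2))$, which has only one free parameter, $\sigma$. Therefore, in LR one parameter is tuned: the regularization parameter $\lambda$. In KRR, two free parameters are tuned: the regularization parameter $\lambda$ and the kernel parameter $\sigma$.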
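The LR and KRR solutions reviewed above can be sketched in a few lines of NumPy. This is a minimal illustration on random data, not the MATLAB implementation used in the experiments; the function names are hypothetical, and the RBF kernel is written in the common form $\exp(-\|\mathbf{x}_i - \mathbf{x}_j\|^2 / (2\sigma^2))$, which is an assumption about the exact parameterization.

```python
import numpy as np

def fit_linear(X, Y, lam):
    """Tikhonov regularized least squares: W = (X^T X + lam I)^-1 X^T Y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix with entries exp(-||a - b||^2 / (2 sigma^2))."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def fit_krr(X, Y, lam, sigma):
    """Dual weights: alpha = (K + lam I)^-1 Y."""
    K = rbf_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(X.shape[0]), Y)

def predict_krr(X_train, alpha, X_test, sigma):
    """Predictions: K_* alpha, with K_* the test-versus-train kernel matrix."""
    return rbf_kernel(X_test, X_train, sigma) @ alpha

# Random stand-in data: N = 50 training and M = 10 test samples.
rng = np.random.default_rng(0)
X, Y = rng.standard_normal((50, 5)), rng.standard_normal((50, 2))
X_test = rng.standard_normal((10, 5))

W = fit_linear(X, Y, lam=0.1)
Y_lr = X_test @ W
alpha = fit_krr(X, Y, lam=0.1, sigma=2.0)
Y_krr = predict_krr(X, alpha, X_test, sigma=2.0)
```

With a linear kernel, $\mathbf{K} = \mathbf{X}\mathbf{X}^\top$, the dual form reproduces the LR predictions exactly, which is a convenient sanity check of the equivalence between the two expressions for the weights.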
In both cases, we used a cross-validation strategy to optimize these parameters, using one half of the data for training and the other half for validation. A MATLAB implementation of KRR and other regression algorithms can be found at http://isp.uv.es/soft_regression.html.

III. DATA AND EXPERIMENTAL SETTING
This section introduces the data collection used in the experiments and the parameter configuration employed in the compression and in the statistically based regression stages. First, a description of the main characteristics of the IASI instrument is given, along with the specific products used in the experiments.

A. Data Collection
IASI is the main payload instrument carried on the MetOp satellite series [55]. Data provided by the instrument represent a significant improvement in the retrieval accuracy and vertical resolution of atmospheric parameters (mostly temperature and water vapour concentration) with respect to previous lower spectral resolution instruments, such as HIRS and SEVIRI. This has in turn improved the output from numerical weather prediction models and atmospheric chemistry studies.
The instrument collects data with high spectral, spatial, and temporal resolution, producing large volumes of information (about 16 Gigabytes per day generated by each of the IASI-A and IASI-B instruments). IASI covers the spectral range between 645 and 2760 cm$^{-1}$, yielding 8461 spectral components. Such an amount of information is costly to manage, hence the need to search for efficient strategies to reduce the large size of the data for improved processing, transmission, and storage.
An effective strategy to alleviate the large volume of information produced by the IASI instrument is to compress the data according to the specific requirements of the end-user applications. As the outputs of the retrieval models, we use the physical variables (temperature and dew point temperature) given by the analyses of the model of the European Centre for Medium-Range Weather Forecasts (ECMWF). The model provides estimates at 137 different pressure levels in the range $[10^{-2}, \ldots, 10^{3}]$ hPa in the atmosphere, with a spatial resolution of 0.5 degrees. We co-registered the scenes acquired by the IASI instrument with the ECMWF analyses.
The experiments have been conducted in a realistic context, where data from disjoint orbits are used for training and testing. The training set is composed of seven orbits, and the atmospheric parameter retrieval test is carried out on six different orbits. All data have been produced by the IASI-B instrument, on board the MetOp-B satellite. Table I reports the characteristics of the 13 IASI orbits used in the experiments. In order to isolate the results for cloud free and cloudy areas, we use the fractional cloud cover data provided in the L2 level [62].
Usually, Band 3 of the IASI spectrum and some channels from Band 1 and Band 2 are not used for temperature and dew point temperature retrieval because they are influenced by solar radiation and trace gases such as CO and CH4. Following [39], we performed feature selection, removing the noisiest bands and keeping 4699 channels.

B. Setting and Parameter Configuration for Lossy Compression
Lossy compression is carried out through the JPEG 2000 standard. Note that, in hyperspectral data with such a large number of spectral components as IASI products, it is of utmost importance to exploit the high spectral redundancy inherent to the data. As mentioned, to achieve improved coding performance, JPEG 2000 is paired with three spectral transforms: for DWT, 10 levels of 9/7 DWT are applied in the spectral dimension; POT is run using the default parameter settings; and for Multilevel Clustering KLT, 100 clusters are defined in the first level. To account for the spatial redundancy, a 9/7 DWT with five levels is applied in the spatial dimension. Nine target bit-rates are analyzed, distributed from 2 to 0.0025 bits per pixel per component (bpppc), corresponding to compression ratios from 8:1 to 6,400:1. Table II summarizes the main characteristics of the three compression settings studied in the experiments.
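The correspondence between target bit-rate and compression ratio can be checked with a short calculation. The sketch below assumes 16 bits per original sample, a value that is not stated in the text but is the one implied by the reported pairs (2 bpppc for 8:1 and 0.0025 bpppc for 6,400:1); the function name is hypothetical.

```python
def compression_ratio(target_bpppc, bits_per_sample=16):
    """Compression ratio implied by a target bit-rate given in bpppc.

    bits_per_sample=16 is an assumption inferred from the reported end points.
    """
    return bits_per_sample / target_bpppc

# Only the two end points of the nine target bit-rates are given in the text.
for bpppc in (2.0, 0.0025):
    print(f"{bpppc} bpppc -> {round(compression_ratio(bpppc))}:1")
```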
In the experiments, the Kakadu software [63] has been used for JPEG 2000, the Pairwise Orthogonal Transform software [64] has been run for POT, and the Spectral Transform software [65] has been employed for Multilevel Clustering KLT. The JPEG 2000 standard provides a multicomponent extension in its Part 2 [66], which has been used to apply the DWT.
It is worth mentioning that, in our lossy compression approach, we apply a decoding stage prior to the statistically based retrieval, so that the retrieval works with the reconstructed data and not in the compressed domain. These reconstructed data have exactly the same spectral and spatial dimensions as the original data, i.e., no spectral or spatial dimensionality reduction occurs. It is the quality of the observations that is being modified, not the size of the scenes.

C. Settings of Statistically Based Regression
The retrieval experiments have been carried out in a realistic scenario, i.e., data from disjoint IASI orbits have been used for training and testing. The statistical retrieval algorithms are applied to IASI hyperpixels at different pressure levels, considering all the spectral components at a particular spatial position in the regression.
In the training stage, 8,000 samples are randomly selected from each of the training orbits (seven products), which produces a training set of 56,000 samples. To allow a fair comparison, the positions of the samples used for training are kept constant for all the experiments, i.e., for all three multicomponent transforms and all compression ratios. Six different orbits are used for testing. The prediction is carried out on the whole orbit. To assess the retrieval performance, the RMSE between the ground-truth values of temperature/dew point temperature and the predictions produced by the regression models is computed at each pressure level.
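The per-level RMSE used as the evaluation metric can be sketched as follows, assuming the ground truth and the predictions are stored as arrays of shape (number of samples, number of pressure levels); this layout and the function name are assumptions made for illustration.

```python
import numpy as np

def rmse_per_level(y_true, y_pred):
    """RMSE at each pressure level; inputs have shape (n_samples, n_levels)."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return np.sqrt(np.mean(err**2, axis=0))

# Toy example: constant errors of 1, 2 and 3 K at three pressure levels.
y_true = np.zeros((4, 3))
y_pred = np.tile([1.0, 2.0, 3.0], (4, 1))
profile_rmse = rmse_per_level(y_true, y_pred)
mean_rmse = profile_rmse.mean()   # average over pressure levels
```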
We trained one model for each coding setting and each compression ratio. Note that the input data to the retrieval process are the reconstructed spectra, i.e., compressed and decompressed data. The models were trained using a simple cross-validation scheme, with half of the data used for training and the other half used for validating the parameters.
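A minimal sketch of such a half/half validation scheme, shown here for the regularization parameter of LR, could look as follows. The random split, the search grid, and the function names are assumptions for illustration; the text does not specify how the halves were drawn or which values were searched.

```python
import numpy as np

def ridge_fit(X, Y, lam):
    """Tikhonov regularized least squares weights."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

def select_lambda(X, Y, lambdas, seed=0):
    """Fit on one half of the samples and score the RMSE on the other half."""
    idx = np.random.default_rng(seed).permutation(X.shape[0])
    half = X.shape[0] // 2
    fit_idx, val_idx = idx[:half], idx[half:]
    scores = []
    for lam in lambdas:
        W = ridge_fit(X[fit_idx], Y[fit_idx], lam)
        residual = X[val_idx] @ W - Y[val_idx]
        scores.append(np.sqrt(np.mean(residual**2)))
    return lambdas[int(np.argmin(scores))], scores

# Synthetic stand-in data with a known linear relation plus noise.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
Y = X @ rng.standard_normal((10, 3)) + 0.1 * rng.standard_normal((200, 3))
grid = [1e-3, 1e-1, 1e1, 1e3]   # hypothetical search grid
best_lam, val_rmse = select_lambda(X, Y, grid)
```

For KRR the same loop would run over pairs of $\lambda$ and $\sigma$ values instead of a single grid.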

IV. EXPERIMENTAL RESULTS
This section presents the experimental results. We report results for atmospheric parameter retrieval over the reconstructed spectra and investigate the origin of the improvement in the retrievals produced using the reconstructed spectra. Results are evaluated in terms of noise removal and shared spatial/spectral information to assess the impact of compression on the retrievals. This study is conducted through several experiments:
1) We evaluate the performance of the retrievals produced using reconstructed spectra. Different compression settings and retrieval algorithms are used. The experiment is carried out in a realistic scenario where the data used for testing come from orbits different from those used for training.
2) We analyze the amount of noise left in the data after being processed with each compression setting and at different compression ratios.
3) We analyze the spectral and spatial information in the compression settings. First, we quantify the amount of information shared among the spatial neighbours. We then propose a scheme which specifically enforces different levels of spatial and spectral relations in a simple way, and compare the performance of the retrieval algorithms when using data processed following this scheme and when using reconstructed data from the compression algorithms.
Results are reported for one IASI L1C orbit and one physical variable (dew point temperature) in most of the experiments due to restrictions in page length. Conclusions from the results for the remaining orbits and for temperature are similar. We report all these results in the supplementary material at http://isp.uv.es/spatio_spectral_compression.html.

A. Retrieval Assessment
This section reports the results of estimating atmospheric physical variables through LR and KRR using reconstructed IASI L1C spectra. Results are presented for the orbit IASI_xxx_1C_M01_20131017030856Z_20131017045352Z_N_O_20131017035958Z (see Table I) and dew point temperature prediction. Results for all the other orbits and variables are very similar and are reported as supplementary material.
A first assessment of the reconstructed radiances is reported in the Appendix at http://gici.uab.cat/pub/spatio_spectral_compression/appendix.html, where the normalized radiance residual statistics and the spectral signature of the reconstructed radiances are analyzed. Next, Figure 3 shows the retrieval performance of dew point temperature prediction for the lossy compression settings proposed. Plots show the RMSE of the predictions averaged over the pressure levels between 1100 and 100 hPa. We analyze four different scenarios, i.e., land and cloud free, land and cloudy, ocean and cloud free, and ocean and cloudy conditions. Cloud free conditions correspond to a cloud fraction equal to 0%; any other value is considered cloudy. Ocean corresponds to a land fraction equal to 0%; any other value is considered land.
[Table III]
One can see that, while low compression ratios do not reduce the accuracy of the retrievals, the prediction results improve as the compression ratio increases. Nonetheless, when the compression ratio is very high, the retrieval performance begins to deteriorate because too much distortion is introduced in the reconstructed data. This behaviour occurs for both LR and KRR, although the retrieval improvements are larger for LR (in the sense that higher compression ratios can be applied before the retrieval performance starts to deteriorate). These results are consistent with the observations reported in [32]. However, note that here the situation is more realistic than in [32], since we use different orbits for training, validation, and testing. By doing so, we ensure that no information about the true values due to the blurring introduced by the compression settings can be used in the predictions. Results are also consistent for the new compression settings evaluated.
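The four evaluation scenarios defined above reduce to two boolean tests per sample. The sketch below assumes the cloud and land fractions are available as per-sample arrays expressed in percent; the function and variable names are hypothetical.

```python
import numpy as np

def scenario_masks(cloud_fraction, land_fraction):
    """Boolean masks for the four scenarios (fractions given in percent)."""
    cloud_free = np.asarray(cloud_fraction) == 0   # cloud fraction equal to 0%
    ocean = np.asarray(land_fraction) == 0         # land fraction equal to 0%
    return {
        "land, cloud free": ~ocean & cloud_free,
        "land, cloudy": ~ocean & ~cloud_free,
        "ocean, cloud free": ocean & cloud_free,
        "ocean, cloudy": ocean & ~cloud_free,
    }

# Toy example with one sample per scenario.
masks = scenario_masks(cloud_fraction=[0, 50, 0, 10], land_fraction=[100, 30, 0, 0])
```

Each sample falls into exactly one scenario, so the per-scenario RMSE values can be computed by indexing the predictions with the corresponding mask.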
An interesting analysis of the correlation between the dimensions of the predictions was introduced in [27], which proposed an index, $i_D$, of non-diagonality of the covariance matrix of the predictions for different pressure levels, $\Sigma_{\hat{Y}}$. The index is defined in terms of $S_{\hat{y}} = \|\Sigma_{\hat{Y}}\|$, where $\Sigma_{\hat{Y}}$ is the covariance matrix of the predictions, and $B^{d}_{\hat{y}}$, the largest value on the diagonal of $\Sigma_{\hat{Y}}$ (see [27] for the expression and details). Figure 4 shows the value of the $i_D$ index for the different compression configurations and the different regression algorithms considered in this paper. We provide the index without the normalization by the number of output dimensions. Given a particular regression method, the differences in the correlation between the outputs for different compression ratios (and also for different compression settings) are negligible for the compression ranges with low prediction errors, i.e., up to compression ratios of about 300:1.
Several conclusions can be drawn:
1) Statistical atmospheric parameter retrieval benefits from spectral/spatial lossy compression. It is clear from Fig. 3 that using reconstructed data after a compression process makes it possible to achieve improved statistically based retrieval performance. While moderate and medium compression ratios achieve at least the same performance as the original spectra, when the distortion level introduced in the data is high, the accuracy of the retrievals decreases. In all the scenarios analyzed, a compression ratio of approximately 100:1 achieves the same or improved retrieval results compared to those obtained from uncompressed spectra (original data).
2) KRR produces more accurate predictions than LR. For all plots, KRR consistently yields better retrieval performance than LR. These results are consistent for all the data (original and reconstructed spectra) and scenarios analyzed. The difference is especially significant for cloudy conditions.
3) Lossy compression enables more significant retrieval improvements for LR than for KRR. The retrieval improvement is more evident for LR than for KRR when a compression process has taken place, which suggests that the spectral/spatial regularization produced by the compression stage particularly benefits LR.
Fig. 3. Dew point temperature (in kelvin) retrieval performance for different lossy compression settings using LR (solid red lines) and KRR (solid blue lines). In all the plots, the vertical axis represents the averaged RMSE over the different pressure levels between 1100 and 100 hPa and the horizontal axis represents the compression ratio. Ranges are the same in all the plots to ease the comparison. Each row shows the results for a particular compression setting and each column shows the results for a particular scenario. Results using uncompressed spectra (original data) for LR (dashed red lines) and KRR (dashed blue lines) are plotted for comparison purposes.
4) LR remains competitive at higher compression ratios than KRR. Fig. 3 illustrates that, while KRR achieves improved retrieval results when data compressed at a ratio of approximately 124:1 are used, LR still produces competitive performance when spectra compressed at a ratio of approximately 203:1 are employed.
[...] compression ratios leading at least to the same retrieval performance as the original spectra. If the interest is to achieve very high compression ratios, POT + JPEG 2000 would be the best compression setting. A compression ratio of 421:1 achieves at least the same prediction results as the original data for all the scenarios analyzed. Higher compression ratios decrease the retrieval performance because useful information is missing in the recovered spectra.
7) The effect of lossy compression is alike for all the compression settings and scenarios analyzed. One can see in Fig. 3 that, depending on the compression setting applied and the scenario analyzed, the results can vary slightly, but the impact of lossy compression on retrieval performance is similar in all cases.
8) The effect of compression on the correlation of the predictions is negligible for useful compression ratios. If we consider the range of compression ratios with low prediction error (i.e., up to about 300:1, see Fig. 3), the difference in correlation between the predicted outputs with and without compression is very low (Fig. 4). This difference is also small when compared with the difference induced by using different prediction methods.
Figure 5 shows the bias and the RMSE results for the range of pressure levels corresponding to the troposphere (between 1100 and 100 hPa). For each compression setting and each scenario, results for the compression ratio that produces the best prediction performance are reported.
Reconstructed spectra produce better (or at least the same) results compared to the original data for all pressure levels and all compression settings. Generally, improvements are more evident in the mid-low troposphere. As illustrated in Fig. 3, the improvement in prediction is more significant for LR. Results are usually better over ocean than over land, and better in cloud free than in cloudy scenarios.
Figure 6 illustrates the Mean Absolute Error (MAE) between the dew point temperature values provided by the ECMWF analyses (taken here as the ground truth) and the values predicted by LR and KRR when uncompressed and reconstructed spectra are employed in the retrievals. Maps plot the MAE for all the analyzed orbits. Due to restrictions in page length, results are reported only for the original data and for Multilevel Clustering KLT + JPEG 2000, which is the compression technique that produces the most competitive performance. As expected, KRR produces smaller errors than LR (more bluish areas), and maps show that the MAE is very similar when the original data and the reconstructed spectra are used. Errors occur in the same geographic areas and the error pattern is similar for both the original and the reconstructed data.
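To make the comparison between the two regression families concrete, the following toy sketch fits a linear regression (LR) and a kernel ridge regression (KRR) to a one-dimensional nonlinear target. Everything in it is an assumption made for illustration: the RBF kernel, the values of `lam` and `gamma`, the sine target, and the scalar input are not the configuration used in the experiments, which operate on full spectra.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_lr(xs, ys):
    """Ordinary least squares for y = a*x + b with a scalar input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b

def fit_krr(xs, ys, lam=1e-3, gamma=1.0):
    """Kernel ridge regression with an RBF kernel (assumed hyperparameters)."""
    def k(u, v):
        return math.exp(-gamma * (u - v) ** 2)
    # Regularized Gram matrix: K + lam * I
    gram = [[k(u, v) + (lam if i == j else 0.0) for j, v in enumerate(xs)]
            for i, u in enumerate(xs)]
    alpha = solve(gram, ys)
    return lambda x: sum(a * k(x, u) for a, u in zip(alpha, xs))

def rmse(model, xs, ys):
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

# Toy nonlinear target; test points lie between the training points.
x_train = [0.3 * i for i in range(21)]
y_train = [math.sin(x) for x in x_train]
x_test = [0.3 * i + 0.15 for i in range(20)]
y_test = [math.sin(x) for x in x_test]

lr_rmse = rmse(fit_lr(x_train, y_train), x_test, y_test)
krr_rmse = rmse(fit_krr(x_train, y_train), x_test, y_test)
print(f"LR RMSE: {lr_rmse:.3f}  KRR RMSE: {krr_rmse:.3f}")
```

On this toy problem the nonlinear model fits the target far more closely than the linear one, which is the qualitative behaviour reported for KRR versus LR; the size of the gap here says nothing about the gap on IASI data.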

B. Noise Reduction (denoising) through Compression
This section analyzes the effect of the compression process on the noise present in the data. Lossy compression is known to be an indirect way of performing signal filtering and denoising. Compression aims for a compact representation of the data by eliminating redundant information while preserving, as much as possible, the characteristics of the observation needed by the intended data user.
Figure 7 reports an estimate of the noise level present in the reconstructed spectra for the compression settings analyzed.
In the experiments it is assumed that the noise in the IASI data is additive white Gaussian noise, independent and identically distributed. This is a realistic assumption for an interferometer and is the model adopted in [46], [67], [68], [69]. To compute the noise standard deviation, the Anisotropic Nonparametric Image Restoration toolbox [70] was used. This method provides a robust estimate of the standard deviation based on the well-known median of the absolute deviation (MAD) [71]. The technique is used along with an orthogonal wavelet transform, in such a way that the median estimator is applied to the high-frequency subbands (fine details) of the transform domain to reduce the impact of the features present in the signal. In the experiments, a Daubechies wavelet transform was used. A more detailed description of the adopted strategy can be found in [72], [73].
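The MAD-based estimate described above can be sketched as follows. This is not the toolbox of [70]: it is a minimal one-dimensional illustration that assumes the Haar wavelet (the simplest member of the Daubechies family) and uses only the finest-scale detail coefficients. The constant 0.6745 converts the MAD of Gaussian noise into a standard deviation; the test signal and the noise level of 0.5 are made up.

```python
import math
import random
import statistics

def haar_details(signal):
    """Finest-scale detail coefficients of an orthonormal Haar transform."""
    return [(signal[i] - signal[i + 1]) / math.sqrt(2)
            for i in range(0, len(signal) - 1, 2)]

def mad_noise_std(signal):
    """Robust noise estimate: MAD of the fine details divided by 0.6745."""
    details = haar_details(signal)
    centre = statistics.median(details)
    mad = statistics.median(abs(d - centre) for d in details)
    return mad / 0.6745

# Synthetic check: smooth signal plus white Gaussian noise of known level.
random.seed(0)
n = 4096
clean = [10 * math.sin(2 * math.pi * i / n) for i in range(n)]
noisy = [c + random.gauss(0, 0.5) for c in clean]
sigma_hat = mad_noise_std(noisy)
print(f"estimated noise std: {sigma_hat:.3f} (true value 0.5)")
```

Because the smooth signal contributes almost nothing to the fine details, the median-based estimate stays close to the true noise level even though the signal amplitude is twenty times larger than the noise.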
Part of the improvement in the retrievals is due to the noise filtering performed by the compression process, as a comparison of Fig. 7 with Fig. 3 shows. It is clear from Fig. 7 that, as the compression ratio increases, the noise level in the reconstructed data decreases. Most of the noise is removed at moderate compression ratios, and the noise level remains unchanged for extremely high compression ratios because almost all the noise has already been removed from the data. When data are compressed at moderate to high compression ratios (i.e., from 10:1 to 300:1), most of the content removed by the compression process is noise, so the retrieval performance stays competitive or even improves on that of uncompressed data. However, if the compression ratio is extremely high (i.e., from 300:1 to 2,000:1), useful information is also eliminated, degrading the accuracy of the retrievals. Noise filtering is therefore a plausible reason why spectra reconstructed after high compression can perform better than uncompressed data. Increasing the compression ratio beyond a certain point no longer removes noise, so the only content removed from then on is relevant information and the retrieval performance decreases. Note that the compression settings POT + JPEG 2000 and Multilevel Clustering KLT + JPEG 2000 achieve lower maximum compression ratios than DWT + JPEG 2000 because side-information needs to be transmitted in addition to the compressed data.
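The two regimes described above (noise removal first, loss of useful signal later) can be reproduced on a toy signal. The sketch below is a stand-in under stated assumptions, not JPEG 2000: lossy coding is imitated by zeroing small Haar wavelet detail coefficients, and the signal, the noise level, the thresholds, and the number of levels are all hypothetical.

```python
import math
import random

def haar_forward(x):
    """One level of the orthonormal Haar transform (even-length input)."""
    s = [(x[i] + x[i + 1]) / math.sqrt(2) for i in range(0, len(x), 2)]
    d = [(x[i] - x[i + 1]) / math.sqrt(2) for i in range(0, len(x), 2)]
    return s, d

def haar_inverse(s, d):
    """Invert one Haar level."""
    out = []
    for a, b in zip(s, d):
        out += [(a + b) / math.sqrt(2), (a - b) / math.sqrt(2)]
    return out

def lossy_roundtrip(x, threshold, levels):
    """Transform, zero the details not exceeding the threshold, reconstruct."""
    details, approx = [], list(x)
    for _ in range(levels):
        approx, d = haar_forward(approx)
        details.append([v if abs(v) > threshold else 0.0 for v in d])
    for d in reversed(details):
        approx = haar_inverse(approx, d)
    return approx

def rmse(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))

random.seed(1)
n = 256
clean = [5 * math.sin(2 * math.pi * i / n) for i in range(n)]
noisy = [c + random.gauss(0, 0.5) for c in clean]

# Moderate setting: discard details below three times the noise level.
moderate = lossy_roundtrip(noisy, threshold=1.5, levels=3)
# Extreme setting: discard every detail over seven levels.
extreme = lossy_roundtrip(noisy, threshold=float("inf"), levels=7)

print(f"noisy    vs clean: {rmse(noisy, clean):.3f}")
print(f"moderate vs clean: {rmse(moderate, clean):.3f}")
print(f"extreme  vs clean: {rmse(extreme, clean):.3f}")
```

The moderate setting ends up closer to the clean signal than the noisy input, while the extreme setting is further away, mirroring the behaviour discussed for moderate and extremely high compression ratios.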

C. Exploiting Spectral/Spatial Relations through Compression
It is acknowledged that parameter retrieval and model inversion applications largely improve when exploiting spatial information [74], [75], [76].Compression is an indirect way to stress relevant feature relations and to enforce smoothness in the reconstructed data, which can be exploited by pixelwise retrieval methods.
In this section we analyze the impact that the compression approaches have on sharing spectral and spatial information among neighbouring pixels, and the effect this has on the retrieval performance. We first assess how much information is shared between spatial neighbours. Then we propose a simple method that explicitly takes into account information from the spatial and spectral neighbours, and we compare its results with those obtained when the retrieval is performed over the coefficients recovered after compression. We also investigate how the benefits of compression (Section IV-A) are affected when the spatial/spectral information in the data has already been explicitly exploited.
1) Assessing the Amount of Spectral/Spatial Information Shared through Compression: Here we use information theory [77] measures to quantify the amount of information that spatial neighbours share when different compression approaches and different compression ratios are used. In particular, we compute the mutual information between one pixel and its neighbours. The same procedure has been used to measure the amount of information shared between spatial, spectral and orientation neighbours in the wavelet domain [78] and in the normalized domain [79].
In Fig. 8 we show the results when the spatial neighbours are taken into account. We computed the mutual information between each pixel and its neighbours in a neighbourhood of 11 × 11. For the experiment we used the data contained in the whole orbit, i.e., the mutual information was computed from 94,440 samples for each component and then averaged over all the components. Particular values differ for the three compression settings analyzed; however, the trend is the same for all of them. For low to high compression ratios (i.e., from 30:1 to 500:1), each coefficient contains more information about its neighbours than in the original scenes. When the compression ratio is extremely high (i.e., higher than 500:1), the shared information decreases dramatically. This behaviour is also easy to see in the detailed squares below the graphs, which show the spatial pattern of the mutual information. Closer coefficients are more related (as expected), and only the global intensity of the relations changes (i.e., the pattern of the relations is similar for different compression ratios).
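For readers who want to reproduce this kind of measurement, a simple plug-in estimate of the mutual information between a pixel and one neighbour can be sketched as below. The estimator actually used in the experiments is not specified here, so the equal-width histogram, the choice of 8 bins, and the synthetic "near" and "far" neighbours are assumptions made for illustration only.

```python
import math
import random
from collections import Counter

def mutual_information(x, y, bins=8):
    """Histogram (plug-in) estimate of I(X; Y) in bits."""
    def to_bins(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0   # guard against constant input
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = to_bins(x), to_bins(y)
    n = len(x)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # sum over occupied cells of p(i,j) * log2( p(i,j) / (p(i) p(j)) )
    return sum(c / n * math.log2(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())

# Synthetic pixel values and two neighbours with different similarity.
random.seed(0)
centre = [random.gauss(0, 1) for _ in range(5000)]
near = [c + random.gauss(0, 0.3) for c in centre]   # strongly related
far = [c + random.gauss(0, 3.0) for c in centre]    # weakly related

mi_near = mutual_information(centre, near)
mi_far = mutual_information(centre, far)
print(f"I(centre; near) = {mi_near:.2f} bits")
print(f"I(centre; far)  = {mi_far:.2f} bits")
```

Repeating such an estimate for every offset in a neighbourhood and averaging over spectral components would give a map comparable in spirit to the squares of Fig. 8, although histogram estimates are biased and the absolute values depend on the binning.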
2) A Simple Way to Exploit Spectral/Spatial Relations: In the following we investigate the impact that explicitly exploiting spectral and spatial information has on the retrievals. Elaborate models could be used that take into account the dependence of the spatial smoothness on the spectral channel [80], or that avoid neighbouring pixels corrupted by clouds. However, for simplicity we present results for the less complex approach of averaging neighbouring pixels. We simply convolve the data with two Gaussian filters, one over the spectral domain and one over the spatially neighbouring pixels. We define the parameters $\sigma_{se}$ and $\sigma_{sa}$ as the standard deviations of the Gaussian filter in the spectral and the spatial dimension, respectively. Both are defined in the image domain. In particular, $\sigma_{se}$ refers to distance in the spectral dimension: $\sigma_{se} = 1$ is equal to a distance of one channel, which corresponds to approximately 0.5 cm$^{-1}$. $\sigma_{sa}$ is defined in terms of the distance between pixels: $\sigma_{sa} = 1$ is equal to the distance between one pixel and its neighbour, which corresponds to approximately 0.5°. Combinations of five different values of $\sigma_{se}$ and seven different values of $\sigma_{sa}$ are evaluated. The minimum values, $\sigma_{se} = 0.01$ and $\sigma_{sa} = 0.01$, correspond to an identity function, i.e., no spectral/spatial features are taken into account.
Fig. 7. For each compression setting, the noise standard deviation is plotted (solid blue lines). In all the plots, the vertical axis represents the noise level (noise standard deviation of IASI raw data) in the reconstructed spectra and the horizontal axis represents the compression ratio. Noise estimation for uncompressed data (dashed green lines) is reported as well for comparison purposes.
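The filtering step can be sketched as a separable Gaussian convolution, first along the spectral axis and then along the two spatial axes. The snippet is a minimal illustration under assumptions that are not taken from the experiments: the data cube is a nested list `cube[row][col][channel]`, the kernel is truncated at three standard deviations, and borders are handled by renormalizing the weights that fall inside the data.

```python
import math

def gaussian_kernel(sigma):
    """Normalized, truncated Gaussian weights (radius of at least one sample)."""
    radius = max(1, math.ceil(3 * sigma))
    w = [math.exp(-0.5 * (d / sigma) ** 2) for d in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]

def smooth_1d(values, sigma):
    """Gaussian smoothing of a sequence, renormalized at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    out = []
    for i in range(len(values)):
        num = den = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - radius
            if 0 <= idx < len(values):
                num += w * values[idx]
                den += w
        out.append(num / den)
    return out

def smooth_cube(cube, sigma_se, sigma_sa):
    """Spectral smoothing with sigma_se, then spatial smoothing with sigma_sa."""
    rows, cols, chans = len(cube), len(cube[0]), len(cube[0][0])
    out = [[smooth_1d(cube[r][c], sigma_se) for c in range(cols)]
           for r in range(rows)]
    for k in range(chans):
        for r in range(rows):          # along each row of pixels
            line = smooth_1d([out[r][c][k] for c in range(cols)], sigma_sa)
            for c in range(cols):
                out[r][c][k] = line[c]
        for c in range(cols):          # along each column of pixels
            line = smooth_1d([out[r][c][k] for r in range(rows)], sigma_sa)
            for r in range(rows):
                out[r][c][k] = line[r]
    return out

# With sigma = 0.01 the kernel collapses to a single unit weight (identity).
print(smooth_1d([1.0, 5.0, 2.0], 0.01))   # [1.0, 5.0, 2.0]
# A larger sigma pulls an alternating sequence towards its local mean.
print(smooth_1d([0.0, 1.0] * 4, 1.0))
```

The first call illustrates why the minimum parameter values act as an identity function: the off-centre Gaussian weights underflow to zero, so each sample is returned unchanged.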
Note that the IASI data we are using are already apodized. Apodization introduces spatial correlation, $\sigma_{ap}$, so the total spatial correlation would be $\sigma_{tot} = \sigma_{ap} + \sigma_{sa}$. Equivalently, our experiment cannot start from zero correlation, since the apodization process has already introduced some spatial correlation.
Table IV reports the retrieval performance for dew point temperature for different combinations of the parameter values. While exploiting both spectral and spatial features benefits the retrieval performance for LR, exploiting spectral features does not improve the results for KRR. In the case of LR, the performance is slightly improved when $\sigma_{se}$ is large and $\sigma_{sa}$ is small. However, for KRR the retrieval results are generally degraded when $\sigma_{se}$ is increased. It is clear that spatial features are more important than spectral features, as shown in [76] for hyperspectral data. It is interesting to note that, while the best prediction results for LR are obtained for large $\sigma_{sa}$, the best choice for KRR is a significantly smaller $\sigma_{sa}$. Improvements for LR are more significant than for KRR. Note that even for this extremely simple averaging procedure, results confirm the suggestion reported in [32]: exploiting spectral and spatial regularization (smoothing) improves the retrieval results.
Fig. 8. Spatial information shared for each compression approach and different compression ratios. The blue solid curve shows the average amount of mutual information (in bits) shared between one coefficient and its spatial neighbours. The green dashed line represents the same quantity for the uncompressed image. Below each plot, the detailed shared information between the central pixel and each neighbour is reported. In each square, the amount of black corresponds to the mutual information between this spatial position and the central pixel. Each square corresponds to a different compression ratio in the plot above, and the squares are sorted left to right in the same order as the dots on the blue solid curve.
3) Combining Compression Settings with the Specific Exploitation of Spectral/Spatial Relations: In this section we show how the benefits due to compression are drastically reduced when spatial/spectral information has already been taken into account when pre-processing the data. We will show that compression is an effective, consistent, and cheap way to exploit important spectral and spatial data relations and to remove noise from the reconstructed spectra. However, it does so in an indirect way and the benefits are limited.
In order to analyze the performance of compression paired with the exploitation of spectral/spatial feature relations, we analyze several configurations. On the one hand, we present results for the strategies already explored: retrieval over the original data, retrieval over the compressed and recovered data, and retrieval over the data processed using the method presented in the previous section. On the other hand, we follow two extra strategies: 1) first, the Gaussian filter is applied to the original data and then the filtered data are compressed; 2) first, the original data are compressed and then the reconstructed spectra are filtered. Figure 9 illustrates the sequential chain of the last two strategies.
Several conclusions can be extracted from this experiment:
1) Pre-processing the data to take into account the spatial and the spectral information about the neighbours compensates the effect of using recovered data from compression. While using recovered data (maroon curve) improves the results obtained for the original data (red curve), this improvement vanishes when spectral/spatial information is exploited, i.e., using a compression method over data that have already been pre-processed to take into account spectral/spatial information (blue line) does not improve the performance (orange line). This observation indicates that when spectral/spatial regularization is carried out before compression, the spectral/spatial transform applied in the compression stage is not able to exploit extra features.
2) Exploiting feature relations achieves improved retrieval performance. As we have already seen in Section IV-C2, exploiting extra features about the neighbouring pixels in the original spectra clearly benefits the prediction results. The consistency and homogeneity of the data are improved by pre-processing the data to take into account local relations between neighbouring pixels, which benefits the regression algorithms. The improvement is more significant for cloudy conditions than for cloud free conditions.
3) LR benefits more significantly from spectral/spatial regularization than KRR. According to the results reported in the supplementary material at http://isp.uv.es/spatio_spectral_compression.html, it is clear from the individual plots that, when spectral/spatial regularization is added, larger improvements in the retrieval performance are achieved for LR than for KRR. Gains produced for LR are approximately twice the gains produced for KRR.
4) Filtering + compression yields better results than compression + filtering. When the exploitation of feature relations is paired with compression, it is more efficient to perform the spectral/spatial regularization first and then compress the filtered data. When the compression stage is carried out first and the reconstructed spectra are then filtered, the prediction performance degrades as the compression ratio increases. The loss of performance is clear even at low compression ratios.
5) Results are consistent for all the approaches and scenarios analyzed. Conclusions are similar for all the plots, which suggests that the observed behaviour is consistent for all the approaches, compression settings, and scenarios analyzed.
Figure 12 illustrates the bias and the RMSE results for the range of pressure levels between 1100 and 100 hPa. The figure illustrates that the benefits of compression are compensated by pre-processing the data to take into account neighbouring spatial and spectral features. This supports the idea that one important advantage of using recovered compressed data for retrieval is that the process exploits information about the neighbours in the data.
Both approaches achieve similar performance for all pressure levels and scenarios, which indicates that the benefits obtained from compression were mostly due to the indirect exploitation of information from spatially and spectrally neighbouring pixels.

V. DISCUSSION
Lossy compression techniques aim at reducing the size of the transmitted/stored information while keeping the most informative part of the signal. As a consequence, the data information becomes more compact, which helps the subsequent information extraction steps. The multi-component transforms employed in the compression stage in this paper are based either on the Karhunen-Loeve transform (KLT) or on the wavelet transform. Both transformations have the effect of translating the data to a domain where the statistical relations (or correlations) are reduced [35], [81], [82]. When compressing in these domains, the less relevant features are reduced, thus stressing the important ones. When transforming back to the original domain, the reconstructed data is a version of the original data where the less relevant features have also been reduced. From the signal processing point of view, it can be seen as a filtering process.
Fig. 10. Dew point temperature (in kelvin) retrieval performance for different lossy compression settings using LR. In all the plots, the vertical axis represents the averaged RMSE over the different pressure levels and the horizontal axis represents the compression ratio. Ranges are the same in all the plots to ease the comparison. Each row shows the results for a particular compression setting and each column shows the results for a particular scenario. Each plot compares five different approaches, i.e., original data, compression of the original data, original data + Gaussian filter, Gaussian filter + compression, and compression + Gaussian filter.
As an example, when using KLT on images, the transformation becomes very similar to the Fourier transformation [83], i.e., the signal is decomposed into different frequencies. The high frequencies of the images have low magnitude in this domain, which in general translates into a poor signal-to-noise ratio (this explains why most coding techniques aim at reducing the energy of these frequencies). As a consequence, the reconstructed image has less energy in the high frequencies, which stresses the contribution of the low frequencies.
We investigated how much each of these aspects affects the performance of the statistically-based retrieval methods for sounder data.In particular, we analyzed the retrieval of physical variables (temperature and dew point temperature) from IASI L1C data.
We first verified that applying a compression process to remote sensing scenes can improve the performance of the statistically based retrieval methods.We analyzed different compression settings combined with different retrieval methods in a realistic scenario.This was illustrated in Figure 3 and Figure 5.While low compression ratios kept the results almost unchanged, moderate and high compression ratios generally enabled improved retrievals.However, when the compression ratios were very high, the retrieval performance was notably decreased, as expected.
Then we analyzed the first effect: how certain compression ratios help to remove useless information while keeping the relevant one. We analyzed the compression approaches as if they were denoising methods. When a signal is lossy compressed, the reconstructed (decompressed) signal has lost information. When this removed information is mostly noise and only a small amount of useful information is removed, compression has a positive impact on the regression results. We showed that for certain compression ratios and compression settings, most of the removed information was noise. Figure 7 reported the noise level present in the reconstructed spectra for different compression ratios and different compression settings. Plots illustrated that low compression ratios of approximately 8:1 removed small amounts of noise, especially in the case of POT + JPEG 2000 and Multilevel Clustering KLT + JPEG 2000. Moderate and high compression ratios were able to filter most of the noise present in the data. However, when extremely high compression ratios (i.e., above 500:1) were used in the compression stage, useful information was also removed and the quality of the reconstructed data was degraded. This explains why reconstructed spectra produce accurate retrieval performance when the original data have been compressed at moderate and high compression ratios (between 32:1 and 420:1, approximately, depending on the compression setting). When extremely high compression ratios were used, the removed signal was informative and therefore retrieval performance decreased.
Fig. 11. Dew point temperature (in kelvin) retrieval performance for different lossy compression settings using KRR. In all the plots, the vertical axis represents the averaged RMSE over the different pressure levels and the horizontal axis represents the compression ratio. Ranges are the same in all the plots to ease the comparison. Each row shows the results for a particular compression setting and each column shows the results for a particular scenario. Each plot compares five different approaches, i.e., original data, compression of the original data, original data + Gaussian filter, Gaussian filter + compression, and compression + Gaussian filter.
The last part of the paper focused on the second important effect of lossy compression techniques. We posit that the compaction of the information introduced by the compression has the effect of exploiting spatial and spectral information about the neighbours in the reconstructed signal.
First, we computed the amount of information about the spatial neighbours that is introduced when compressing the data. After that, we showed that applying a certain spectral/spatial regularization to hyperspectral data had a positive effect on the retrievals (Table IV). In particular, the spatial relations were more important than the spectral relations. Then we evaluated the impact of compression on the retrievals when the data had already been pre-processed to take into account spectral/spatial feature relations. Results in Fig. 10 and Fig. 11 showed that, when the data was pre-processed, the effect of compression on retrieval performance was compensated. In other words, an important advantage of using recovered compressed data for retrieval is that the process exploits information about the data neighbours.

VI. CONCLUSIONS
This paper studied the impact of lossy compression on statistically based regression algorithms for retrieving atmospheric profiles of temperature and dew point temperature using infrared sounder data (IASI L1C).We analyzed the reasons behind the benefits produced by compression and provided recommendations for effective prediction performance and data compression.
Several compression settings were evaluated and a linear and a nonlinear regression algorithm were assessed in a realistic training/testing scheme.Results for the prediction of two physical variables (temperature and dew point temperature) and four different scenarios (land and cloud free, land and cloudy, ocean and cloud free, and ocean and cloudy) were reported.
Lossy compression was carried out through different compression settings, always within the scope of international standard JPEG 2000.To achieve competitive compression performance in hyperspectral data with a large number of spectral components (such as IASI data), three spectral transforms were paired along with JPEG 2000, i.e., DWT, POT, and Multilevel Clustering KLT.In the experiments, nine compression ratios were analyzed to search for an optimal trade-off between retrieval performance and data compression.
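As a reminder of how a compression ratio is computed, and of why side-information caps the ratio that a transform can reach, consider the small sketch below. The function name and all the sizes are hypothetical and chosen only for illustration; they are not measurements from the experiments.

```python
def compression_ratio(original_bits, payload_bits, side_info_bits=0):
    """Ratio between the original size and everything that must be stored."""
    return original_bits / (payload_bits + side_info_bits)

# Hypothetical sizes, in bits.
original = 1_000_000
payload = 2_000

print(compression_ratio(original, payload))          # 500.0
print(compression_ratio(original, payload, 2_000))   # 250.0
```

In this made-up case, side-information of the same size as the payload halves the achievable ratio, and however small the payload becomes, the ratio can never exceed the original size divided by the side-information size.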
Experimental results revealed that reconstructed spectra have enough quality to achieve competitive retrieval performance because noise filtering is carried out in the compression stage, which makes it possible to reach high compression ratios while retaining as many features as possible. Another positive effect arises from the ability of compression to exploit spectral and spatial feature relations in an indirect way, which benefits the retrieval methods (on average and using ECMWF analysis as ground truth). We observed that moderate-to-high compression ratios produced improved results in predicting temperature and dew point temperature in the different scenarios analyzed. As expected, when the compression ratio is extremely high, the benefits disappear because large amounts of useful information are removed from the data.
Results showed that exploiting spatial relations between neighbouring pixels is more productive than exploiting spectral relations. While spectral regularization kept the results almost unchanged, spatial regularization improved the predictions by almost 20%. Spatial regularization is, hence, a key element to be exploited in pixelwise retrieval algorithms, where the spatial component is missing.
Experiments revealed that when the data was pre-processed to take into account the spectral and spatial feature relations before compression, the retrieval performance was improved compared to the results yielded by the original data, even at high compression ratios. The compression setting Multilevel Clustering KLT + JPEG 2000 was able to significantly improve the atmospheric predictions at compression ratios higher than 200:1.
The proposed methodology can be applied to other retrieval methods and benefits the development of current and upcoming infrared sounding instruments.While current retrieval methods would benefit from efficiently compressed spectra, savings in data transmission and storage would involve operational improvements.
This study may have a deep impact on both currently flying infrared sounding instruments (e.g., IASI and CrIS) and upcoming ones (e.g., MTG-IRS).

Fig. 2. Structure of a classical KLT, POT, and Multilevel Clustering KLT. This example decorrelates eight spectral components. Each arrow denotes a component and each coloured rectangle represents the computation of a KLT. In the case of POT, three levels of transform are applied, with a different number of clusters per level, and all clusters transforming two components. In the case of Multilevel Clustering KLT, two levels of multilevel clustering are applied and two clusters are defined in the first level.

Fig. 4. Index $i_D$ for different lossy compression settings using LR (solid red lines) and KRR (solid blue lines). In all the plots, the vertical axis represents the $i_D$ index and the horizontal axis represents the compression ratio. Ranges are the same in all the plots to ease the comparison. Results using uncompressed spectra (original data) for LR (dashed red lines) and KRR (dashed blue lines) are plotted for comparison purposes.

Fig. 5. Dew point temperature (in kelvin) RMSE profiles and bias. The compression ratio (CR) with the best average RMSE is reported for LR (thick, solid and red lines) and KRR (thick, solid and green lines). Results using the original data are shown as well for comparison purposes when LR (thick, dashed and red lines) and KRR (thick, dashed and green lines) are used for the predictions. The bias of the reconstructed data is plotted with thin and dash-dot lines for LR (red) and KRR (green).

Fig. 6. MAE for dew point temperature over the whole profile. Plots report MAE for uncompressed and reconstructed spectra and for the LR and KRR algorithms. Larger errors are represented in red and smaller errors in blue. CR reports the compression ratio achieved in the compression stage. Panels: DWT + JPEG 2000, POT + JPEG 2000, and MLC KLT + JPEG 2000.

Figure 10 and Fig. 11 illustrate the retrieval performance of dew point temperature for LR and KRR, respectively. Plots show the average of the RMSE prediction over the different pressure levels. A configuration that produces competitive performance for both LR and KRR is used in the filtering stage, i.e., σ_se = 0.25 and σ_sa = 10. Results are reported for

Fig. 9. Adopted sequential chain when a spectral/spatial regularization stage and a coding process are carried out.

Fig. 12. Dew point temperature (in kelvin) RMSE profiles and bias. Two approaches are compared: original data + Gaussian filter, and Gaussian filter + Multilevel Clustering KLT + JPEG 2000. Results are shown for LR (thick solid red lines) and KRR (thick solid green lines). The bias of each approach is plotted with thin dash-dot lines for LR (red) and KRR (green). For the compression setting, only the compression ratio (CR) with the best average RMSE is reported. Results using the original data are also displayed for comparison purposes for LR (thick dashed red lines) and KRR (thick dashed green lines).

TABLE I. IASI L1C PRODUCTS USED IN THE EXPERIMENTS. IDENTIFIERS AND SIZES ARE PROVIDED. M IS THE NUMBER OF SPECTRAL COMPONENTS, NS IS THE NUMBER OF SCAN LINES, N-FORS IS THE NUMBER OF ELEMENTARY FIELDS OF REGARD (FOR) PER LINE, AND N-IFOVS IS THE NUMBER OF INSTANTANEOUS FIELDS OF VIEW (IFOV) PER FOR.

TABLE II. CONFIGURATION OF THE COMPRESSION SETTINGS USED IN THE EXPERIMENTS.
reports the number of spectra analyzed in each scenario.The compression settings POT + JPEG 2000 and Multilevel Clustering KLT + JPEG 2000 achieve lower maximum compression ratios compared to DWT + JPEG 2000 because side-information needs to be transmitted in addition to the compressed data.

TABLE IV. RMSE OF DEW POINT TEMPERATURE PREDICTIONS (IN KELVIN) FOR DIFFERENT COMBINATIONS OF σ_sa AND σ_se. THE IMPROVEMENT OVER THE ORIGINAL DATA IS REPORTED AS A PERCENTAGE IN BRACKETS. THE BEST (GREEN) AND WORST (RED) PERFORMANCES ARE HIGHLIGHTED.