Abstract

Interest in making chemometric techniques more robust and scalable has grown recently with the widespread use of handheld spectrometers. Although miniaturized spectrometer sensors are easier to use and cheaper, they come at the cost of increased modelling challenges: model development must accommodate data drift, sensor-to-sensor variation and changing environmental conditions. In this thesis, we study the challenges of spectral data modelling, including variations in the physical surface of samples and multicollinearity of spectral features. Other challenges include selecting the wavelength regions that hold the features of interest. Outlier detection is crucial for obtaining robust, high-performing models. It is also essential to compensate for sensor variations during model building to increase model scalability. We propose an automated modelling pipeline that performs data preparation, pre-processing selection, variable selection, data compression and model optimization. Automation ensures that the model is optimized by adjusting the modelling parameters to best fit the application. A novel wavelength selection technique is proposed to enhance model regularization and robustness. A semi-supervised deep learning framework is developed to process large datasets, yielding a more scalable model that fits a wider sensor and sample space. Unlabeled data collected after the initial models' deployment is used to augment the calibration data, increasing the coverage of variations. An autoencoder is implemented in a multitask neural network architecture alongside the regression model of interest. The model generated by the semi-supervised framework outperforms models built solely on the supervised dataset. To validate the proposed techniques and processes, seven datasets of different materials are used. Studies include gas, liquid, and solid datasets of homogeneous and heterogeneous nature.
Multi-sensor datasets are measured, and different model transfer, customization and generalization methods are evaluated.
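
The multitask arrangement described in the abstract, an autoencoder trained alongside a regression model so that unlabeled spectra can still contribute, can be illustrated with a combined loss. The following is only a minimal sketch under stated assumptions, not the implementation used in the thesis: the function name `multitask_loss`, the boolean mask `labeled` and the weighting parameter `alpha` are all hypothetical, and mean squared error is assumed for both tasks.

```python
import numpy as np

def multitask_loss(x, x_hat, y, y_hat, labeled, alpha=0.5):
    """Combine reconstruction and regression errors (illustrative only).

    x, x_hat : (n, p) spectra and their autoencoder reconstructions
    y, y_hat : (n,) reference values and regression-head predictions
    labeled  : (n,) boolean mask, True where a reference value exists
    alpha    : hypothetical weight balancing the two tasks
    """
    # Reconstruction error uses every spectrum, labeled or not.
    recon = np.mean((x - x_hat) ** 2)
    # Regression error uses only the labeled subset, so missing
    # reference values (e.g. NaN placeholders) never enter the loss.
    if labeled.any():
        reg = np.mean((y[labeled] - y_hat[labeled]) ** 2)
    else:
        reg = 0.0
    return alpha * recon + (1.0 - alpha) * reg
```

In a sketch like this, spectra collected after deployment would enter with `labeled` set to `False`, so they shape the shared representation through the reconstruction term without requiring reference measurements.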