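Least Angle Regression (LAR) Combined with Sampling Error Profile Analysis (SEPA) for Building Robust Near-Infrared Spectroscopic Analysis Models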
Foundation item: Scientific Research Program of the Science and Technology Commission of Shanghai Municipality (17142201100)
|
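Chinese abstract: By combining the sampling error profile analysis (SEPA) framework with the least angle regression (LAR) algorithm, a stepwise variable selection method, SEPA-LAR, was proposed for wavelength selection and used to build robust near-infrared (NIR) spectroscopic analysis models. Monte Carlo sampling (MCS) was used to generate multiple partitions of the dataset, and a model was built on each partition. The LAR coefficients of every spectral variable (wavelength) across all models were analyzed statistically, and the wavelengths were ranked in descending order of the sum of the absolute values of their regression coefficients. The top-ranked wavelengths were then used to build a partial least squares (PLS) model, which was evaluated on an independent validation set that took part in neither SEPA-LAR nor model building. NIR spectral datasets for corn moisture, diesel density and cheese fat were used to test the performance of SEPA-LAR; the root mean square errors of prediction (RMSEP) on the independent validation sets were 0.001 44% (moisture), 0.001 58 g/mL (density) and 1.13 g/100 g (fat content), respectively. The results show that, compared with competitive adaptive reweighted sampling (CARS), the method is more stable; compared with moving window partial least squares (MWPLS) and Monte Carlo uninformative variable elimination (MCUVE), it selects fewer variables, yields lower prediction errors, and offers better predictive ability, interpretability and stability.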
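Chinese keywords: least angle regression; regression coefficient; Monte Carlo sampling; sampling error profile analysis; variable selection; near-infrared spectroscopy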
|
Robust Near-Infrared Modeling by Least Angle Regression and Sampling Error Profile Analysis
|
|
Abstract: A novel variable selection method based on the sampling error profile analysis framework and least angle regression (SEPA-LAR) was proposed to build robust NIR models. In SEPA-LAR, multiple models were obtained by Monte Carlo sampling (MCS), and the LAR regression coefficients at each wavelength were analyzed statistically and ranked by the sum of their absolute values. The wavelengths with the largest sums of absolute regression coefficients were selected, and a model was built with them. Samples in an independent validation dataset were used to evaluate the model. NIR datasets of corn moisture, diesel density and cheese fat were used to evaluate the performance of SEPA-LAR. The root mean squared errors of prediction (RMSEP) estimated on the validation datasets are 0.001 44% (moisture), 0.001 58 g/mL (density) and 1.13 g/100 g (fat content), respectively. The results showed that, compared with Monte Carlo uninformative variable elimination (MCUVE), moving window partial least squares regression (MWPLS) and competitive adaptive reweighted sampling (CARS), SEPA-LAR selected fewer wavelengths and had smaller prediction errors. The calibration models built by SEPA-LAR have good predictive ability, stability and interpretability.
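The RMSEP values quoted above follow the conventional definition of the root mean square error of prediction, written here with assumed notation ($n_v$ samples in the independent validation set, reference values $y_i$, model predictions $\hat{y}_i$); the paper's own symbols are not shown in this abstract:

```latex
\mathrm{RMSEP} = \sqrt{\frac{1}{n_v}\sum_{i=1}^{n_v}\left(y_i - \hat{y}_i\right)^{2}}
```

Because RMSEP carries the units of the measured property, the three reported errors (%, g/mL, g/100 g) are not directly comparable across the corn, diesel and cheese datasets.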
Key Words: least angle regression; regression coefficient; Monte Carlo sampling; sampling error profile analysis; variable selection; near infrared spectroscopy
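Cite this article: XIONG Qin, ZHANG Ruoqiu, LI Hui, CHEN Wanchao, DU Yiping. Least angle regression (LAR) combined with sampling error profile analysis (SEPA) for building robust near-infrared spectroscopic analysis models[J]. Journal of Instrumental Analysis (分析测试学报), 2018, 37(7): 778-783.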
|
|
|
|