Pair Regression in Microsoft Excel by using of P-Splines
Authors: Anikin V.I., Anikina O.V., Gushchina O.M. | Published: 29.09.2017 |
Published in issue: #5(116)/2017 | |
DOI: 10.18698/0236-3933-2017-5-114-131 | |
Category: Informatics, Computer Engineering and Control | Chapter: Mathematical Modelling, Numerical Methods, and Program Complexes | |
Keywords: Microsoft Excel, spreadsheet model, regression analysis, semiparametric regression, P-spline, B-spline |
The study theoretically shows the possibility of building recursive P-spline regression models based on the truncated polynomial P-splines. Without writing VBA code according to the principle of "programming without programming", we created general and recursive models of semi-parametric regression in Excel using polynomial P-splines of the first, second and third orders, having a dramatic simplicity in the spreadsheet structure. We optimized the parameters of the models by the generalized reduced gradient method using the Excel Solver tool. The simulation experiments confirmed high quality of the regression and computational efficiency of the spreadsheet models. The regression quality was evaluated by statistical analysis of the residuals. For each spreadsheet regression model we calculated the coefficient of determination, the unbiased standard deviation of the residuals, the standard deviation of the parameters of the basis functions, the best value of smoothing parameter and the Durbin - Watson's autocorrelation coefficient. The advantages of the general (non-recursive) P-spline spreadsheet models are their versatility, the ability to use all the basic functions, in particular, B-splines, while their disadvantages are a large amount of computations in the optimization process and a bulky spreadsheet model. The recursive P-spline spreadsheet models have such advantages as much smaller amount of computation required in the optimization process and a very simple structure of the spreadsheet model, while the disadvantage is that only polynomial functions can be used as the basis of P-splines. The proposed methods for creating the spreadsheet P-spline models significantly expand capabilities of Excel as a simple and effective tool for smoothing and regression analysis of random data samples
References
[1] Klasson K.T. Construction of spline functions in spreadsheets to smooth experimental data. Advances in Engineering Software, 2008, vol. 39, iss. 5, pp. 422–429. DOI: 10.1016/j.advengsoft.2007.03.006
[2] Eilers P.H.C., Marx B.D., Durban M. Twenty years of P-splines. SORT, 2015, vol. 39, no. 2, pp. 149–186.
[3] Ruppert D., Wand M.P., Carroll R.J. Semiparametric regression during 2003–2007. Electronic Journal of Statistics, 2009, no. 3, pp. 1193–1256. DOI: 10.1214/09-EJS525 Available at: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2841361
[4] Ruppert D., Wand M.P., Carroll R.J. Semiparametric regression. New York, Cambridge Univ. Press, 2003. 404 p.
[5] Jang J.S.R., Sun C.T., Mizutani E. Neuro-fuzzy and soft computing: A computational approach to learning and machine intelligence. New York, Prentice-Hall, 1997. 614 p.
[6] Van de Geer S.A. Least squares estimation. Encyclopedia of Statistics in Behavioral Science. Vol. 2. Wiley, 2005, pp. 1041–1045.
[7] Griggs W. Penalized spline regression and its applications. 2013. 51 p. Available at: https://www.whitman.edu/Documents/Academics/Mathematics/Griggs.pdf (accessed: 16.09.2016).
[8] Anikin V.I., Anikina O.V. Effective technology of making table models in Excel. Informatsionnye tekhnologii [Information Technologies], 2008, no. 10, pp. 74–77 (in Russ.).
[9] Anikin V.I., Anikina O.V., Zibrov P.F. Informatsionnye tekhnologii imitatsionnogo modelirovaniya [Informational technologies of imitational simulation]. Formirovanie sovremennogo informatsionnogo obshchestva — problemy, perspektivy, innovatsionnye podkhody. Materialy mezhdunarodnogo foruma [Forming modern informational society — problems, prospects, innovative approaches. Proc. int. forum]. Saint-Petersburg, GOU VPO SPbGUAP Publ., 2011, pp. 181–189 (in Russ.).
[10] Kutner M.H., Nachtsheim C.J., Neter J., Li W. Applied linear statistical models. New York, McGraw-Hill, 2004. 1396 p.