How to create polynomial models in R?
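The relationship between the independent variables and the response variable is often not linear, so we need to find the model that best fits our data. In such cases we can try polynomial models and check whether they improve the fit and the accuracy of predictions. This is done by including powers of an independent variable in the lm formula, wrapped in I() so that ^ is treated as arithmetic exponentiation.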
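Why the powers need to be wrapped in I() can be illustrated with a minimal sketch. The data here are simulated purely for illustration (the names x, y, fit_plain and fit_quad are not from the article). Inside a formula, ^ is a formula operator rather than arithmetic, so x^2 on its own collapses to x and no squared term is fitted.

```r
# Simulated data with a truly quadratic relationship (illustrative only).
set.seed(1)
x <- runif(50, -2, 2)
y <- 1 + 2 * x - 1.5 * x^2 + rnorm(50, sd = 0.5)

# Without I(), ^ is interpreted as a formula operator: x^2 reduces to x,
# so this model contains only an intercept and a linear term.
fit_plain <- lm(y ~ x + x^2)

# With I(), x^2 is evaluated arithmetically and enters as its own column.
fit_quad <- lm(y ~ x + I(x^2))

print(names(coef(fit_plain)))
print(names(coef(fit_quad)))
```

Comparing the two sets of coefficient names shows that only the second model actually contains a quadratic term.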
Example
Consider the following data frame:
> set.seed(99)
> x1<-rnorm(30,0.5)
> x2<-rpois(30,5)
> x3<-runif(30,2,5)
> x4<-rnorm(30,0.8)
> y<-rpois(30,10)
> df<-data.frame(x1,x2,x3,x4,y)
Creating a model with the second-degree term of x1:
> PolynomialModel1<-lm(y~x1+I(x1^2)+x2+x3+x4)
> summary(PolynomialModel1)

Call:
lm(formula = y ~ x1 + I(x1^2) + x2 + x3 + x4)

Residuals:
    Min      1Q  Median      3Q     Max
-4.6890 -1.5544 -0.5614  1.6872  5.1347

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.72141    2.25262   5.647 8.16e-06 ***
x1           0.61879    0.51927   1.192    0.245
I(x1^2)     -0.45597    0.36046  -1.265    0.218
x2          -0.22389    0.25613  -0.874    0.391
x3          -0.05005    0.56085  -0.089    0.930
x4          -0.46588    0.67529  -0.690    0.497
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.499 on 24 degrees of freedom
Multiple R-squared: 0.1979, Adjusted R-squared: 0.03079
F-statistic: 1.184 on 5 and 24 DF, p-value: 0.3461
Creating a model with the second- and third-degree terms of x1:
> PolynomialModel2<-lm(y~x1+I(x1^2)+I(x1^3)+x2+x3+x4)
> summary(PolynomialModel2)

Call:
lm(formula = y ~ x1 + I(x1^2) + I(x1^3) + x2 + x3 + x4)

Residuals:
    Min      1Q  Median      3Q     Max
-4.7600 -1.5965 -0.6293  1.6855  5.0326

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.69112    2.30315   5.510 1.33e-05 ***
x1           0.39753    1.16291   0.342    0.736
I(x1^2)     -0.40674    0.43399  -0.937    0.358
I(x1^3)      0.07242    0.33881   0.214    0.833
x2          -0.21837    0.26265  -0.831    0.414
x3          -0.01952    0.58989  -0.033    0.974
x4          -0.54635    0.78526  -0.696    0.494
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.55 on 23 degrees of freedom
Multiple R-squared: 0.1995, Adjusted R-squared: -0.00935
F-statistic: 0.9552 on 6 and 23 DF, p-value: 0.4764
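As a side note, the cubic model above can also be written with poly(), which saves typing each I() term. The sketch below is an alternative formulation, not the article's own method. It rebuilds the same data, passes it through data = df (the article's calls rely on the variables in the workspace instead), and uses the illustrative names fit_I, fit_poly and fit_orth. With raw = TRUE, poly() produces the plain powers of x1; the default orthogonal polynomials give different coefficients but the same fitted values.

```r
# Rebuild the article's data so the block is self-contained.
set.seed(99)
x1 <- rnorm(30, 0.5)
x2 <- rpois(30, 5)
x3 <- runif(30, 2, 5)
x4 <- rnorm(30, 0.8)
y  <- rpois(30, 10)
df <- data.frame(x1, x2, x3, x4, y)

# Cubic in x1 written with explicit I() terms, as in the article.
fit_I <- lm(y ~ x1 + I(x1^2) + I(x1^3) + x2 + x3 + x4, data = df)

# Same model via raw polynomial terms: columns are x1, x1^2, x1^3.
fit_poly <- lm(y ~ poly(x1, 3, raw = TRUE) + x2 + x3 + x4, data = df)

# Orthogonal polynomials (the default): different coefficients,
# same column space and therefore the same fitted values.
fit_orth <- lm(y ~ poly(x1, 3) + x2 + x3 + x4, data = df)

# Both comparisons should agree up to numerical tolerance.
print(all.equal(unname(coef(fit_I)), unname(coef(fit_poly))))
print(all.equal(fitted(fit_I), fitted(fit_orth)))
```

Orthogonal polynomials are often preferred for higher degrees because the raw powers of a variable tend to be strongly correlated with one another.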
Creating a model with the second- and third-degree terms of x1 and the second-degree term of x2:
> PolynomialModel3<-lm(y~x1+I(x1^2)+I(x1^3)+x2+I(x2^2)+x3+x4)
> summary(PolynomialModel3)

Call:
lm(formula = y ~ x1 + I(x1^2) + I(x1^3) + x2 + I(x2^2) + x3 + x4)

Residuals:
    Min      1Q  Median      3Q     Max
-4.4688 -1.5123 -0.5659  1.5657  5.2208

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 13.26835    2.81745   4.709 0.000107 ***
x1           0.44131    1.19123   0.370 0.714577
I(x1^2)     -0.39980    0.44277  -0.903 0.376322
I(x1^3)      0.05274    0.34941   0.151 0.881391
x2          -0.67626    1.26441  -0.535 0.598124
I(x2^2)      0.05114    0.13801   0.371 0.714527
x3           0.03889    0.62160   0.063 0.950677
x4          -0.49947    0.81036  -0.616 0.543985
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.599 on 22 degrees of freedom
Multiple R-squared: 0.2044, Adjusted R-squared: -0.04868
F-statistic: 0.8077 on 7 and 22 DF, p-value: 0.5901
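Creating a model with the second- and third-degree terms of x1, the second-degree term of x2, and the third-degree term of x4 (which here replaces the linear x4 term):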
> PolynomialModel4<-lm(y~x1+I(x1^2)+I(x1^3)+x2+I(x2^2)+x3+I(x4^3))
> summary(PolynomialModel4)

Call:
lm(formula = y ~ x1 + I(x1^2) + I(x1^3) + x2 + I(x2^2) + x3 + I(x4^3))

Residuals:
    Min      1Q  Median      3Q     Max
-4.1388 -1.5998 -0.4581  1.6871  5.2185

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 13.156294   2.829809   4.649 0.000124 ***
x1           0.522760   1.160777   0.450 0.656862
I(x1^2)     -0.440464   0.437798  -1.006 0.325310
I(x1^3)      0.014329   0.329379   0.044 0.965692
x2          -0.658946   1.277395  -0.516 0.611104
I(x2^2)      0.048228   0.139822   0.345 0.733428
x3           0.002062   0.613597   0.003 0.997349
I(x4^3)     -0.104330   0.192868  -0.541 0.593985
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.604 on 22 degrees of freedom
Multiple R-squared: 0.2013, Adjusted R-squared: -0.05279
F-statistic: 0.7923 on 7 and 22 DF, p-value: 0.6016
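To decide whether the extra polynomial terms are worth keeping, the four models can be compared side by side. In the summaries above the adjusted R-squared falls from 0.03079 to -0.05279 as terms are added, which is what one would expect here because y was simulated independently of the predictors. The following is one possible way to collect such a comparison, using adjusted R-squared and AIC; the names m1 to m4, models and comparison are illustrative and not from the article.

```r
# Rebuild the article's data so the block is self-contained.
set.seed(99)
x1 <- rnorm(30, 0.5)
x2 <- rpois(30, 5)
x3 <- runif(30, 2, 5)
x4 <- rnorm(30, 0.8)
y  <- rpois(30, 10)
df <- data.frame(x1, x2, x3, x4, y)

# The four models from the article, fitted against the data frame.
m1 <- lm(y ~ x1 + I(x1^2) + x2 + x3 + x4, data = df)
m2 <- lm(y ~ x1 + I(x1^2) + I(x1^3) + x2 + x3 + x4, data = df)
m3 <- lm(y ~ x1 + I(x1^2) + I(x1^3) + x2 + I(x2^2) + x3 + x4, data = df)
m4 <- lm(y ~ x1 + I(x1^2) + I(x1^3) + x2 + I(x2^2) + x3 + I(x4^3), data = df)

models <- list(m1 = m1, m2 = m2, m3 = m3, m4 = m4)

# Higher adjusted R-squared and lower AIC both indicate a better trade-off
# between goodness of fit and the number of terms.
comparison <- data.frame(
  adj_r2 = sapply(models, function(m) summary(m)$adj.r.squared),
  aic    = sapply(models, AIC)
)
print(comparison)

# m1, m2 and m3 are nested (each adds terms to the previous one), so a
# sequential F-test applies to them. m4 swaps x4 for I(x4^3) and is not
# nested with m3, so it is left out of this test.
print(anova(m1, m2, m3))
```

Plain R-squared can only stay the same or rise when terms are added to a nested model, so it is not a useful criterion on its own for choosing the polynomial degree.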