How to create polynomial models in R?


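Often the relationship between an independent variable and the response variable is not linear, and we need to find the model that best fits the data. In such cases we can try polynomial models and check whether they improve the accuracy of the predictions. In R this is done by adding powers of the independent variables to the formula passed to lm(), wrapping each power in I() (for example I(x1^2)) so that ^ is evaluated as ordinary exponentiation rather than as formula syntax.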
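A quick way to see why the I() wrapper matters is to fit the same formula with and without it. Inside an R formula, ^ means "crossing" of terms, so x^2 expands to just x. This is a minimal sketch on simulated data; the variables x and y and the data frame d are hypothetical and not part of the original example.

```r
# Hypothetical simulated data, used only to illustrate formula syntax.
set.seed(1)
x <- runif(20)
y <- 1 + 2 * x - 3 * x^2 + rnorm(20, sd = 0.1)
d <- data.frame(x, y)

# Without I(): ^ is formula crossing, x^2 collapses to x,
# so this model silently contains only a linear term.
fit_wrong <- lm(y ~ x + x^2, data = d)

# With I(): ^ is arithmetic, so x squared enters as its own column.
fit_right <- lm(y ~ x + I(x^2), data = d)

print(names(coef(fit_wrong)))  # "(Intercept)" "x"
print(names(coef(fit_right)))  # "(Intercept)" "x" "I(x^2)"
```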

Example

Consider the following data frame −

> set.seed(99)
> x1<-rnorm(30,0.5)
> x2<-rpois(30,5)
> x3<-runif(30,2,5)
> x4<-rnorm(30,0.8)
> y<-rpois(30,10)
> df<-data.frame(x1,x2,x3,x4,y)
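To see what a term such as I(x1^2) contributes, one can inspect the design matrix that R builds from the formula. The sketch below uses model.matrix() and assumes only the x1 column from the data above (x1 is the first random draw after set.seed(99), so it is reproduced on its own); the names df_x1 and mm are illustrative.

```r
# Regenerate x1 as above: it is the first draw after set.seed(99).
set.seed(99)
x1 <- rnorm(30, 0.5)
df_x1 <- data.frame(x1)

# Design matrix lm() would use for the right-hand side ~ x1 + I(x1^2).
mm <- model.matrix(~ x1 + I(x1^2), data = df_x1)

print(colnames(mm))  # "(Intercept)" "x1" "I(x1^2)"
print(head(mm, 3))

# The I(x1^2) column is simply x1 squared, element by element.
print(all.equal(unname(mm[, "I(x1^2)"]), df_x1$x1^2))  # TRUE
```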

Creating the model with a second-degree term for x1 −

> PolynomialModel1<-lm(y~x1+I(x1^2)+x2+x3+x4)
> summary(PolynomialModel1)
Call:
lm(formula = y ~ x1 + I(x1^2) + x2 + x3 + x4)

Residuals:
    Min      1Q  Median      3Q     Max
-4.6890 -1.5544 -0.5614  1.6872  5.1347

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.72141    2.25262   5.647 8.16e-06 ***
x1           0.61879    0.51927   1.192    0.245
I(x1^2)     -0.45597    0.36046  -1.265    0.218
x2          -0.22389    0.25613  -0.874    0.391
x3          -0.05005    0.56085  -0.089    0.930
x4          -0.46588    0.67529  -0.690    0.497
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.499 on 24 degrees of freedom
Multiple R-squared:  0.1979,  Adjusted R-squared:  0.03079
F-statistic: 1.184 on 5 and 24 DF,  p-value: 0.3461
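The same quadratic model can be written with poly(x1, 2, raw = TRUE), which generates the raw powers x1 and x1^2 itself, and with data = df so that lm() reads the columns from the data frame instead of the global workspace. A sketch, assuming the data are regenerated with the same calls as above; m_I, m_poly and m_orth are hypothetical names.

```r
# Regenerate the example data with the same calls as above.
set.seed(99)
x1 <- rnorm(30, 0.5)
x2 <- rpois(30, 5)
x3 <- runif(30, 2, 5)
x4 <- rnorm(30, 0.8)
y  <- rpois(30, 10)
df <- data.frame(x1, x2, x3, x4, y)

# Same model as PolynomialModel1, reading the columns from df explicitly.
m_I <- lm(y ~ x1 + I(x1^2) + x2 + x3 + x4, data = df)

# poly(..., raw = TRUE) builds the raw powers x1 and x1^2 for us.
m_poly <- lm(y ~ poly(x1, 2, raw = TRUE) + x2 + x3 + x4, data = df)

# Default poly() uses orthogonal polynomials: different coefficients,
# but the same column space, hence the same fitted values.
m_orth <- lm(y ~ poly(x1, 2) + x2 + x3 + x4, data = df)

print(all.equal(unname(coef(m_I)), unname(coef(m_poly))))  # TRUE
print(all.equal(fitted(m_I), fitted(m_orth)))              # TRUE
```

Orthogonal polynomials avoid the strong correlation between x1, x1^2 and x1^3, which is one reason the standard errors of the raw power terms in the outputs here are large; the raw form is easier to read term by term.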

Creating the model with second- and third-degree terms for x1 −

> PolynomialModel2<-lm(y~x1+I(x1^2)+I(x1^3)+x2+x3+x4)
> summary(PolynomialModel2)
Call:
lm(formula = y ~ x1 + I(x1^2) + I(x1^3) + x2 + x3 + x4)

Residuals:
    Min      1Q  Median      3Q     Max
-4.7600 -1.5965 -0.6293  1.6855  5.0326

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.69112    2.30315   5.510 1.33e-05 ***
x1           0.39753    1.16291   0.342    0.736
I(x1^2)     -0.40674    0.43399  -0.937    0.358
I(x1^3)      0.07242    0.33881   0.214    0.833
x2          -0.21837    0.26265  -0.831    0.414
x3          -0.01952    0.58989  -0.033    0.974
x4          -0.54635    0.78526  -0.696    0.494
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.55 on 23 degrees of freedom
Multiple R-squared:  0.1995,  Adjusted R-squared:  -0.00935
F-statistic: 0.9552 on 6 and 23 DF,  p-value: 0.4764
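Because PolynomialModel1 is nested in PolynomialModel2 (the second only adds I(x1^3)), the two can be compared with a partial F test via anova(). A sketch under the same data assumptions; m1, m2, cmp and t_cubic are illustrative names. When a single term is added, the F statistic equals the square of that term's t value, so the p-value should match the Pr(>|t|) reported for I(x1^3) in the summary.

```r
# Regenerate the example data with the same calls as above.
set.seed(99)
x1 <- rnorm(30, 0.5)
x2 <- rpois(30, 5)
x3 <- runif(30, 2, 5)
x4 <- rnorm(30, 0.8)
y  <- rpois(30, 10)
df <- data.frame(x1, x2, x3, x4, y)

m1 <- lm(y ~ x1 + I(x1^2) + x2 + x3 + x4, data = df)
m2 <- lm(y ~ x1 + I(x1^2) + I(x1^3) + x2 + x3 + x4, data = df)

# Partial F test: does adding I(x1^3) reduce the residual sum of
# squares by more than chance alone would suggest?
cmp <- anova(m1, m2)
print(cmp)

# For a single added term, F equals the squared t value of that term.
t_cubic <- summary(m2)$coefficients["I(x1^3)", "t value"]
print(all.equal(cmp$F[2], t_cubic^2))  # TRUE
```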

Creating the model with second- and third-degree terms for x1 and a second-degree term for x2 −

> PolynomialModel3<-lm(y~x1+I(x1^2)+I(x1^3)+x2+I(x2^2)+x3+x4)
> summary(PolynomialModel3)
Call:
lm(formula = y ~ x1 + I(x1^2) + I(x1^3) + x2 + I(x2^2) + x3 +
    x4)

Residuals:
    Min      1Q  Median      3Q     Max
-4.4688 -1.5123 -0.5659  1.5657  5.2208

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 13.26835    2.81745   4.709 0.000107 ***
x1           0.44131    1.19123   0.370 0.714577
I(x1^2)     -0.39980    0.44277  -0.903 0.376322
I(x1^3)      0.05274    0.34941   0.151 0.881391
x2          -0.67626    1.26441  -0.535 0.598124
I(x2^2)      0.05114    0.13801   0.371 0.714527
x3           0.03889    0.62160   0.063 0.950677
x4          -0.49947    0.81036  -0.616 0.543985
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.599 on 22 degrees of freedom
Multiple R-squared:  0.2044,  Adjusted R-squared:  -0.04868
F-statistic: 0.8077 on 7 and 22 DF,  p-value: 0.5901
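Multiple R-squared can never decrease when terms are added to a nested model, whereas adjusted R-squared penalizes each extra parameter. That is why, in the outputs above, Multiple R-squared creeps up (0.1979, 0.1995, 0.2044) while Adjusted R-squared falls (0.03079, -0.00935, -0.04868). This is expected here, since y was generated with rpois() independently of x1 to x4. A sketch that collects both statistics for the three models, with illustrative names models and fit_stats:

```r
# Regenerate the example data with the same calls as above.
set.seed(99)
x1 <- rnorm(30, 0.5)
x2 <- rpois(30, 5)
x3 <- runif(30, 2, 5)
x4 <- rnorm(30, 0.8)
y  <- rpois(30, 10)
df <- data.frame(x1, x2, x3, x4, y)

models <- list(
  PolynomialModel1 = lm(y ~ x1 + I(x1^2) + x2 + x3 + x4, data = df),
  PolynomialModel2 = lm(y ~ x1 + I(x1^2) + I(x1^3) + x2 + x3 + x4, data = df),
  PolynomialModel3 = lm(y ~ x1 + I(x1^2) + I(x1^3) + x2 + I(x2^2) + x3 + x4,
                        data = df)
)

# One row per model: plain and adjusted R-squared side by side.
fit_stats <- t(sapply(models, function(m) {
  s <- summary(m)
  c(r.squared = s$r.squared, adj.r.squared = s$adj.r.squared)
}))

print(round(fit_stats, 4))
```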

Creating the model with second- and third-degree terms for x1, a second-degree term for x2, and the cube of x4 in place of the linear x4 term −

> PolynomialModel4<-lm(y~x1+I(x1^2)+I(x1^3)+x2+I(x2^2)+x3+I(x4^3))
> summary(PolynomialModel4)
Call:
lm(formula = y ~ x1 + I(x1^2) + I(x1^3) + x2 + I(x2^2) + x3 +
    I(x4^3))

Residuals:
    Min      1Q  Median      3Q     Max
-4.1388 -1.5998 -0.4581  1.6871  5.2185

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 13.156294   2.829809   4.649 0.000124 ***
x1           0.522760   1.160777   0.450 0.656862
I(x1^2)     -0.440464   0.437798  -1.006 0.325310
I(x1^3)      0.014329   0.329379   0.044 0.965692
x2          -0.658946   1.277395  -0.516 0.611104
I(x2^2)      0.048228   0.139822   0.345 0.733428
x3           0.002062   0.613597   0.003 0.997349
I(x4^3)     -0.104330   0.192868  -0.541 0.593985
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.604 on 22 degrees of freedom
Multiple R-squared:  0.2013,  Adjusted R-squared:  -0.05279
F-statistic: 0.7923 on 7 and 22 DF,  p-value: 0.6016
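PolynomialModel4 is not nested in PolynomialModel3 (it replaces x4 with I(x4^3)), so the F test from anova() does not apply; information criteria such as AIC() and BIC() are a common alternative, where lower is better. A sketch under the same data assumptions, with illustrative names m3, m4 and ic:

```r
# Regenerate the example data with the same calls as above.
set.seed(99)
x1 <- rnorm(30, 0.5)
x2 <- rpois(30, 5)
x3 <- runif(30, 2, 5)
x4 <- rnorm(30, 0.8)
y  <- rpois(30, 10)
df <- data.frame(x1, x2, x3, x4, y)

m3 <- lm(y ~ x1 + I(x1^2) + I(x1^3) + x2 + I(x2^2) + x3 + x4, data = df)
m4 <- lm(y ~ x1 + I(x1^2) + I(x1^3) + x2 + I(x2^2) + x3 + I(x4^3), data = df)

ic <- AIC(m3, m4)          # data frame with columns df and AIC
ic$BIC <- BIC(m3, m4)$BIC  # add BIC alongside for comparison
print(ic)
```

Both models estimate the same number of parameters, so their AIC (and BIC) ordering follows their residual sums of squares; given the residual standard errors above (2.599 versus 2.604), PolynomialModel3 should come out slightly lower.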

Updated on: 10-Aug-2020
