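How to standardize only the numerical columns in an R data frame when categorical columns are also present?
A single numerical column can be standardized with the scale function, which subtracts the column mean from each value and divides by the column's standard deviation. To standardize several columns of a data frame that also contains categorical columns, we can use the mutate_if function from the dplyr package, which applies a function only to the columns that satisfy a predicate. For example, for a data frame df, this can be written as df%>%mutate_if(is.numeric,scale)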
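To make the standardization step concrete, here is a minimal base R sketch comparing a hand-computed z-score with the result of scale. The vector x and the names z_manual and z_scale are illustrative assumptions, not part of the original examples.

```r
# Hypothetical example vector (not taken from the article's data)
x <- c(4, 1, 4, 1, 0)

# Manual z-score: subtract the mean, divide by the sample standard deviation
z_manual <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix; as.numeric() turns it into a plain vector
z_scale <- as.numeric(scale(x))

# The two results should agree up to floating-point tolerance
all.equal(z_manual, z_scale)
```

With its default arguments, scale centers on the mean and divides by the sample standard deviation (the n - 1 denominator), which is why the two computations match.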
Example 1
Consider the following data frame:
> x1<-sample(letters[1:4],20,replace=TRUE)
> x2<-rpois(20,2)
> df1<-data.frame(x1,x2)
> df1
Output
   x1 x2
1   c  4
2   c  1
3   a  4
4   a  1
5   b  0
6   c  4
7   c  2
8   a  1
9   c  2
10  d  2
11  b  0
12  b  3
13  c  0
14  d  1
15  a  2
16  d  1
17  a  2
18  d  2
19  c  1
20  a  3
Load the dplyr package and standardize the numerical column in df1:
> library(dplyr)
> df1%>%mutate_if(is.numeric,scale)
Output
   x1         x2
1   c  1.7168098
2   c -0.6242945
3   a  1.7168098
4   a -0.6242945
5   b -1.4046626
6   c  1.7168098
7   c  0.1560736
8   a -0.6242945
9   c  0.1560736
10  d  0.1560736
11  b -1.4046626
12  b  0.9364417
13  c -1.4046626
14  d -0.6242945
15  a  0.1560736
16  d -0.6242945
17  a  0.1560736
18  d  0.1560736
19  c -0.6242945
20  a  0.9364417
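One detail worth checking after this step: scale returns a matrix, so passing it directly to mutate_if is expected to leave the standardized column stored as a one-column matrix rather than a plain numeric vector. The sketch below, which uses a hypothetical data frame df_demo and assumes dplyr is installed, shows one possible way to keep an ordinary numeric column by wrapping the call in as.numeric.

```r
library(dplyr)

# Hypothetical small data frame with one categorical and one numeric column
df_demo <- data.frame(g = c("a", "b", "a", "c"), v = c(4, 1, 2, 3))

# Passing scale directly: column v is expected to become a 4 x 1 matrix
scaled_matrix <- df_demo %>% mutate_if(is.numeric, scale)
is.matrix(scaled_matrix$v)

# Wrapping scale() in as.numeric() keeps v a plain numeric vector
scaled_vector <- df_demo %>% mutate_if(is.numeric, ~ as.numeric(scale(.)))
is.matrix(scaled_vector$v)
```

The printed values are the same either way; the difference only matters for later code that expects a plain vector.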
Example 2
> y1<-sample(c("S1","S2","S3"),20,replace=TRUE)
> y2<-rnorm(20,34,2.3)
> y3<-rnorm(20,500,47.1)
> df2<-data.frame(y1,y2,y3)
> df2
Output
   y1       y2       y3
1  S2 33.67237 511.9535
2  S2 30.47941 509.6286
3  S3 35.19967 605.8329
4  S2 27.82392 590.1114
5  S2 33.91328 485.1736
6  S1 38.26157 449.6714
7  S3 32.46148 495.2131
8  S3 32.06987 477.6192
9  S2 33.32162 448.6335
10 S2 37.55487 544.3631
11 S2 34.84706 462.9035
12 S1 34.59332 532.0554
13 S2 32.36337 501.9207
14 S2 32.26520 516.7858
15 S3 33.62168 530.5313
16 S3 33.06213 515.0878
17 S1 35.09752 454.7614
18 S3 31.79898 499.8527
19 S1 32.85342 509.8768
20 S3 33.72336 503.8084
Standardize the numerical columns in df2:
> df2%>%mutate_if(is.numeric,scale)
Output
   y1          y2          y3
1  S2  0.09796633  0.11297890
2  S2 -1.30368623  0.05666468
3  S3  0.76842187  2.38692048
4  S2 -2.46939699  2.00611458
5  S2  0.20372057 -0.53568372
6  S1  2.11253906 -1.39561547
7  S3 -0.43359265 -0.29250727
8  S3 -0.60550146 -0.71866529
9  S2 -0.05600808 -1.42075459
10 S2  1.80231017  0.89800290
11 S2  0.61363310 -1.07510811
12 S1  0.50224659  0.59988493
13 S2 -0.47666141 -0.13003510
14 S2 -0.51975777  0.23002594
15 S3  0.07571152  0.56296787
16 S3 -0.16991946  0.18889687
17 S1  0.72358127 -1.27232444
18 S3 -0.72441871 -0.18012673
19 S1 -0.26153720  0.06267550
20 S3  0.12034948 -0.08431193
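In dplyr 1.0.0 and later, scoped verbs such as mutate_if are superseded by across. The following is a minimal sketch of the same operation written that way, assuming dplyr 1.0.0 or newer is installed. The data frame df_demo and the seed are illustrative assumptions that mirror the structure of df2; they do not reproduce the exact values shown above.

```r
library(dplyr)

# Hypothetical data frame with the same structure as df2
set.seed(1)  # seed chosen only to make this sketch reproducible
df_demo <- data.frame(
  y1 = sample(c("S1", "S2", "S3"), 20, replace = TRUE),
  y2 = rnorm(20, 34, 2.3),
  y3 = rnorm(20, 500, 47.1)
)

# across(where(is.numeric), ...) applies the function to numeric columns only;
# as.numeric() converts the matrix returned by scale() into a plain vector
df_std <- df_demo %>%
  mutate(across(where(is.numeric), ~ as.numeric(scale(.x))))

# Each standardized column should have mean close to 0 and standard deviation 1
sapply(df_std[c("y2", "y3")], mean)
sapply(df_std[c("y2", "y3")], sd)
```

The categorical column y1 passes through unchanged because it does not satisfy is.numeric.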