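How to find the correlation for a data frame with numeric and non-numeric columns in R?
To find the correlation for a data frame that contains both numeric and non-numeric columns, select the numeric columns with sapply and is.numeric, then pass them to the cor function. Setting use="complete.obs" drops any row that contains a missing value, and method="pearson" requests the Pearson coefficient (which is also cor's default). For example, for a data frame named df, the correlation matrix can be found with the following command: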
cor(df[,sapply(df,is.numeric)],use="complete.obs",method="pearson")
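The command above can be broken into steps to see what each part contributes. The following is a minimal sketch, not part of the original examples: the data frame df and its columns grp, u and v are made up for illustration, and v contains one NA so that the effect of use="complete.obs" is visible.

```r
# Hypothetical data frame for illustration (not from the article):
# one character column and two numeric columns, one of them with a missing value.
df <- data.frame(
  grp = c("a", "b", "a", "b", "a"),
  u   = c(1, 2, 3, 4, 5),
  v   = c(2, 4, 6, 8, NA),
  stringsAsFactors = FALSE
)

# Step 1: sapply() applies is.numeric() to every column and
# returns a named logical vector (FALSE for grp, TRUE for u and v).
num_cols <- sapply(df, is.numeric)
print(num_cols)

# Step 2: keep only the columns flagged as numeric.
num_df <- df[, num_cols]

# Step 3: compute the Pearson correlation matrix.
# use = "complete.obs" drops row 5 because v is NA there.
res <- cor(num_df, use = "complete.obs", method = "pearson")
print(res)
```

In the four complete rows v is exactly twice u, so the off-diagonal entries of res are 1. Without use="complete.obs", cor falls back to its default use="everything", and every entry involving v would be NA.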
Example 1
Consider the following data frame:
> x1<-sample(LETTERS[1:4],20,replace=TRUE)
> x2<-rpois(20,5)
> x3<-rpois(20,1)
> df1<-data.frame(x1,x2,x3)
> df1
Output
   x1 x2 x3
1   C 11  2
2   A  3  1
3   C  4  0
4   D 10  2
5   A  1  0
6   A  4  1
7   D  4  0
8   B  2  0
9   C  6  1
10  C  4  2
11  A  7  1
12  C  5  0
13  B  5  0
14  D  5  2
15  C  8  1
16  A  7  0
17  B  2  0
18  B  5  0
19  B  4  2
20  A  8  1
Finding the correlation between the numeric columns of df1:
> cor(df1[,sapply(df1,is.numeric)],use="complete.obs",method="pearson")
Output
          x2        x3
x2 1.0000000 0.4832695
x3 0.4832695 1.0000000
Example 2
> y1<-rnorm(20)
> y2<-rnorm(20)
> y3<-sample(c("Hot","Cold"),20,replace=TRUE)
> y4<-sample(c("Male","Female"),20,replace=TRUE)
> y5<-rpois(20,2)
> df2<-data.frame(y1,y2,y3,y4,y5)
> df2
Output
            y1          y2   y3     y4 y5
1   1.51725168 -0.52762451 Cold   Male  3
2   0.84772773 -0.43382197  Hot Female  2
3  -1.73640048  0.74754602 Cold Female  2
4   0.72972822 -0.07814968  Hot   Male  1
5   1.69906347  0.56659629  Hot   Male  1
6  -0.01761764  0.13790528  Hot   Male  5
7  -2.06662444  0.84961541 Cold   Male  2
8  -1.09416818  0.90565331  Hot Female  3
9  -1.33657153  0.80483709  Hot   Male  1
10  1.97558526  1.24105635 Cold Female  0
11 -0.21074711  0.13355731  Hot Female  2
12  1.02177951 -0.59891452 Cold Female  4
13  1.73358364  0.11105171 Cold   Male  1
14  0.37426668  0.68837549  Hot   Male  1
15  1.74025264 -0.15972807  Hot Female  0
16  0.30275475  0.20629397 Cold Female  1
17 -0.28661576  1.01552432  Hot   Male  3
18 -0.42663944 -1.30746381  Hot   Male  3
19 -0.23888520  1.36409027 Cold Female  1
20  0.32587990  0.38175578 Cold   Male  0
Finding the correlation between the numeric columns of df2:
> cor(df2[,sapply(df2,is.numeric)],use="complete.obs",method="pearson")
Output
           y1         y2         y5
y1  1.0000000 -0.3038048 -0.2803100
y2 -0.3038048  1.0000000 -0.3424033
y5 -0.2803100 -0.3424033  1.0000000
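Because the same command is repeated for each data frame, it can be wrapped in a small function. The sketch below is one possible way to do that. The name numeric_cor and the demo data frames are hypothetical, not from the examples above. It also adds drop = FALSE, an assumption on my part about what is desirable: without it, a data frame with a single numeric column is reduced to a plain vector by the subsetting step, and cor stops with an error because it needs a matrix-like x or both x and y.

```r
# Hypothetical helper (name chosen for illustration only).
# Computes the correlation matrix of the numeric columns of `data`.
numeric_cor <- function(data, method = "pearson") {
  # Flag numeric columns, as in the examples above.
  is_num <- sapply(data, is.numeric)
  # drop = FALSE keeps the result a data frame even if only one column is selected.
  cor(data[, is_num, drop = FALSE], use = "complete.obs", method = method)
}

# Demo data (made up): two numeric columns and one character column.
demo <- data.frame(
  label = c("p", "q", "r", "s"),
  a     = c(1, 2, 3, 4),
  b     = c(4, 3, 2, 1),
  stringsAsFactors = FALSE
)
m2 <- numeric_cor(demo)
print(m2)

# Demo data with a single numeric column: the result is a 1 x 1 matrix.
m1 <- numeric_cor(demo[, c("label", "a")])
print(m1)
```

Here b decreases exactly as a increases, so the off-diagonal entries of m2 are -1. The method argument is passed straight through to cor, so "spearman" or "kendall" can be used in place of "pearson".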