How to recode factors in R?
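Sometimes the levels of a factor can be merged, or we want to group several levels into one. This usually happens when a particular level has only one observation, or when there is a theoretical reason to combine levels. For example, if a data frame named df contains a factor column x with four categories A, B, C, and D, they can be grouped into two categories, A and B, as follows: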
df$x[df$x %in% c("A","B")] <- "A"
df$x[df$x %in% c("C","D")] <- "B"
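The two assignments above behave as written when x holds character strings, which is what data.frame() produces by default since R 4.0.0. If x is a true factor, they still run because "A" and "B" are already levels, but the now-unused levels C and D stay attached to the column. A minimal sketch of one alternative, assuming x is a factor, is to rename the levels in a single step with levels(). The vector x below is made-up illustration data, not taken from the example.

```r
# Hypothetical factor with the four categories from the text
x <- factor(c("A", "B", "C", "D", "C", "A"))

# Map old levels to new ones in one step: each list name is a new level,
# each list element holds the old levels that are merged into it
levels(x) <- list(A = c("A", "B"), B = c("C", "D"))

print(x)          # values are now A or B only
print(levels(x))  # the old levels C and D are gone
```

Because the mapping is applied in one step, it does not depend on ordering. The two-line version does: running the C/D line first and the A/B line second would turn every value into A.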
Example
Consider the following data frame:
factor <- sample(LETTERS[1:4], 20, replace = TRUE)
response <- rpois(20, 5)
df1 <- data.frame(factor, response)
df1
Output
   factor response
1       A        5
2       C        7
3       D        5
4       C       13
5       C        5
6       C        4
7       B        4
8       B       10
9       C        4
10      D        6
11      B        5
12      B        3
13      A        7
14      A        2
15      A        2
16      D        3
17      B        1
18      C        5
19      D        6
20      D        4
Recode the levels of the factor column in df1:
df1$factor[df1$factor %in% c("A","B")] <- "A"
df1$factor[df1$factor %in% c("C","D")] <- "B"
df1
Output
   factor response
1       A        5
2       B        7
3       B        5
4       B       13
5       B        5
6       B        4
7       A        4
8       A       10
9       B        4
10      B        6
11      A        5
12      A        3
13      A        7
14      A        2
15      A        2
16      B        3
17      A        1
18      B        5
19      B        6
20      B        4
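One way to confirm that a recoding like this did what was intended is to keep the original values and cross-tabulate them against the recoded ones. The sketch below is only an illustration: the names original, recoded, and check are hypothetical, and the seed is arbitrary, so the counts will not match the output shown above.

```r
set.seed(1)  # arbitrary seed, used only to make this sketch reproducible

# Keep the original values and recode a copy
original <- sample(LETTERS[1:4], 20, replace = TRUE)
recoded <- original
recoded[recoded %in% c("A", "B")] <- "A"
recoded[recoded %in% c("C", "D")] <- "B"

# Rows are the old categories, columns the new ones; every old category
# should have counts in exactly one column
check <- table(original, recoded)
print(check)
```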
Example 2
grp <- sample(c("G1","G2","G3"), 20, replace = TRUE)
Y <- rnorm(20)
df2 <- data.frame(grp, Y)
df2
Output
   grp           Y
1   G3 -0.39900138
2   G3  1.04085657
3   G1  1.46432790
4   G3 -0.90843955
5   G1 -0.15202516
6   G2  1.15456629
7   G2  1.24002828
8   G2 -1.10731484
9   G2  0.27423208
10  G3  1.06444903
11  G2 -0.21824650
12  G1  0.25843090
13  G1  0.07686889
14  G3 -0.21955611
15  G3 -0.05359245
16  G2  0.54630987
17  G3 -0.09808820
18  G1 -0.65171471
19  G2 -0.62371231
20  G2 -0.03319190
Recode the levels of the grp column in df2:
df2$grp[df2$grp %in% c("G1","G2")] <- "Control"
df2
Output
       grp           Y
1       G3 -0.39900138
2       G3  1.04085657
3  Control  1.46432790
4       G3 -0.90843955
5  Control -0.15202516
6  Control  1.15456629
7  Control  1.24002828
8  Control -1.10731484
9  Control  0.27423208
10      G3  1.06444903
11 Control -0.21824650
12 Control  0.25843090
13 Control  0.07686889
14      G3 -0.21955611
15      G3 -0.05359245
16 Control  0.54630987
17      G3 -0.09808820
18 Control -0.65171471
19 Control -0.62371231
20 Control -0.03319190
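If grp were stored as a factor rather than as character strings (for example, when the data frame is built with stringsAsFactors = TRUE), assigning "Control" directly would not work: "Control" is not an existing level, so R would insert NA and issue a warning. A hedged sketch of a factor-safe variant renames the levels instead. The vector grp below is illustrative data, not the df2 sample.

```r
# Hypothetical factor column with the three groups from the example
grp <- factor(c("G3", "G1", "G2", "G3", "G1"))

# Rename the levels G1 and G2 to "Control"; levels that end up with the
# same name are merged into a single level
levels(grp)[levels(grp) %in% c("G1", "G2")] <- "Control"

print(grp)
print(levels(grp))
```

Renaming through levels() changes every matching observation at once and leaves no unused levels behind.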