How to select columns with fewer than four categories in an R data frame?
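A categorical column normally has at least two categories. There is no fixed upper limit on the number of categories, although a column can never have more distinct values than the data frame has cases (rows). When a data frame contains several categorical columns, some with four or more categories and some with fewer, we may want to keep only the columns with fewer than four. This can be useful when an analysis is deliberately restricted to low-cardinality variables, or when predefined characteristics of the data call for it. To do this, use `sapply()` to count the distinct values in each column and keep the columns whose count is below four, as shown in the examples below.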
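Counting categories is less obvious for factor columns: `nlevels()` reports the declared levels, while `length(unique())` (the approach used in this article) counts only the values that actually occur. The two differ when a factor carries unused levels, for example after some rows have been filtered out. A short sketch of the difference, using a hypothetical factor `size`:

```r
# Hypothetical factor that declares four levels but uses only two of them
size <- factor(c("S", "M", "S", "M"), levels = c("S", "M", "L", "XL"))

nlevels(size)               # 4: declared levels, including the unused "L" and "XL"
length(unique(size))        # 2: values that actually occur
nlevels(droplevels(size))   # 2: unused levels are removed before counting
```

If declared levels are what should be compared with the threshold, `nlevels()` could replace `length(unique())` inside `sapply()`, but it returns 0 for non-factor columns, so character columns would need converting to factors first.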
Example 1
Consider the following data frame:
> x1 <- sample(c("Hot", "Cold", "Warm"), 20, replace = TRUE)
> x2 <- sample(c("Male", "Female"), 20, replace = TRUE)
> x3 <- sample(letters[1:4], 20, replace = TRUE)
> df1 <- data.frame(x1, x2, x3)
> df1
Output
     x1     x2 x3
1  Warm   Male  b
2  Cold Female  c
3  Cold   Male  a
4   Hot   Male  d
5   Hot   Male  d
6   Hot Female  a
7   Hot   Male  a
8  Cold Female  d
9  Warm   Male  d
10 Warm Female  d
11 Cold   Male  a
12 Cold Female  c
13  Hot   Male  b
14 Warm   Male  c
15 Cold   Male  b
16 Warm   Male  a
17  Hot   Male  b
18 Cold   Male  b
19  Hot Female  c
20 Warm Female  d
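`sample()` draws at random, so rerunning the code above produces a different df1, and occasionally a column may even end up with fewer categories than the values offered (for example, x3 might miss one of the four letters). Calling `set.seed()` first makes a run repeatable. The seed 123 below is an arbitrary choice and will not reproduce the exact output printed above:

```r
# Arbitrary seed (assumption): makes the draws repeatable, but it is not
# the seed that produced the output printed above
set.seed(123)
x1 <- sample(c("Hot", "Cold", "Warm"), 20, replace = TRUE)
x2 <- sample(c("Male", "Female"), 20, replace = TRUE)
x3 <- sample(letters[1:4], 20, replace = TRUE)
df1 <- data.frame(x1, x2, x3)

# Resetting the same seed and repeating the first draw yields identical values
set.seed(123)
x1_again <- sample(c("Hot", "Cold", "Warm"), 20, replace = TRUE)
identical(x1, x1_again)   # TRUE
```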
Select the columns of df1 that have fewer than 4 categories:
> df1[, sapply(df1, function(col) length(unique(col))) < 4]
Output
     x1     x2
1  Warm   Male
2  Cold Female
3  Cold   Male
4   Hot   Male
5   Hot   Male
6   Hot Female
7   Hot   Male
8  Cold Female
9  Warm   Male
10 Warm Female
11 Cold   Male
12 Cold Female
13  Hot   Male
14 Warm   Male
15 Cold   Male
16 Warm   Male
17  Hot   Male
18 Cold   Male
19  Hot Female
20 Warm Female
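The selection in Example 1 works in two steps. `sapply()` applies the counting function to every column and returns a named integer vector of category counts; comparing that vector with `< 4` produces a logical vector, which `[` then uses to choose columns. Because df1 is generated at random, this sketch uses a small fixed data frame, `df_small` (hypothetical example data), so each intermediate value is predictable:

```r
# Fixed example data (hypothetical), so the counts below are known in advance
df_small <- data.frame(
  temp   = c("Hot", "Cold", "Warm", "Hot", "Cold"),
  gender = c("Male", "Female", "Male", "Male", "Female"),
  code   = c("a", "b", "c", "d", "a")
)

# Step 1: count the distinct values in each column (named integer vector)
n_categories <- sapply(df_small, function(col) length(unique(col)))
n_categories       # temp = 3, gender = 2, code = 4

# Step 2: mark the columns with fewer than four categories
keep <- n_categories < 4
keep               # temp = TRUE, gender = TRUE, code = FALSE

# Step 3: select the marked columns
df_small[, keep]   # data frame with columns temp and gender
```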
Example 2
> y1 <- sample(c("Male", "Female"), 20, replace = TRUE)
> y2 <- sample(letters[1:5], 20, replace = TRUE)
> y3 <- sample(c("Asian", "American", "Chinese"), 20, replace = TRUE)
> df2 <- data.frame(y1, y2, y3)
> df2
Output
       y1 y2       y3
1    Male  b  Chinese
2  Female  b American
3  Female  d    Asian
4  Female  e American
5  Female  e    Asian
6  Female  c  Chinese
7  Female  a  Chinese
8  Female  a  Chinese
9    Male  d American
10 Female  d  Chinese
11 Female  d  Chinese
12 Female  c American
13 Female  b American
14   Male  d  Chinese
15   Male  a American
16   Male  e    Asian
17   Male  b    Asian
18 Female  d  Chinese
19 Female  d  Chinese
20 Female  c    Asian
Select the columns of df2 that have fewer than 4 categories:
> df2[, sapply(df2, function(col) length(unique(col))) < 4]
Output
       y1       y3
1    Male  Chinese
2  Female American
3  Female    Asian
4  Female American
5  Female    Asian
6  Female  Chinese
7  Female  Chinese
8  Female  Chinese
9    Male American
10 Female  Chinese
11 Female  Chinese
12 Female American
13 Female American
14   Male  Chinese
15   Male American
16   Male    Asian
17   Male    Asian
18 Female  Chinese
19 Female  Chinese
20 Female    Asian
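Two details of the `df[, sapply(...) < 4]` pattern can matter on other data. First, when exactly one column passes the test, `[` returns a plain vector instead of a data frame, because it drops the dimension by default; `drop = FALSE` prevents this. Second, `length(unique(col))` counts `NA` as a category of its own. The sketch below wraps both points in a hypothetical helper, `select_few_categories()`; its name and its `threshold` and `count_na` parameters are our own, and it uses `vapply()`, a type-stable variant of `sapply()`:

```r
# Hypothetical helper (not part of base R): keep the columns of `df` that have
# fewer than `threshold` distinct values.
#   count_na = TRUE  -> NA counts as a category, like length(unique(col)) above
#   count_na = FALSE -> missing values are ignored when counting
# drop = FALSE keeps a data frame even when exactly one column qualifies.
select_few_categories <- function(df, threshold = 4, count_na = TRUE) {
  n_categories <- vapply(df, function(col) {
    if (!count_na) col <- col[!is.na(col)]
    length(unique(col))
  }, integer(1))
  df[, n_categories < threshold, drop = FALSE]
}

# Hypothetical example data with a missing value in 'region'
df_demo <- data.frame(
  region = c("Asian", "American", "Chinese", NA, "Asian"),
  gender = c("Male", "Female", "Male", "Female", "Male"),
  code   = c("a", "b", "c", "d", "e")
)

# 'region' has three observed values plus NA, so it counts as 4 categories here;
# only 'gender' is kept, and the result is still a one-column data frame
select_few_categories(df_demo)

# Ignoring NA, 'region' has 3 categories, so 'region' and 'gender' are kept
select_few_categories(df_demo, count_na = FALSE)
```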