How to select columns with fewer than four categories in an R data frame?


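A categorical column has at least two categories. There is no fixed upper limit on the number of categories, although in practice it is bounded by the number of cases (rows). When a data frame contains categorical columns, some with fewer than four categories and some with four or more, we may want to keep only the columns with fewer than four. This can be needed when we deliberately select data in a biased way, or when predefined characteristics of the data allow such a change. The `sapply` function can count the unique values in each column, and the resulting counts can be used to select this subset of columns, as the examples below show.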
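The selection can be broken into three steps: count the distinct values per column, compare the counts with the threshold, and subset the columns with the resulting logical vector. The following is a minimal sketch of those steps. The data frame `df` and its column names are hypothetical and use fixed values (not the random samples from the examples), so the result is predictable.

```r
# Hypothetical data frame with fixed values (not taken from the article)
df <- data.frame(
  temp  = c("Hot", "Cold", "Warm", "Hot", "Cold", "Warm"),
  sex   = c("Male", "Female", "Male", "Male", "Female", "Female"),
  grade = c("a", "b", "c", "d", "a", "b")
)

# Step 1: count the distinct values in each column
n_categories <- sapply(df, function(col) length(unique(col)))
# temp has 3 distinct values, sex has 2, grade has 4

# Step 2: turn the counts into a logical mask (TRUE = keep the column)
keep <- n_categories < 4

# Step 3: subset the columns with the mask
small <- df[, keep]
names(small)
# "temp" "sex"
```

Writing the mask as a separate variable makes it easy to inspect which columns pass the threshold before subsetting.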

Example 1

Consider the following data frame:


> x1<-sample(c("Hot","Cold","Warm"),20,replace=TRUE)
> x2<-sample(c("Male","Female"),20,replace=TRUE)
> x3<-sample(letters[1:4],20,replace=TRUE)
> df1<-data.frame(x1,x2,x3)
> df1

Output

   x1       x2  x3
1  Warm   Male  b
2  Cold Female  c
3  Cold   Male  a
4  Hot    Male  d
5  Hot    Male  d
6  Hot  Female  a
7  Hot    Male  a
8  Cold Female  d
9  Warm   Male  d
10 Warm Female  d
11 Cold   Male  a
12 Cold Female  c
13 Hot    Male  b
14 Warm   Male  c
15 Cold   Male  b
16 Warm   Male  a
17 Hot    Male  b
18 Cold   Male  b
19 Hot  Female  c
20 Warm Female  d

Select the subset of columns in df1 that have fewer than 4 categories:

> df1[,sapply(df1, function(col) length(unique(col)))<4]

Output

    x1    x2
1  Warm   Male
2  Cold Female
3  Cold   Male
4  Hot    Male
5  Hot    Male
6  Hot  Female
7  Hot    Male
8  Cold Female
9  Warm   Male
10 Warm Female
11 Cold   Male
12 Cold Female
13 Hot    Male
14 Warm   Male
15 Cold   Male
16 Warm   Male
17 Hot    Male
18 Cold   Male
19 Hot  Female
20 Warm Female

Example 2


> y1<-sample(c("Male","Female"),20,replace=TRUE)
> y2<-sample(letters[1:5],20,replace=TRUE)
> y3<-sample(c("Asian","American","Chinese"),20,replace=TRUE)
> df2<-data.frame(y1,y2,y3)
> df2

Output

     y1   y2    y3
1   Male  b  Chinese
2  Female b  American
3  Female d  Asian
4  Female e  American
5  Female e  Asian
6  Female c  Chinese
7  Female a  Chinese
8  Female a  Chinese
9   Male  d  American
10 Female d  Chinese
11 Female d  Chinese
12 Female c  American
13 Female b  American
14   Male d  Chinese
15   Male a  American
16   Male e  Asian
17   Male b  Asian
18 Female d  Chinese
19 Female d  Chinese
20 Female c  Asian

Select the subset of columns in df2 that have fewer than 4 categories:

> df2[,sapply(df2, function(col) length(unique(col)))<4]

Output

    y1      y3
1   Male  Chinese
2  Female American
3  Female Asian
4  Female American
5  Female Asian
6  Female Chinese
7  Female Chinese
8  Female Chinese
9    Male American
10 Female Chinese
11 Female Chinese
12 Female American
13 Female American
14   Male Chinese
15   Male American
16   Male Asian
17   Male Asian
18 Female Chinese
19 Female Chinese
20 Female Asian
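One caveat with the `df[, mask]` form used above: when only a single column passes the test, R drops the result to a plain vector instead of a data frame. In addition, `unique` counts `NA` as a value of its own. The sketch below wraps the same idea in a helper that guards against both. The function name `select_few_categories`, its `max_categories` parameter, and the sample data are hypothetical and introduced only for illustration.

```r
# Hypothetical helper (name and parameter are illustrative, not from the article)
select_few_categories <- function(data, max_categories = 4) {
  # Count distinct non-missing values per column; vapply enforces an integer result
  n_categories <- vapply(
    data,
    function(col) length(unique(col[!is.na(col)])),
    integer(1)
  )
  # drop = FALSE keeps the result a data frame even if one column remains
  data[, n_categories < max_categories, drop = FALSE]
}

# Fixed sample data: only `sex` has fewer than 4 categories
df <- data.frame(
  sex   = c("Male", "Female", NA, "Male", "Female"),
  grade = c("a", "b", "c", "d", "e")
)

result <- select_few_categories(df)
is.data.frame(result)  # TRUE
names(result)          # "sex"
```

Whether missing values should count as a category depends on the analysis. Remove the `!is.na(col)` filter if `NA` should be treated as its own category.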

Updated on: 5 March 2021
