How to remove rows of an R data frame whose categorical column value has fewer than three duplicates?


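In data analysis, we sometimes decide how much data to keep based on our own criteria, which can mean discarding part of it. One such case is dropping rows whose categorical value (or combination of values) has fewer than three duplicates, that is, occurs fewer than four times in total. This can be done with the filter function of the dplyr package after grouping the data with the group_by function.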
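As a minimal sketch of the idea, the same grouping and filtering can be tried on a small hand-made data frame. The names `toy`, `g`, `v` and `kept` and all values below are made up for illustration and are not part of the examples that follow.

```r
library(dplyr)

# Hypothetical toy data: "A" occurs 4 times, "B" twice, "C" once
toy <- data.frame(
  g = c("A", "A", "B", "A", "C", "B", "A"),
  v = 1:7
)

# n() is the number of rows in the current group; keep groups with
# four or more rows (the first occurrence plus at least three duplicates)
kept <- toy %>%
  group_by(g) %>%
  filter(n() >= 4) %>%
  ungroup()

kept
```

Only the rows with g equal to "A" remain. The closing ungroup() is optional; it just returns a tibble without grouping metadata.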

Example 1

Consider the following data frame:

set.seed(121)
x1 <- sample(LETTERS[1:6], 20, replace = TRUE)
x2 <- sample(c("Male", "Female"), 20, replace = TRUE)
x3 <- rpois(20, 5)
df1 <- data.frame(x1, x2, x3)
df1

Output

x1 x2 x3
1 D Female 5
2 D Female 2
3 D Male 7
4 D Female 8
5 A Male 6
6 C Female 7
7 A Female 3
8 C Female 1
9 C Female 7
10 E Male 2
11 D Female 3
12 E Female 6
13 F Female 3
14 D Female 4
15 A Male 4
16 E Male 4
17 B Female 8
18 B Female 7
19 C Female 5
20 A Female 9

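Load the dplyr package and remove the rows whose combination of x1 and x2 has fewer than three duplicates (occurs fewer than four times):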

Example

library(dplyr)
df1 %>% group_by(x1, x2) %>% filter(n() >= 4)

Output

# A tibble: 9 x 3
# Groups: x1, x2 [2]

x1 x2 x3
<chr> <chr> <int>
1 D Female 5
2 D Female 2
3 D Female 8
4 C Female 7
5 C Female 1
6 C Female 7
7 D Female 3
8 D Female 4
9 C Female 5
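The same filtering can also be sketched without dplyr. The block below is an alternative using base R's ave(), not the method used in this article, and it runs on a hypothetical data frame `d` with made-up values rather than on df1.

```r
# Hypothetical data with two grouping columns
d <- data.frame(
  x1 = c("D", "D", "D", "D", "A", "C", "A", "D"),
  x2 = c("Female", "Female", "Male", "Female",
         "Male", "Female", "Female", "Female"),
  x3 = c(5, 2, 7, 8, 6, 7, 3, 3)
)

# ave() applies length() within each (x1, x2) combination and returns
# the group size aligned with the original rows
group_size <- ave(seq_len(nrow(d)), d$x1, d$x2, FUN = length)

# Keep rows whose combination occurs at least four times
d_kept <- d[group_size >= 4, ]
d_kept
```

Here only the four D/Female rows survive. Unlike the dplyr version, the result is a plain data frame that keeps its original row names.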

Example 2

y1 <- sample(c("S1", "S2", "S3", "S4", "S5", "S6"), 20, replace = TRUE)
y2 <- sample(c("Winter", "Summer"), 20, replace = TRUE)
y3 <- rnorm(20, 3)
df2 <- data.frame(y1, y2, y3)
df2

Output

y1 y2 y3
1 S1 Winter 2.683082
2 S4 Summer 1.141916
3 S6 Winter 3.371681
4 S2 Winter 3.191187
5 S3 Summer 2.195504
6 S5 Summer 2.631736
7 S3 Winter 3.303605
8 S6 Summer 3.074344
9 S5 Summer 2.663724
10 S5 Winter 2.281991
11 S6 Summer 4.174418
12 S4 Winter 6.081246
13 S4 Summer 3.202913
14 S2 Winter 5.557243
15 S2 Winter 3.747462
16 S2 Winter 2.621571
17 S2 Summer 3.909743
18 S5 Winter 2.325663
19 S5 Summer 3.749852
20 S5 Winter 2.331191

Example

df2 %>% group_by(y1, y2) %>% filter(n() >= 4)

Output

# A tibble: 4 x 3
# Groups: y1, y2 [1]

y1 y2 y3
<chr> <chr> <dbl>
1 S2 Winter 3.19
2 S2 Winter 5.56
3 S2 Winter 3.75
4 S2 Winter 2.62
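In df2, the combinations S5/Summer and S5/Winter each occur three times, so they are dropped along with the rarer ones. To see which combinations a threshold would drop before filtering, one option is dplyr's count(). The sketch below uses a hypothetical data frame `s` with made-up values, not df2.

```r
library(dplyr)

# Hypothetical data: S2/Winter occurs 4 times, S5/Summer 3 times,
# S1/Winter once
s <- data.frame(
  y1 = c("S2", "S5", "S2", "S5", "S2", "S1", "S5", "S2"),
  y2 = c("Winter", "Summer", "Winter", "Summer",
         "Winter", "Winter", "Summer", "Winter")
)

# count() returns one row per combination, with its frequency in column n
counts <- s %>% count(y1, y2)
counts

# Combinations that filter(n() >= 4) would remove
dropped <- counts %>% filter(n < 4)
dropped
```

Inspecting `counts` first makes it easier to choose the threshold deliberately, for example n() >= 3 if a value that appears three times should be kept.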

Updated on: 8 February 2021