How to use a column index instead of a column name with group_by in R's dplyr package?
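When we use the group_by() function from the dplyr package, we pass it the names of the columns to group by, which are typically categorical. To refer to the same column by its index (position) instead, use group_by_at(), which accepts column positions as its argument; for example, group_by_at(1) groups by the first column.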
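As a minimal sketch, assuming dplyr 1.0.0 or later and a small hypothetical data frame `toy` (not the article's data), the following checks that grouping by position gives the same result as grouping by name. Because group_by_at() accepts a numeric vector of positions, `c(1, 2)` groups by the first two columns at once. In current dplyr the scoped `_at` verbs are documented as superseded (still supported, but no longer the recommended style) in favour of `across()` or, from dplyr 1.1.0, `pick()` inside `group_by()`.

```r
library(dplyr)

# Hypothetical toy data: 'g' and 'h' are categorical, 'v' is numeric
toy <- data.frame(
  g = c("a", "b", "a", "b", "a"),
  h = c("x", "x", "y", "y", "x"),
  v = 1:5,
  stringsAsFactors = FALSE
)

# Group by column position 1 (i.e. 'g') and count rows per group
by_index <- toy %>% group_by_at(1) %>% summarise(n = n())
# The same grouping written with the column name
by_name <- toy %>% group_by(g) %>% summarise(n = n())

print(by_index)
print(identical(as.data.frame(by_index), as.data.frame(by_name)))  # TRUE if both match

# Several positions at once: group by 'g' and 'h'.
# With two grouping columns, summarise() keeps the result grouped by 'g'
# and prints a message saying so.
by_two <- toy %>% group_by_at(c(1, 2)) %>% summarise(n = n())
print(by_two)
```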
Example 1
Consider the following data frame:
# sample() and rpois() draw random values; without set.seed() your output will differ
x1 <- sample(LETTERS[1:4], 20, replace = TRUE)
x2 <- rpois(20, 2)
df1 <- data.frame(x1, x2)
df1
Output
   x1 x2
1   D  4
2   D  5
3   B  2
4   D  3
5   C  1
6   C  3
7   D  1
8   D  3
9   B  3
10  B  2
11  C  0
12  C  1
13  A  2
14  B  2
15  B  2
16  C  4
17  D  2
18  A  0
19  D  0
20  B  2
Load the dplyr package and group by the column index instead of the column name:
Example
library(dplyr)
df1 %>% group_by_at(1) %>% summarise(n = n())
Output
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 4 x 2
  x1        n
  <chr> <int>
1 A         2
2 B         6
3 C         5
4 D         7
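The `summarise()` message in this example is informational only: with a single grouping column the result is already ungrouped. As a sketch, assuming dplyr 1.0.0 or later, passing `.groups = "drop"` states explicitly that the result should be ungrouped, which also stops the message from being printed. The `set.seed(1)` call is an addition for reproducibility and is not part of the original example; the resulting counts depend on R's random number generator, so they are not listed here.

```r
library(dplyr)

set.seed(1)  # added for reproducibility; not in the original example
x1 <- sample(LETTERS[1:4], 20, replace = TRUE)
x2 <- rpois(20, 2)
df1 <- data.frame(x1, x2, stringsAsFactors = FALSE)

# Group by the first column (x1) by position and count rows per level;
# .groups = "drop" returns an ungrouped tibble without the message
counts <- df1 %>%
  group_by_at(1) %>%
  summarise(n = n(), .groups = "drop")

print(counts)
```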
Example 2
y1 <- sample(c("Male", "Female"), 20, replace = TRUE)
y2 <- sample(21:50, 20)
df2 <- data.frame(y1, y2)
df2
Output
       y1 y2
1  Female 29
2    Male 43
3  Female 34
4    Male 49
5    Male 28
6  Female 23
7  Female 27
8  Female 31
9  Female 36
10 Female 41
11   Male 25
12 Female 24
13   Male 30
14 Female 22
15 Female 37
16   Male 42
17 Female 47
18   Male 35
19 Female 32
20 Female 21
Count the rows for each level of y1, using its column index instead of its name:
Example
df2 %>% group_by_at(1) %>% summarise(n = n())
Output
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 2 x 2
  y1         n
  <chr>  <int>
1 Female    13
2 Male       7
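Column positions are most useful when the grouping column is chosen at run time rather than typed into the code. The sketch below assumes dplyr 1.0.0 or later; the variable names `grp_col` and `by_sex` are hypothetical, and `set.seed(42)` is added only so that reruns give the same data. It stores the position in a variable, passes it to group_by_at(), and computes the mean of y2 per group alongside the count.

```r
library(dplyr)

set.seed(42)  # added for reproducibility; not in the original example
y1 <- sample(c("Male", "Female"), 20, replace = TRUE)
y2 <- sample(21:50, 20)
df2 <- data.frame(y1, y2, stringsAsFactors = FALSE)

grp_col <- 1  # hypothetical: position of the grouping column, decided at run time

# Group by whatever column sits at position grp_col (here y1),
# then count rows and average y2 within each group
by_sex <- df2 %>%
  group_by_at(grp_col) %>%
  summarise(n = n(), mean_y2 = mean(y2))

print(by_sex)
```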