如何在 R 数据框中使用 dplyr 选择带有变量分组最小值或最大值的行?
如果一个 R 数据框包含一个级别众多的分组变量,那么根据分组级别确定离散或连续变量的最小值和最大值就会变得困难。但是,这可以通过 dplyr 包中的切片函数实现。
考虑一下下面这个只有一个分组变量和连续及离散变量的数据框 −
> set.seed(2) > x1<-sample(1:100,20,replace=TRUE) > x2<-sample(1:10,20,replace=TRUE) > x3<-rpois(20,10) > x4<-rpois(20,5) > x5<-rpois(20,5) > x6<-runif(20,2,5) > x7<-sample(200:1000,20,replace=TRUE) > Group<-rep(c(1,2,3,4),times=5) > df<-data.frame(x1,x2,x3,x4,x5,x6,x7,Group) > df x1 x2 x3 x4 x5 x6 x7 Group 1 85 8 14 7 8 2.900301 749 1 2 79 7 12 4 3 3.331022 200 2 3 70 1 17 5 6 4.190603 883 3 4 6 6 11 8 5 4.004491 649 4 5 32 9 13 5 4 2.934971 641 1 6 8 4 7 3 6 3.435734 699 2 7 17 6 9 3 4 2.874230 679 3 8 93 9 7 1 3 2.546523 642 4 9 81 8 8 6 8 3.082288 496 1 10 76 6 7 4 1 4.711400 570 2 11 41 3 8 5 9 3.182143 847 3 12 50 9 9 7 4 4.339642 707 4 13 75 7 8 7 6 2.852477 805 1 14 65 8 16 8 5 4.561162 233 2 15 3 6 8 2 7 2.516727 783 3 16 80 2 9 9 3 2.237793 788 4 17 96 7 10 5 4 2.876195 792 1 18 50 2 7 1 3 4.521114 375 2 19 55 3 6 3 4 4.835804 942 3 20 63 4 9 10 6 2.134896 228 4
加载 dplyr 包 −
> library(dplyr)
查找特定变量最小值和最大值的分组行 −
> df %>% group_by(Group) %>% slice(which.min(x5)) # A tibble: 4 x 8 # Groups: Group [4] x1 x2 x3 x4 x5 x6 x7 Group <int> <int> <int> <int> <int> <dbl> <int> <dbl> 1 32 9 13 5 4 2.93 641 1 2 76 6 7 4 1 4.71 570 2 3 17 6 9 3 4 2.87 679 3 4 93 9 7 1 3 2.55 642 4 > df %>% group_by(Group) %>% slice(which.max(x5)) # A tibble: 4 x 8 # Groups: Group [4] x1 x2 x3 x4 x5 x6 x7 Group <int> <int> <int> <int> <int> <dbl> <int> <dbl> 1 85 8 14 7 8 2.90 749 1 2 8 4 7 3 6 3.44 699 2 3 41 3 8 5 9 3.18 847 3 4 63 4 9 10 6 2.13 228 4 > df %>% group_by(Group) %>% slice(which.max(x7)) # A tibble: 4 x 8 # Groups: Group [4] x1 x2 x3 x4 x5 x6 x7 Group <int> <int> <int> <int> <int> <dbl> <int> <dbl> 1 75 7 8 7 6 2.85 805 1 2 8 4 7 3 6 3.44 699 2 3 55 3 6 3 4 4.84 942 3 4 80 2 9 9 3 2.24 788 4 > df %>% group_by(Group) %>% slice(which.min(x1)) # A tibble: 4 x 8 # Groups: Group [4] x1 x2 x3 x4 x5 x6 x7 Group <int> <int> <int> <int> <int> <dbl> <int> <dbl> 1 32 9 13 5 4 2.93 641 1 2 8 4 7 3 6 3.44 699 2 3 3 6 8 2 7 2.52 783 3 4 6 6 11 8 5 4.00 649 4 > df %>% group_by(Group) %>% slice(which.max(x1)) # A tibble: 4 x 8 # Groups: Group [4] x1 x2 x3 x4 x5 x6 x7 Group <int> <int> <int> <int> <int> <dbl> <int> <dbl> 1 96 7 10 5 4 2.88 792 1 2 79 7 12 4 3 3.33 200 2 3 70 1 17 5 6 4.19 883 3 4 93 9 7 1 3 2.55 642 4 > df %>% group_by(Group) %>% slice(which.max(x6)) # A tibble: 4 x 8 # Groups: Group [4] x1 x2 x3 x4 x5 x6 x7 Group <int> <int> <int> <int> <int> <dbl> <int> <dbl> 1 81 8 8 6 8 3.08 496 1 2 76 6 7 4 1 4.71 570 2 3 55 3 6 3 4 4.84 942 3 4 50 9 9 7 4 4.34 707 4 > df %>% group_by(Group) %>% slice(which.min(x6)) # A tibble: 4 x 8 # Groups: Group [4] x1 x2 x3 x4 x5 x6 x7 Group <int> <int> <int> <int> <int> <dbl> <int> <dbl> 1 75 7 8 7 6 2.85 805 1 2 79 7 12 4 3 3.33 200 2 3 3 6 8 2 7 2.52 783 3 4 63 4 9 10 6 2.13 228 4
广告