How to remove multiple rows from an R data frame using the dplyr package?
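Datasets sometimes contain information we do not need: a single case, several cases, an entire variable, or anything else that does not serve the goal of the analysis. To remove such rows from an R data frame with the dplyr package, we can use the anti_join function, which returns the rows of the first data frame that have no match in the second.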
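As a minimal sketch of the idea, the snippet below applies anti_join to a small toy data frame. The names `scores`, `unwanted` and `kept` are hypothetical and do not come from the article; the `by` argument is spelled out only to make the matching columns explicit.

```r
library(dplyr)

# Hypothetical toy data, not taken from the article
scores <- data.frame(id = 1:6, value = c(10, 20, 30, 40, 50, 60))

# The rows we want to drop: here, the rows at positions 2 to 4
unwanted <- scores[2:4, ]

# anti_join() keeps the rows of `scores` that have no match in `unwanted`
kept <- anti_join(scores, unwanted, by = c("id", "value"))
kept
```

The rows are matched on their values in the `by` columns, not on their positions, which is why the rows to delete are passed as a data frame.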
Example
Consider the following data frame
> set.seed(2514)
> x1<-rnorm(20,5)
> x2<-rnorm(20,5,0.05)
> df1<-data.frame(x1,x2)
> df1
Output
         x1       x2
1  5.567262 4.998607
2  5.343063 4.931962
3  2.211267 5.034461
4  5.092191 5.075641
5  3.883282 4.997900
6  5.950218 5.038626
7  4.903268 5.010087
8  7.462286 4.974513
9  5.056762 5.097812
10 6.031768 5.002989
11 3.814416 4.990552
12 3.359167 4.891964
13 5.304671 4.950883
14 4.768564 4.953290
15 3.842797 4.950219
16 5.270018 4.995953
17 6.344269 5.008545
18 5.366249 4.905290
19 5.547608 5.098554
20 5.266844 5.003416
Load the dplyr package
> library(dplyr)
Remove rows 1 to 5 from df1
> anti_join(df1,df1[1:5,])
Joining, by = c("x1", "x2")
         x1       x2
1  5.950218 5.038626
2  4.903268 5.010087
3  7.462286 4.974513
4  5.056762 5.097812
5  6.031768 5.002989
6  3.814416 4.990552
7  3.359167 4.891964
8  5.304671 4.950883
9  4.768564 4.953290
10 3.842797 4.950219
11 5.270018 4.995953
12 6.344269 5.008545
13 5.366249 4.905290
14 5.547608 5.098554
15 5.266844 5.003416
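The "Joining, by = ..." message appears because no key columns were named. The sketch below rebuilds df1 from the same seed and passes the `by` argument explicitly, which should silence the message; the variable `trimmed` and the sanity check against positional indexing are additions for illustration, not part of the original example.

```r
library(dplyr)

# Rebuild the article's data frame from the same seed
set.seed(2514)
x1 <- rnorm(20, 5)
x2 <- rnorm(20, 5, 0.05)
df1 <- data.frame(x1, x2)

# Naming the key columns explicitly avoids the "Joining, by = ..." message
trimmed <- anti_join(df1, df1[1:5, ], by = c("x1", "x2"))

# Sanity check: the remaining values should be those of original rows 6 to 20
nrow(trimmed)
all.equal(trimmed$x1, df1$x1[6:20])
```

Note that the result is renumbered from 1, so the row labels in the output no longer correspond to the original row positions.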
Remove rows 11 to 18 from df1
> anti_join(df1,df1[11:18,])
Joining, by = c("x1", "x2")
         x1       x2
1  5.567262 4.998607
2  5.343063 4.931962
3  2.211267 5.034461
4  5.092191 5.075641
5  3.883282 4.997900
6  5.950218 5.038626
7  4.903268 5.010087
8  7.462286 4.974513
9  5.056762 5.097812
10 6.031768 5.002989
11 5.547608 5.098554
12 5.266844 5.003416
Remove rows 6 to 12 from df1
> anti_join(df1,df1[6:12,])
Joining, by = c("x1", "x2")
         x1       x2
1  5.567262 4.998607
2  5.343063 4.931962
3  2.211267 5.034461
4  5.092191 5.075641
5  3.883282 4.997900
6  5.304671 4.950883
7  4.768564 4.953290
8  3.842797 4.950219
9  5.270018 4.995953
10 6.344269 5.008545
11 5.366249 4.905290
12 5.547608 5.098554
13 5.266844 5.003416
Remove rows 15 to 20 from df1
> anti_join(df1,df1[15:20,])
Joining, by = c("x1", "x2")
         x1       x2
1  5.567262 4.998607
2  5.343063 4.931962
3  2.211267 5.034461
4  5.092191 5.075641
5  3.883282 4.997900
6  5.950218 5.038626
7  4.903268 5.010087
8  7.462286 4.974513
9  5.056762 5.097812
10 6.031768 5.002989
11 3.814416 4.990552
12 3.359167 4.891964
13 5.304671 4.950883
14 4.768564 4.953290
Remove rows 5 to 18 from df1
> anti_join(df1,df1[5:18,])
Joining, by = c("x1", "x2")
        x1       x2
1 5.567262 4.998607
2 5.343063 4.931962
3 2.211267 5.034461
4 5.092191 5.075641
5 5.547608 5.098554
6 5.266844 5.003416
Remove rows 11 to 20 from df1
> anti_join(df1,df1[11:20,])
Joining, by = c("x1", "x2")
         x1       x2
1  5.567262 4.998607
2  5.343063 4.931962
3  2.211267 5.034461
4  5.092191 5.075641
5  3.883282 4.997900
6  5.950218 5.038626
7  4.903268 5.010087
8  7.462286 4.974513
9  5.056762 5.097812
10 6.031768 5.002989
Remove rows 1 to 10 from df1
> anti_join(df1,df1[1:10,])
Joining, by = c("x1", "x2")
         x1       x2
1  3.814416 4.990552
2  3.359167 4.891964
3  5.304671 4.950883
4  4.768564 4.953290
5  3.842797 4.950219
6  5.270018 4.995953
7  6.344269 5.008545
8  5.366249 4.905290
9  5.547608 5.098554
10 5.266844 5.003416
Remove rows 2 to 11 from df1
> anti_join(df1,df1[2:11,])
Joining, by = c("x1", "x2")
         x1       x2
1  5.567262 4.998607
2  3.359167 4.891964
3  5.304671 4.950883
4  4.768564 4.953290
5  3.842797 4.950219
6  5.270018 4.995953
7  6.344269 5.008545
8  5.366249 4.905290
9  5.547608 5.098554
10 5.266844 5.003416
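One caveat worth keeping in mind: because anti_join matches rows by value, it also drops any other row that happens to be identical to one of the rows being removed. This does not show up in df1, where all rows are distinct, but the hypothetical data frame `dup` below illustrates it. For comparison, the sketch uses dplyr's slice with a negative index, which is a swapped-in, position-based alternative and not the method used in the examples above.

```r
library(dplyr)

# Hypothetical data with a repeated row (rows 1 and 4 are identical)
dup <- data.frame(a = c(1, 2, 3, 1), b = c("x", "y", "z", "x"))

# Value-based removal: dropping "row 1" also drops its duplicate at row 4
by_value <- anti_join(dup, dup[1, ], by = c("a", "b"))

# Position-based removal: slice() with a negative index drops only row 1
by_position <- slice(dup, -1)

nrow(by_value)
nrow(by_position)
```

If the intent is strictly "delete the rows at these positions", the position-based form is the safer choice when duplicates may be present.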