How to create a random sample based on a group column of a data.table in R?
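Random sampling helps reduce bias in an analysis. When the data is organized by group, we may need a random sample drawn within each group. For example, if a data.table has a grouping column and each group contains ten values, we might want a sample that randomly picks two values from each group. This can be done by grouping with by and using the sample function to pick rows of .SD, the subset of data for the current group.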
Example
Consider the following data.table:
library(data.table)
Group<-rep(c("A","B","C","D","E"),times=4)
Percentage<-sample(1:100,20)
dt1<-data.table(Group,Percentage)
dt1
Output
    Group Percentage
 1:     A         97
 2:     B         68
 3:     C         19
 4:     D         32
 5:     E         98
 6:     A         48
 7:     B         94
 8:     C         54
 9:     D          7
10:     E         76
11:     A         10
12:     B         31
13:     C         59
14:     D         84
15:     E         41
16:     A         99
17:     B          1
18:     C         72
19:     D         42
20:     E         17
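Because sample() draws at random, the Percentage values will differ on every run. The sketch below shows one way to make such a draw repeatable with set.seed. The seed value 123 and the names first_draw and second_draw are arbitrary choices made for illustration; they are not part of the original example, and a seeded run will not reproduce the output printed above.

```r
library(data.table)

# Fixing the seed is an addition for illustration; 123 is an arbitrary value
set.seed(123)
first_draw <- data.table(
  Group = rep(c("A", "B", "C", "D", "E"), times = 4),
  Percentage = sample(1:100, 20)
)

# Resetting the same seed before sampling reproduces the same draw
set.seed(123)
second_draw <- data.table(
  Group = rep(c("A", "B", "C", "D", "E"), times = 4),
  Percentage = sample(1:100, 20)
)

# TRUE: both tables hold the same random values
identical(first_draw$Percentage, second_draw$Percentage)
```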
Creating a random sample of size 2 from each group:
Example
dt1[,.SD[sample(.N, min(2,.N))],by=Group]
Output
    Group Percentage
 1:     A         48
 2:     A         99
 3:     B         94
 4:     B         31
 5:     C         54
 6:     C         59
 7:     D         42
 8:     D         84
 9:     E         98
10:     E         76
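The grouped expression above can be read piece by piece, as the following sketch tries to show. It also includes a variant that samples row numbers through .I and then subsets the table once, which is a commonly used alternative to indexing .SD. The table name dt, the seed, and the result names by_sd, rows and by_index are assumptions introduced here for illustration.

```r
library(data.table)

set.seed(1)  # arbitrary seed, added only to make the sketch repeatable
dt <- data.table(
  Group = rep(c("A", "B", "C", "D", "E"), times = 4),
  Percentage = sample(1:100, 20)
)

# Per group: .N is the group's row count, sample(.N, k) draws k distinct
# row positions, and .SD[...] keeps those rows of the group's sub-table
by_sd <- dt[, .SD[sample(.N, min(2, .N))], by = Group]

# Variant: .I holds each group's row numbers in the full table, so sampling
# from it yields row indices that can subset dt directly
rows <- dt[, .I[sample(.N, min(2, .N))], by = Group]$V1
by_index <- dt[rows]

# Both results contain two rows per group
by_sd[, .N, by = Group]
by_index[, .N, by = Group]
```

The two approaches draw independently, so the selected rows will generally differ between by_sd and by_index; only the per-group counts are guaranteed to match.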
Let's look at another example:
Example
Class<-rep(c("First","Second","Third","Fourth"),times=10)
Experience<-sample(1:5,40,replace=TRUE)
dt2<-data.table(Class,Experience)
head(dt2,10)
Output
     Class Experience
 1:  First          4
 2: Second          2
 3:  Third          4
 4: Fourth          2
 5:  First          4
 6: Second          5
 7:  Third          3
 8: Fourth          5
 9:  First          3
10: Second          5
Example
tail(dt2,10)
Output
     Class Experience
 1:  Third          4
 2: Fourth          2
 3:  First          5
 4: Second          2
 5:  Third          1
 6: Fourth          4
 7:  First          5
 8: Second          2
 9:  Third          4
10: Fourth          4
Example
dt2[,.SD[sample(.N, min(5,.N))],by=Class]
Output
     Class Experience
 1:  First          3
 2:  First          3
 3:  First          4
 4:  First          5
 5:  First          5
 6: Second          5
 7: Second          2
 8: Second          5
 9: Second          2
10: Second          1
11:  Third          3
12:  Third          1
13:  Third          4
14:  Third          3
15:  Third          4
16: Fourth          2
17: Fourth          5
18: Fourth          2
19: Fourth          4
20: Fourth          2
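In both examples every group has at least as many rows as the requested sample size, so the min(k, .N) part never changes the result. The sketch below uses a hypothetical unbalanced table, dt3, to illustrate what that guard appears to be for: a group smaller than the requested size is returned whole instead of causing sample() to fail. The table and the names picked and counts are assumptions made for this illustration.

```r
library(data.table)

# Hypothetical unbalanced table: the group sizes are 8, 3 and 1
dt3 <- data.table(
  Class = rep(c("First", "Second", "Third"), times = c(8, 3, 1)),
  Experience = 1:12
)

# Request up to 5 rows per group; min(5, .N) caps the request at the
# group size, since sample() without replacement cannot draw more
# values than the group contains
picked <- dt3[, .SD[sample(.N, min(5, .N))], by = Class]

# Rows kept per group: First 5, Second 3, Third 1
counts <- picked[, .N, by = Class]
counts
```

Without the guard, sample(.N, 5) would stop with an error for the Second and Third groups, because sampling without replacement cannot return more values than the population holds.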