How to find the sum of an integer column based on two different character columns in R?


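Finding the sum of an integer column based on two different character columns means building a two-way table of group sums, laid out like a contingency table but holding totals rather than counts. We can do this with the with and tapply functions. For example, if we have a data frame df that contains two categorical columns named gender and ethnicity and an integer column named Package, the table can be created as follows: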

with(df,tapply(Package,list(gender,ethnicity),sum))
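To make the pattern concrete, here is a minimal sketch on a small made-up data frame. Only the column names (gender, ethnicity, Package) come from the description above; the values are hypothetical and chosen so the sums are easy to check by hand.

```r
# Hypothetical data: column names follow the text, values are invented
df <- data.frame(
  gender    = c("F", "M", "F", "M", "F"),
  ethnicity = c("A", "A", "B", "B", "A"),
  Package   = c(10L, 20L, 30L, 40L, 50L)
)

# Sum Package within every gender/ethnicity combination;
# rows of the result are gender levels, columns are ethnicity levels
res <- with(df, tapply(Package, list(gender, ethnicity), sum))
res
#    A  B
# F 60 30
# M 20 40
```

The first element of the list passed to tapply becomes the rows of the result and the second becomes the columns, so swapping them transposes the table.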

Example

Consider the following data frame:


set.seed(777)
Class<-sample(c("First","Second","Third"),20,replace=TRUE)
Group<-sample(c("GP1","GP2","GP3","GP4"),20,replace=TRUE)
Rate<-sample(0:10,20,replace=TRUE)
df1<-data.frame(Class,Group,Rate)
df1

Output

    Class Group Rate
1   First   GP1    7
2  Second   GP2    1
3  Second   GP4    1
4  Second   GP4    0
5   Third   GP2   10
6  Second   GP2    8
7   First   GP1    7
8   First   GP4    4
9  Second   GP1    4
10  Third   GP3    8
11 Second   GP2    8
12  First   GP2    4
13  Third   GP2    6
14  Third   GP4    4
15  Third   GP4    5
16 Second   GP1    2
17 Second   GP1    9
18 Second   GP3    2
19 Second   GP3    1
20  Third   GP4   10

Example

str(df1)
'data.frame': 20 obs. of 3 variables:
$ Class: chr "First" "Second" "Second" "Second" ...
$ Group: chr "GP1" "GP2" "GP4" "GP4" ...
$ Rate : int 7 1 1 0 10 8 7 4 4 8 ...

Finding the sum of Rate based on Class and Group:

with(df1,tapply(Rate,list(Class,Group),sum))
       GP1 GP2 GP3 GP4
First   14   4  NA   4
Second  15  17   3   1
Third   NA  16   8  19
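The NA cells mark combinations that never occur in the data (First with GP3, Third with GP1), not sums that happen to be missing. If a zero is preferable there, one option is the default argument of tapply, available in R 3.4.0 and later. The sketch below rebuilds df1 from the rows printed above so that it does not depend on the random number generator; the name sums is only illustrative.

```r
# df1 rebuilt from the printed rows, so the result is independent of set.seed
df1 <- data.frame(
  Class = c("First", "Second", "Second", "Second", "Third", "Second", "First",
            "First", "Second", "Third", "Second", "First", "Third", "Third",
            "Third", "Second", "Second", "Second", "Second", "Third"),
  Group = c("GP1", "GP2", "GP4", "GP4", "GP2", "GP2", "GP1", "GP4", "GP1", "GP3",
            "GP2", "GP2", "GP2", "GP4", "GP4", "GP1", "GP1", "GP3", "GP3", "GP4"),
  Rate  = c(7L, 1L, 1L, 0L, 10L, 8L, 7L, 4L, 4L, 8L,
            8L, 4L, 6L, 4L, 5L, 2L, 9L, 2L, 1L, 10L)
)

# default = 0 fills Class/Group combinations that have no rows (requires R >= 3.4.0)
sums <- with(df1, tapply(Rate, list(Class, Group), sum, default = 0))
sums
```

Whether 0 or NA is the better choice depends on the analysis: 0 is convenient for further arithmetic, while NA keeps "no observations" distinct from "observations that sum to zero" (as could happen here, since Rate can be 0).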

Let's have a look at another example:

Example


Gender<-sample(c("Male","Female"),20,replace=TRUE)
Centering<-sample(c("Yes","No"),20,replace=TRUE)
Percentage<-sample(1:100,20)
df2<-data.frame(Gender,Centering,Percentage)
df2

Output

   Gender Centering Percentage
1    Male        No         28
2    Male        No         89
3  Female       Yes         38
4    Male        No         78
5    Male       Yes         19
6  Female        No         46
7  Female       Yes         94
8    Male        No          4
9    Male       Yes         92
10   Male        No         90
11   Male       Yes         66
12 Female        No         57
13 Female        No         74
14 Female        No         48
15 Female       Yes         20
16   Male       Yes         51
17   Male        No         82
18   Male        No          7
19   Male        No         53
20   Male        No         55

Finding the sum of Percentage based on Gender and Centering:

with(df2,tapply(Percentage,list(Gender,Centering),sum))
        No Yes
Female 225 152
Male   486 228
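When the totals are needed as a data frame with one row per combination rather than as a two-way table, base R's aggregate function with a formula is a possible alternative to tapply. The sketch below rebuilds df2 from the rows printed above so that it runs on its own; the name long is only illustrative.

```r
# df2 rebuilt from the printed rows, so the result is independent of the RNG
df2 <- data.frame(
  Gender = c("Male", "Male", "Female", "Male", "Male", "Female", "Female",
             "Male", "Male", "Male", "Male", "Female", "Female", "Female",
             "Female", "Male", "Male", "Male", "Male", "Male"),
  Centering = c("No", "No", "Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No",
                "Yes", "No", "No", "No", "Yes", "Yes", "No", "No", "No", "No"),
  Percentage = c(28L, 89L, 38L, 78L, 19L, 46L, 94L, 4L, 92L, 90L,
                 66L, 57L, 74L, 48L, 20L, 51L, 82L, 7L, 53L, 55L)
)

# One row per Gender/Centering combination, with the summed Percentage
long <- aggregate(Percentage ~ Gender + Centering, data = df2, FUN = sum)
long
```

The four totals are the same as in the table above (225, 152, 486, 228), only arranged as rows. Unlike tapply, aggregate simply omits combinations that do not occur in the data instead of reporting them as NA.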

Updated on: 2020-10-17
