Why is the mean NaN in R even when na.rm is set to TRUE with dplyr?

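When a data frame is summarised with dplyr, mean(x, na.rm = TRUE) returns NaN for any group whose values are all NA: once the NA values are removed nothing is left to average, and the mean of an empty vector is 0/0, which R reports as NaN. Leaving out na.rm turns that NaN into NA, but it also returns NA for any group that contains even a single NA. Follow the steps below to see the difference between the two: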

First, create a data frame.
Summarise the data frame, which contains NA values, with na.rm set to TRUE.
Summarise the data frame without setting na.rm to TRUE.

Create the data frame

Let us create a data frame as shown below:

Group <- rep(c("First", "Second", "Third"), times = c(3, 10, 7))
Response <- rep(c(NA, 3, 4, 5, 7, 8), times = c(3, 2, 5, 2, 4, 4))
df <- data.frame(Group, Response)
df

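Executing the above script produces the following output (the data are built with rep(), so nothing is random and the output is the same on every system):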

    Group Response
1   First       NA
2   First       NA
3   First       NA
4  Second        3
5  Second        3
6  Second        4
7  Second        4
8  Second        4
9  Second        4
10 Second        4
11 Second        5
12 Second        5
13 Second        7
14  Third        7
15  Third        7
16  Third        7
17  Third        8
18  Third        8
19  Third        8
20  Third        8
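
Before summarising, it can help to check how many values are missing in each group. The following is a minimal base R sketch, assuming the same Group and Response vectors as above; the names by_group, n_missing, n_rows and all_missing are illustrative and not part of the original example.

```r
Group <- rep(c("First", "Second", "Third"), times = c(3, 10, 7))
Response <- rep(c(NA, 3, 4, 5, 7, 8), times = c(3, 2, 5, 2, 4, 4))
df <- data.frame(Group, Response)

# Split Response into one vector per group
by_group <- split(df$Response, df$Group)

# Count the missing values and the total number of rows in each group
n_missing <- sapply(by_group, function(x) sum(is.na(x)))
n_rows <- sapply(by_group, length)

# TRUE marks a group that contains nothing but NA
all_missing <- n_missing == n_rows
all_missing
```

Here only the First group is flagged, because all three of its Response values are NA.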

Summarise the data frame with na.rm set to TRUE

Load the dplyr package and summarise the data frame df by the mean of Response in each group:

library(dplyr)
Group <- rep(c("First", "Second", "Third"), times = c(3, 10, 7))
Response <- rep(c(NA, 3, 4, 5, 7, 8), times = c(3, 2, 5, 2, 4, 4))
df <- data.frame(Group, Response)
# Mean of Response per group; NA values are dropped before averaging
df %>% group_by(Group) %>% summarise(mean = mean(Response, na.rm = TRUE))

# A tibble: 3 x 2
  Group    mean
  <chr>   <dbl>
1 First  NaN
2 Second   4.3
3 Third    7.57
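
The NaN for the First group can be reproduced without dplyr. The sketch below assumes a vector x that mimics the three NA values of that group; x and x_kept are illustrative names. It spells out what na.rm = TRUE does: the NA values are removed first, and the mean of what remains is then 0 divided by 0.

```r
# The Response values of group "First": nothing but NA
x <- c(NA_real_, NA_real_, NA_real_)

# What na.rm = TRUE leaves behind: an empty numeric vector
x_kept <- x[!is.na(x)]
length(x_kept)

# The mean of an empty vector is sum / length = 0 / 0
sum(x_kept) / length(x_kept)

# mean() reaches the same result
result <- mean(x, na.rm = TRUE)
is.nan(result)
```

So the NaN is not a dplyr quirk: it is how mean() in base R reports that there was nothing left to average.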

Summarise the data frame without setting na.rm to TRUE

Summarise the data frame df by the mean of Response in each group, this time without setting na.rm to TRUE:

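Group <- rep(c("First", "Second", "Third"), times = c(3, 10, 7))
Response <- rep(c(NA, 3, 4, 5, 7, 8), times = c(3, 2, 5, 2, 4, 4))
df <- data.frame(Group, Response)
# Mean of Response per group; any NA in a group makes that group's mean NA
df %>% group_by(Group) %>% summarise(mean = mean(Response))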

# A tibble: 3 x 2
  Group   mean
  <chr>  <dbl>
1 First  NA
2 Second  4.3
3 Third   7.57
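
Dropping na.rm only works here because the other groups happen to contain no NA at all. One possible alternative, shown as a sketch rather than a recommended method, is to keep na.rm = TRUE and return NA explicitly when a group has no usable values. The helper mean_or_na below is hypothetical and is not a dplyr function.

```r
library(dplyr)

Group <- rep(c("First", "Second", "Third"), times = c(3, 10, 7))
Response <- rep(c(NA, 3, 4, 5, 7, 8), times = c(3, 2, 5, 2, 4, 4))
df <- data.frame(Group, Response)

# Hypothetical helper: NA when every value is missing,
# otherwise the mean of the non-missing values
mean_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else mean(x, na.rm = TRUE)
}

result <- df %>%
  group_by(Group) %>%
  summarise(mean = mean_or_na(Response))
result
```

With this data the result matches the summary without na.rm. The two would differ only if a group mixed NA with real values: the helper would still average the real values, whereas mean(Response) would return NA for that group.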

Nizamuddin Siddiqui

Updated on: 13 August 2021
