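I'm trying to remove NAs from a subset using a dplyr pipe. The output below suggests I've missed a step, since the NA rows are still there. I'm trying to learn how to write functions with dplyr: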
> outcome.df%>%
+ group_by(Hospital,State)%>%
+ arrange(desc(HeartAttackDeath,na.rm=TRUE))%>%
+ head()
Source: local data frame [6 x 5]
Groups: Hospital, State
  Hospital                          State HeartAttackDeath
1 ABBEVILLE AREA MEDICAL CENTER     SC                  NA
2 ABBEVILLE GENERAL HOSPITAL        LA                  NA
3 ABBOTT NORTHWESTERN HOSPITAL      MN                12.3
4 ABILENE REGIONAL MEDICAL CENTER   TX                17.2
5 ABINGTON MEMORIAL HOSPITAL        PA                14.3
6 ABRAHAM LINCOLN MEMORIAL HOSPITAL IL                  NA
Variables not shown: HeartFailureDeath (dbl), PneumoniaDeath (dbl)
Answer 0 (score: 119)
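I don't think desc takes an na.rm argument... I'm actually surprised it doesn't throw an error when you give it one. If you just want to remove every row that contains an NA in any column, use na.omit (base) or tidyr::drop_na: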
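library(dplyr)

# na.omit() (base R) drops every row that has an NA in any column
outcome.df %>%
  na.omit() %>%
  group_by(Hospital, State) %>%
  arrange(desc(HeartAttackDeath)) %>%
  head()

library(tidyr)

# drop_na() with no arguments does the same thing
outcome.df %>%
  drop_na() %>%
  group_by(Hospital, State) %>%
  arrange(desc(HeartAttackDeath)) %>%
  head()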
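To see why the na.rm argument in the question had no effect, here is a minimal sketch on made-up data. The toy data frame and its values are assumptions for illustration only, not the asker's outcome.df. It relies on the documented behavior that arrange() sorts missing values to the end rather than dropping them.

```r
library(dplyr)

# Hypothetical toy data, not the asker's outcome.df
toy <- data.frame(
  Hospital = c("A", "B", "C", "D"),
  HeartAttackDeath = c(12.3, NA, 17.2, 14.3)
)

# Sorting alone keeps the NA row; it is just moved to the end
sorted <- toy %>% arrange(desc(HeartAttackDeath))
sorted$HeartAttackDeath

# Dropping the NA rows before sorting is what actually removes them
cleaned <- toy %>% na.omit() %>% arrange(desc(HeartAttackDeath))
cleaned$HeartAttackDeath
```

So the removal step has to be its own verb in the chain; arrange() only reorders rows.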
If you only want to remove rows where the HeartAttackDeath column is NA, filter with is.na, or use tidyr::drop_na with the column name:
outcome.df %>%
  filter(!is.na(HeartAttackDeath)) %>%
  group_by(Hospital, State) %>%
  arrange(desc(HeartAttackDeath)) %>%
  head()

outcome.df %>%
  drop_na(HeartAttackDeath) %>%
  group_by(Hospital, State) %>%
  arrange(desc(HeartAttackDeath)) %>%
  head()
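The difference between dropping NAs everywhere and dropping them in one column is easiest to see on a small example. This is only a sketch: the toy data frame below is invented, and it borrows the column names from the question purely for readability.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: row A is NA in one column, row B in the other
toy <- data.frame(
  Hospital = c("A", "B", "C", "D"),
  HeartAttackDeath = c(NA, 12.3, 17.2, 14.3),
  PneumoniaDeath = c(10.1, NA, 11.5, 9.8),
  stringsAsFactors = FALSE
)

# Drops any row with an NA in any column (rows A and B go)
all_cols <- toy %>% drop_na()

# Drops only rows where HeartAttackDeath is NA (only row A goes)
one_col <- toy %>% drop_na(HeartAttackDeath)

# Same result as one_col, written with filter()
one_col_filter <- toy %>% filter(!is.na(HeartAttackDeath))
```

If other columns such as PneumoniaDeath are analysed separately, the single-column form avoids throwing away rows that are still usable for HeartAttackDeath.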
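As pointed out in the duplicate question, complete.cases can also be used, but it's a bit trickier to put in a chain because it takes a data frame as its argument and returns a logical vector (one TRUE/FALSE per row) rather than a data frame. So you could use it like this: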
outcome.df %>%
  filter(complete.cases(.)) %>%
  group_by(Hospital, State) %>%
  arrange(desc(HeartAttackDeath)) %>%
  head()
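For reference, this is roughly what complete.cases returns on its own, shown in base R on made-up data (the toy data frame is an assumption, not the asker's outcome.df). It is this logical vector that filter() consumes in the chain above.

```r
# Hypothetical toy data: row A is NA in one column, row B in the other
toy <- data.frame(
  Hospital = c("A", "B", "C", "D"),
  HeartAttackDeath = c(NA, 12.3, 17.2, 14.3),
  PneumoniaDeath = c(10.1, NA, 11.5, 9.8),
  stringsAsFactors = FALSE
)

# One logical value per row: TRUE where no column is NA
keep <- complete.cases(toy)

# Base R equivalent of filter(complete.cases(.))
complete_rows <- toy[keep, ]

# complete.cases also accepts a single vector, which checks one column only
keep_one <- complete.cases(toy$HeartAttackDeath)
```

Passing a single column this way gives the same rows as the is.na filter, so complete.cases is mostly worth reaching for when several columns need checking at once.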