Consider the following example, which uses a dplyr summarise pipeline to summarise a data frame in order to identify the minimum DATE associated with each CHAR:
library('tidyverse')
library('lubridate')
temp <- data.frame(
CHAR = c(
'A',
'B',
'C'
),
DATE = c(
'20090101',
'20100101',
NA
) %>% ymd(), # Turn character strings to dates
stringsAsFactors = FALSE
) %>% group_by(
CHAR
) %>% summarise(
DATE = min(DATE, na.rm = TRUE) # Extract minimum date
) %>% ungroup()
is.na is then used to test whether the minimum is NA:
temp %>% mutate(
DATE_lgl = DATE %>% is.na() # Identify dates that are missing/NA
)
Output:
# A tibble: 3 x 3
CHAR DATE DATE_lgl
<chr> <date> <lgl>
1 A 2009-01-01 FALSE
2 B 2010-01-01 FALSE
3 C NA FALSE
DATE_lgl incorrectly shows FALSE for the row where DATE is printed as NA. Why is that?
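Removing na.rm = TRUE fixes the problem, but that is not an option in the following configuration, where na.rm = TRUE is needed to discard the missing entries: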
temp <- data.frame(
CHAR = c(
'A',
'B',
'C',
'C'
),
DATE = c(
'20090101',
'20100101',
NA,
'20110101'
) %>% ymd(), # Turn character strings to dates
stringsAsFactors = FALSE
) %>% group_by(
CHAR
) %>% summarise(
DATE = min(DATE, na.rm = TRUE) # Extract minimum date
) %>% ungroup()
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C LC_TIME=English_Canada.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2.2 lubridate_1.7.4 forcats_0.3.0 stringr_1.3.1 dplyr_0.7.5 purrr_0.2.5
[7] readr_1.1.1 tidyr_0.8.1 tibble_1.4.2 ggplot2_2.2.1 tidyverse_1.2.1
loaded via a namespace (and not attached):
[1] Rcpp_0.12.17 cellranger_1.1.0 pillar_1.2.3 compiler_3.5.0 plyr_1.8.4 bindr_0.1.1
[7] tools_3.5.0 jsonlite_1.5 nlme_3.1-137 gtable_0.2.0 lattice_0.20-35 pkgconfig_2.0.1
[13] rlang_0.2.1 psych_1.8.4 cli_1.0.0 rstudioapi_0.7 yaml_2.1.19 parallel_3.5.0
[19] haven_1.1.1 xml2_1.2.0 httr_1.3.1 hms_0.4.2 grid_3.5.0 tidyselect_0.2.4
[25] glue_1.2.0 R6_2.2.2 readxl_1.1.0 foreign_0.8-70 modelr_0.1.2 reshape2_1.4.3
[31] magrittr_1.5 scales_0.5.0 rvest_0.3.2 assertthat_0.2.0 mnormt_1.5-5 colorspace_1.3-2
[37] utf8_1.1.4 stringi_1.1.7 lazyeval_0.2.1 munsell_0.4.3 broom_0.4.4 crayon_1.3.4
Answer 0 (score: 4)
The problem is that for row 3 you are evaluating
min(NA, na.rm=TRUE)
# Inf
which leaves an infinite Date in that row:
dput(temp$DATE[3])
# structure(Inf, class = "Date")
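The same behaviour can be reproduced in base R without the pipeline. This is a minimal sketch: the variable names all_missing and res are made up for illustration, and suppressWarnings is used only to hide the warning that min emits when nothing is left after dropping the NA.

```r
# An all-NA Date vector, like the DATE values of group 'C'
all_missing <- as.Date(NA)

# min() drops the NA, has no values left, and falls back to Inf
res <- suppressWarnings(min(all_missing, na.rm = TRUE))

class(res)        # the Date class is kept
is.na(res)        # FALSE: Inf is not NA
is.infinite(res)  # TRUE
unclass(res)      # the underlying number, Inf
```

So the value in row 3 is a Date whose underlying number is Inf, which is why is.na reports FALSE for it.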
Add is.finite to your mutate:
temp %>%
mutate(DATE_lgl = is.finite(DATE) | is.na(DATE)) # FALSE flags the infinite date returned by min()
# A tibble: 3 x 3
# CHAR DATE DATE_lgl
# <chr> <date> <lgl>
# 1 A 2009-01-01 TRUE
# 2 B 2010-01-01 TRUE
# 3 C NA FALSE
That it prints as NA is probably a limitation of how the Date class is printed:
as.Date(Inf, origin="1970-01-01")
# NA
dput(as.Date(Inf, origin="1970-01-01"))
# structure(Inf, class = "Date")
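If the goal is to get a real NA out of the summary in the first place, one option is to guard the call to min. The sketch below assumes a helper named safe_min_date, which is hypothetical and not part of dplyr or lubridate; it only illustrates the idea.

```r
# Hypothetical helper (not part of dplyr or lubridate): minimum of a Date
# vector that returns NA, rather than Inf, when every value is missing.
safe_min_date <- function(x) {
  if (all(is.na(x))) {
    as.Date(NA)
  } else {
    min(x, na.rm = TRUE)
  }
}

only_na <- as.Date(c(NA, NA))
mixed   <- as.Date(c(NA, "2011-01-01", "2012-06-30"))

is.na(safe_min_date(only_na))  # TRUE
safe_min_date(mixed)           # earliest non-missing date, 2011-01-01
```

Inside the pipeline this would presumably be used as summarise(DATE = safe_min_date(DATE)), after which a plain is.na(DATE) behaves as expected.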
Answer 1 (score: 2)
A workaround is to convert the Date column to character and then test whether the result is NA.
temp %>% mutate(
DATE_lgl = is.na(as.character(DATE))
)
# # A tibble: 3 x 3
# CHAR DATE DATE_lgl
# <chr> <date> <lgl>
# 1 A 2009-01-01 FALSE
# 2 B 2010-01-01 FALSE
# 3 C NA TRUE
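A variation that avoids the round trip through character is to overwrite the non-finite dates with NA and then use is.na directly. This is only a sketch on a small made-up vector named dates; the Inf element is built by hand to stand in for the value that min produced for group 'C'.

```r
# Made-up vector: one valid date, one real NA, one infinite Date
dates <- c(
  as.Date("2009-01-01"),
  as.Date(NA),
  structure(Inf, class = "Date")
)

# Replace anything that is not a finite date with a proper NA
dates[!is.finite(dates)] <- as.Date(NA)

is.na(dates)  # FALSE TRUE TRUE
```

The column stays a Date, so later date arithmetic or formatting does not need a conversion back from character.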