I'm working with data like the following:
ex <- structure(list(rowid = c(4L, 5L, 6L, 9L, 10L), timestamp = structure(c(1502480694.03336,
1502480695.44736, 1502480696.03336, 1502480703.99836, 1502480706.19936
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), cat = c(32L,
1L, 1L, 1L, 1L), var1 = structure(c(NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_), .Label = "1", class = "factor"),
var2 = c(0, 50, 29.7, 51, 70.8), var3 = c(NA, 26.3, 24, 20.5,
12), order = c(NA, 1L, 1L, 1L, 1L), bfr = list(NA, structure(list(
rowid = integer(0), timestamp = structure(numeric(0), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), cat = integer(0), var1 = structure(integer(0), .Label = "1", class = "factor"),
var2 = numeric(0), var3 = numeric(0), order = integer(0)), class = c("tbl_df",
"tbl", "data.frame"), row.names = integer(0)), structure(list(
rowid = 5L, timestamp = structure(1502480695.44736, class = c("POSIXct",
"POSIXt"), tzone = "UTC"), cat = 1L, var1 = structure(NA_integer_, .Label = "1", class = "factor"),
var2 = 50, var3 = 26.3, order = 1L), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -1L)), structure(list(
rowid = 5:8, timestamp = structure(c(1502480695.44736,
1502480696.03336, 1502480699.03336, 1502480701.03336), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), cat = c(1L, 1L, 1L, 1L), var1 = structure(c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_), .Label = "1", class = "factor"),
var2 = c(50, 29.7, 52.8, 44), var3 = c(26.3, 24, 8.9,
12.4), order = c(1L, 1L, 1L, 1L)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -4L)), structure(list(
rowid = 5:9, timestamp = structure(c(1502480695.44736,
1502480696.03336, 1502480699.03336, 1502480701.03336,
1502480703.99836), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
cat = c(1L, 1L, 1L, 1L, 1L), var1 = structure(c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_), .Label = "1", class = "factor"),
var2 = c(50, 29.7, 52.8, 44, 51), var3 = c(26.3, 24,
8.9, 12.4, 20.5), order = c(1L, 1L, 1L, 1L, 1L)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -5L)))), row.names = c(4L,
5L, 6L, 9L, 10L), class = "data.frame")
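I want to summarise the nested tibbles in the bfr column with map. To skip unnecessary computation, I'd like to use map_if, so that a row is skipped when its bfr contains fewer than two rows with cat == 1. However, because the bfr column contains NA values and empty tibbles, I'm struggling to write a suitable predicate function. Here is my attempt: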
more_than <- function(df){
if (nrow(df) == 0 | is.na(df)) return(FALSE)
n <- df %>%
summarise(sum(cat == 1)) %>%
as.numeric()
return(n > 2)
}
ex %>%
mutate(mean_var2 = map_if(bfr, more_than,
~.x %>% summarise(mean_var2 = mean(var2))))
This gives:
Error in if (nrow(df) == 0 | is.na(df)) return(FALSE) : argument is of length zero
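The error can be reproduced with base R alone. This is a minimal sketch, assuming the predicate is first called on the bare NA in the first element of bfr. nrow() returns NULL for anything without dimensions, comparing NULL with 0 yields logical(0), and | keeps that zero length, so if() has nothing to test.

```r
x <- NA  # stands in for ex$bfr[[1]], which is a bare NA rather than a tibble

nrow(x)                          # NULL: a plain vector has no dim attribute
nrow(x) == 0                     # logical(0): comparing NULL gives a zero-length result
cond <- nrow(x) == 0 | is.na(x)  # | with a zero-length operand is also zero-length
length(cond)

# if() needs exactly one TRUE/FALSE, so this reproduces the error from the question
msg <- tryCatch(if (cond) "skip" else "keep",
                error = function(e) conditionMessage(e))
print(msg)
```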
How can I write a predicate function that copes with the NA values and the empty tibbles?
Answer 0 (score: 2)
If we want the mean of the "var2" column, check whether the list element is a data.frame or tibble (here it is a tibble), and then do the summarise:
out <- ex %>%
mutate(mean_var2 = map_if(bfr, is_tibble, ~
.x %>%
summarise(mean_var2 = mean(var2, na.rm = TRUE))))
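A minimal sketch of the is_tibble filter on a hypothetical, simplified stand-in for ex$bfr (only the cat and var2 columns are kept; this list is an assumption, not the question's data). The bare NA passes through untouched, but the empty tibble is still summarised and yields NaN, which is why the extra sum(cat == 1) > 2 check is useful.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical stand-in for ex$bfr, keeping only cat and var2
bfr <- list(
  NA,                                           # not a tibble
  tibble(cat = integer(0), var2 = numeric(0)),  # empty tibble
  tibble(cat = 1L, var2 = 50),
  tibble(cat = c(1L, 1L, 1L, 1L), var2 = c(50, 29.7, 52.8, 44))
)

# Summarise only the tibbles; other elements are returned unchanged
out <- map_if(bfr, is_tibble,
              ~ .x %>% summarise(mean_var2 = mean(var2, na.rm = TRUE)))

str(out)
```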
If we also need to check that sum(cat == 1) > 2:
more_than <- function(df){
i1 <- is_tibble(df)
if(i1) {
n <- df %>%
summarise(v1 = sum(cat == 1)) %>%
pull(v1)
}
# && evaluates n > 2 only when i1 is TRUE, so n is never needed for non-tibbles
i1 && (n > 2)
}
ex %>%
mutate(mean_var2 = map_if(bfr, more_than, ~
.x %>%
summarise(mean_var2 = mean(var2, na.rm = TRUE))))
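The predicate above can be checked element by element on a hypothetical stand-in list (only cat and var2; an assumption, not the real ex$bfr). Because && short-circuits, the undefined n is never looked up for the bare NA, and the empty tibble gives sum(cat == 1) = 0, so only the last element is summarised.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical stand-in for ex$bfr, keeping only cat and var2
bfr <- list(
  NA,
  tibble(cat = integer(0), var2 = numeric(0)),
  tibble(cat = 1L, var2 = 50),
  tibble(cat = c(1L, 1L, 1L, 1L), var2 = c(50, 29.7, 52.8, 44))
)

more_than <- function(df) {
  i1 <- is_tibble(df)
  if (i1) {
    n <- df %>% summarise(v1 = sum(cat == 1)) %>% pull(v1)
  }
  # When i1 is FALSE, && returns FALSE without evaluating n > 2
  i1 && (n > 2)
}

keep <- map_lgl(bfr, more_than)
print(keep)

# Elements failing the predicate are returned unchanged
out <- map_if(bfr, more_than,
              ~ .x %>% summarise(mean_var2 = mean(var2, na.rm = TRUE)))
```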
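is.na doesn't work because it is applied to each dataset, and where the element is a tibble it returns a logical matrix, whereas if/else expects a single TRUE/FALSE. For example,
(3 == 4) & (cbind(3:5, 1:3) == 3)
gives a different output from
(3 == 4) && (cbind(3:5, 1:3) == 3)
#[1] FALSE
One option is to use &&, so that the right-hand condition is evaluated only when the first condition is TRUE, which avoids unnecessary evaluation.
In the OP's original function, replacing | with || makes it work:
more_than <- function(df){
if (nrow(df) == 0 || is.na(df)) return(FALSE)
n <- df %>%
summarise(sum(cat == 1)) %>%
as.numeric()
return(n > 2)
}
If we want to return NA for the cases where the condition isn't met:
ex %>%
mutate(mean_var2 = map_dbl(bfr, ~
if(is_tibble(.x) && sum(.x$cat == 1) > 2) mean(.x$var2, na.rm = TRUE) else NA))
Another option is to use possibly (similar to tryCatch).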
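The answer only names possibly, so here is a minimal sketch of that route, assuming a hypothetical helper mean_if_enough and the same simplified stand-in list (only cat and var2). Accessing df$cat on a bare NA raises an error, and possibly() replaces that error with the otherwise value, much as a tryCatch handler would.

```r
library(purrr)
library(tibble)

# Hypothetical stand-in for ex$bfr, keeping only cat and var2
bfr <- list(
  NA,
  tibble(cat = integer(0), var2 = numeric(0)),
  tibble(cat = 1L, var2 = 50),
  tibble(cat = c(1L, 1L, 1L, 1L), var2 = c(50, 29.7, 52.8, 44))
)

# Hypothetical helper: mean of var2 if more than two rows have cat == 1, else NA.
# NA$cat fails with "$ operator is invalid for atomic vectors";
# possibly() catches that error and returns NA_real_ instead.
mean_if_enough <- possibly(
  function(df) {
    if (sum(df$cat == 1) > 2) mean(df$var2, na.rm = TRUE) else NA_real_
  },
  otherwise = NA_real_
)

res <- map_dbl(bfr, mean_if_enough)
print(res)
```

One trade-off of this design: possibly() hides every error, including unexpected ones, so an explicit is_tibble() check is easier to debug.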
Answer 1 (score: 1)
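First, we need to check for NA with || before checking nrow (see here for the difference between | and ||). Then, to handle the elements for which more_than returns FALSE, we need the .else argument of map_if:
.else: A function applied to elements of .x for which .p returns FALSE.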
more_than <- function(df){
if (all(is.na(df)) || nrow(df) == 0) return(FALSE)
n <- df %>%
summarise(sum(cat == 1)) %>%
as.numeric()
return(n > 2)
}
ex %>%
mutate(mean_var2 = map_if(bfr, more_than,
~.x %>% summarise(mean_var2 = mean(var2,na.rm = TRUE)),
.else = ~return(NA))) %>%
select(mean_var2)
mean_var2
1 NA
2 NA
3 NA
4 44.125
5 45.5
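This answer's predicate can be checked on the same kind of hypothetical stand-in list (only cat and var2; not the real ex$bfr). all(is.na(df)) always collapses to a single TRUE/FALSE, which is what || needs. By contrast, since R 4.3.0, || signals an error when it evaluates a right-hand side longer than one, so the bare is.na(df) on a non-empty tibble no longer works there. The sketch replaces the summarise()/as.numeric() step with the equivalent sum(df$cat == 1).

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical stand-in for ex$bfr, keeping only cat and var2
bfr <- list(
  NA,
  tibble(cat = integer(0), var2 = numeric(0)),
  tibble(cat = 1L, var2 = 50),
  tibble(cat = c(1L, 1L, 1L, 1L), var2 = c(50, 29.7, 52.8, 44))
)

more_than <- function(df) {
  # all() reduces is.na(df) to one value, so || always receives a scalar;
  # an empty tibble gives all(logical(0)), which is TRUE
  if (all(is.na(df)) || nrow(df) == 0) return(FALSE)
  sum(df$cat == 1) > 2
}

checks <- map_lgl(bfr, more_than)

# .else turns every element that fails the predicate into NA
out <- map_if(bfr, more_than,
              ~ .x %>% summarise(mean_var2 = mean(var2, na.rm = TRUE)),
              .else = ~ NA)
```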