Consider the following data frame:
(tmp_df <-
structure(list(class = c(0L, 0L, 1L, 1L, 2L, 2L), logi = c(TRUE,
FALSE, TRUE, FALSE, TRUE, FALSE), val = c(1, 1, 1, 1, 1, 1),
taken = c(1.00684931506849, 0.993197278911565, 1.025, 0.975609756097561,
1.00826446280992, 0.991803278688525)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -6L), .Names = c("class",
"logi", "val", "taken")))
which creates:
Source: local data frame [6 x 4]
class logi val taken
<int> <lgl> <dbl> <dbl>
1 0 TRUE 1 1.0068493
2 0 FALSE 1 0.9931973
3 1 TRUE 1 1.0250000
4 1 FALSE 1 0.9756098
5 2 TRUE 1 1.0082645
6 2 FALSE 1 0.9918033
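I want to group by class and, if a group contains exactly two members, subtract 1 from val when logi == FALSE, and otherwise subtract the minimum value of taken (within the group) from val. If a group does not contain two members, we subtract zero.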
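A minimal base R sketch of the rule described above. It assumes the columns of tmp_df are copied into plain vectors, deliberately avoids dplyr, and uses hypothetical names (grp, idx, taken_2_for_group()). The whole-group decision is a single condition, so it uses if. The per-row decision uses ifelse with a test as long as the group:

```r
# Columns of tmp_df copied into plain vectors (assumption: same data as above)
grp   <- c(0L, 0L, 1L, 1L, 2L, 2L)
logi  <- c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)
val   <- c(1, 1, 1, 1, 1, 1)
taken <- c(1.00684931506849, 0.993197278911565, 1.025,
           0.975609756097561, 1.00826446280992, 0.991803278688525)

# Hypothetical helper: the amount to subtract from val for one group
taken_2_for_group <- function(logi, taken) {
  # One decision for the whole group: a scalar `if`
  if (length(taken) != 2) return(rep(0, length(taken)))
  # One decision per row: `ifelse` with a test as long as the group
  ifelse(logi, min(taken), 1)
}

idx <- split(seq_along(taken), grp)  # row indices of each group
taken_2 <- numeric(length(taken))
for (i in idx) taken_2[i] <- taken_2_for_group(logi[i], taken[i])
not_taken <- val - taken_2
print(data.frame(grp, logi, taken_2, not_taken))
```

The taken_2 and not_taken values it computes match the rows marked "correct!" further down.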
Code to perform this operation with the dplyr package could be:
library(dplyr)
tmp_df %>%
  group_by(class) %>%
  mutate(taken_2 = ifelse(n() != 2, 0,
                          ifelse(logi, min(taken), 1)),
         not_taken = val - taken_2)
However, this produces incorrect results: the second ifelse always takes its first branch (min(taken)), even on rows where logi is FALSE:
Source: local data frame [6 x 6]
Groups: class [3]
class logi val taken taken_2 not_taken
<int> <lgl> <dbl> <dbl> <dbl> <dbl>
1 0 TRUE 1 1.0068493 0.9931973 0.006802721
2 0 FALSE 1 0.9931973 0.9931973 0.006802721
3 1 TRUE 1 1.0250000 0.9756098 0.024390244
4 1 FALSE 1 0.9756098 0.9756098 0.024390244
5 2 TRUE 1 1.0082645 0.9918033 0.008196721
6 2 FALSE 1 0.9918033 0.9918033 0.008196721
The correct results are produced if we drop the outer ifelse statement:
tmp_df %>%
  group_by(class) %>%
  mutate(taken_2 = ifelse(logi, min(taken), 1),
         not_taken = val - taken_2)
which gives:
Source: local data frame [6 x 6]
Groups: class [3]
class logi val taken taken_2 not_taken
<int> <lgl> <dbl> <dbl> <dbl> <dbl>
1 0 TRUE 1 1.0068493 0.9931973 0.006802721
2 0 FALSE 1 0.9931973 1.0000000 0.000000000 # correct!
3 1 TRUE 1 1.0250000 0.9756098 0.024390244
4 1 FALSE 1 0.9756098 1.0000000 0.000000000 # correct!
5 2 TRUE 1 1.0082645 0.9918033 0.008196721
6 2 FALSE 1 0.9918033 1.0000000 0.000000000 # correct!
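By examining other code snippets that perform similar operations successfully, we can see that the problem appears to be isolated to mutate combined with a nested ifelse:
## runs as expected: n() != 3 is TRUE in every group, so 0 is returned
tmp_df %>%
  group_by(class) %>%
  mutate(taken_2 = ifelse(n() != 3, 0,
                          ifelse(logi, min(taken), 1)),
         not_taken = val - taken_2)

## runs as expected outside mutate: the test c(0, 0) has the same length as the data
tmp_df_2 <-
  tmp_df %>%
  filter(row_number() <= 2)
(tmp_df_2$taken_2 <-
   ifelse(c(0, 0), 0,
          ifelse(tmp_df_2$logi, min(tmp_df_2$taken), 1)))

## but the following does not work (checks problem is not to do with grouping)
# tmp_df_2 %>%
#   mutate(taken_2 = ifelse(n() != 2, 0,
#                           ifelse(logi, min(taken), 1)),
#          not_taken = val - taken_2)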
Why is this happening, and how can I get the expected behaviour? A workaround is to split the nested ifelse logic into several steps within mutate:
tmp_df %>%
  group_by(class) %>%
  mutate(taken_2 = ifelse(n() != 2, 0, 1),
         taken_3 = taken_2 * ifelse(logi, min(taken), 1),
         not_taken = val - taken_3)
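The reason the workaround succeeds can be sketched in base R. This assumes the single class 0 group, uses n as a stand-in for n(), and gives the 3-row group made-up values purely for illustration. The first ifelse still returns a single value, but arithmetic recycles a length-1 operand instead of truncating anything, so multiplying it by the full-length inner ifelse keeps one value per row:

```r
# The class 0 group from tmp_df
logi  <- c(TRUE, FALSE)
taken <- c(1.00684931506849, 0.993197278911565)
n <- length(taken)  # stand-in for n() inside mutate()

taken_2 <- ifelse(n != 2, 0, 1)                   # length-1 flag: 1 for a 2-row group
taken_3 <- taken_2 * ifelse(logi, min(taken), 1)  # 1 * c(min(taken), 1), length 2

# Hypothetical 3-row group with made-up values: the flag is 0, so every row gets 0
logi3  <- c(TRUE, FALSE, TRUE)
taken3 <- c(1.1, 0.9, 1.0)
flag3  <- ifelse(length(taken3) != 2, 0, 1)
taken_3_g3 <- flag3 * ifelse(logi3, min(taken3), 1)
```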
Others have run into similar problems with nested ifelse, but I don't know whether they share the same root cause: ifelse using dplyr results in NAs for some records
Answer 0 (score: 4)
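You are a victim of vector recycling. The key is this line:
mutate(taken_2 = ifelse(n() != 2, 0,
                        ifelse(logi, min(taken), 1)))
Since n() != 2 has length 1 (for each group), ifelse returns a result of length 1. Only the first element of the inner ifelse is kept, which corresponds to the first value of logi, and mutate then repeats/recycles that value across the group.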
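A minimal base R reproduction of this for the class 0 group. It assumes n stands in for what n() returns inside mutate(), and rep_len() is used only to imitate how mutate() recycles a length-1 result over the rows of a group:

```r
# The class 0 group from tmp_df
logi  <- c(TRUE, FALSE)
taken <- c(1.00684931506849, 0.993197278911565)
n <- length(taken)  # stand-in for n() inside mutate()

inner <- ifelse(logi, min(taken), 1)  # length 2: c(min(taken), 1)
outer <- ifelse(n != 2, 0, inner)     # test has length 1, so the result has length 1
length(outer)                         # 1: only inner[1] is kept

# Imitate mutate() recycling the length-1 result over both rows
recycled <- rep_len(outer, n)
recycled                              # min(taken) twice: the FALSE row is wrong
```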
You should use if and if_else instead:
mutate(taken_2 = if (n() != 2) 0 else if_else(logi, min(taken), 1))
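The same idea can be sketched in base R. This assumes the class 0 group again, substitutes base ifelse for dplyr's if_else (both give c(min(taken), 1) here, since both branches are doubles), and gives the 3-row group made-up values. Because if evaluates one condition and returns the chosen branch as-is, the full-length inner result survives:

```r
# The class 0 group from tmp_df
logi  <- c(TRUE, FALSE)
taken <- c(1.00684931506849, 0.993197278911565)
n <- length(taken)  # stand-in for n()

# `if` takes a single condition and returns the chosen branch untouched
taken_2 <- if (n != 2) 0 else ifelse(logi, min(taken), 1)  # length 2

# Hypothetical 3-row group with made-up values: `if` yields a single 0,
# which mutate() would then recycle over the group
logi3  <- c(TRUE, FALSE, TRUE)
taken3 <- c(1.1, 0.9, 1.0)
taken_2_g3 <- if (length(taken3) != 2) 0 else ifelse(logi3, min(taken3), 1)
```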
I recommend never using ifelse. Take it from someone who nearly caused a multi-million-dollar error because of this bug.
Answer 1 (score: 3)
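From ?ifelse: ‘ifelse’ returns a value with the same shape as ‘test’. Since n() != 2 returns a vector of length 1 (a single FALSE here, because every group has two rows), the outer ifelse returns only the first element of the inner ifelse. That length-1 result is then recycled to fit the shape of the group. One solution is to feed a test vector of the group's length to the outer ifelse:
tmp_df %>%
  group_by(class) %>%
  mutate(taken_2 = ifelse(rep(n() != 2, n()), 0,
                          ifelse(logi, min(taken), 1)),
         not_taken = val - taken_2)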
# Source: local data frame [6 x 6]
# Groups: class [3]
# class logi val taken taken_2 not_taken
# <int> <lgl> <dbl> <dbl> <dbl> <dbl>
# 1 0 TRUE 1 1.0068493 0.9931973 0.006802721
# 2 0 FALSE 1 0.9931973 1.0000000 0.000000000
# 3 1 TRUE 1 1.0250000 0.9756098 0.024390244
# 4 1 FALSE 1 0.9756098 1.0000000 0.000000000
# 5 2 TRUE 1 1.0082645 0.9918033 0.008196721
# 6 2 FALSE 1 0.9918033 1.0000000 0.000000000
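A base R check of this fix for the class 0 group, again assuming n stands in for n(). Once the test is repeated to the group's length, ifelse returns one value per row:

```r
# The class 0 group from tmp_df
logi  <- c(TRUE, FALSE)
taken <- c(1.00684931506849, 0.993197278911565)
n <- length(taken)  # stand-in for n()

test <- rep(n != 2, n)  # c(FALSE, FALSE): same shape as the group
taken_2 <- ifelse(test, 0, ifelse(logi, min(taken), 1))
taken_2                 # c(min(taken), 1)
```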