Question

Consider the following data frame:

(tmp_df <-
structure(list(class = c(0L, 0L, 1L, 1L, 2L, 2L), logi = c(TRUE, 
FALSE, TRUE, FALSE, TRUE, FALSE), val = c(1, 1, 1, 1, 1, 1), 
    taken = c(1.00684931506849, 0.993197278911565, 1.025, 0.975609756097561, 
    1.00826446280992, 0.991803278688525)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -6L), .Names = c("class", 
"logi", "val", "taken")))

which creates:

Source: local data frame [6 x 4]

  class  logi   val     taken
  <int> <lgl> <dbl>     <dbl>
1     0  TRUE     1 1.0068493
2     0 FALSE     1 0.9931973
3     1  TRUE     1 1.0250000
4     1 FALSE     1 0.9756098
5     2  TRUE     1 1.0082645
6     2 FALSE     1 0.9918033

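I want to group by class. If a group contains exactly two members, I subtract from val either 1 (when logi == FALSE) or the group minimum of taken (when logi == TRUE). If a group does not contain two members, I subtract zero from val.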

The code to do this with the dplyr package would be:

tmp_df %>%
  group_by(class) %>%
  mutate(taken_2 = ifelse(n() != 2, 0,
                          ifelse(logi, min(taken), 1)),
         not_taken = val - taken_2)

which yields

Source: local data frame [6 x 6]
Groups: class [3]

  class  logi   val     taken   taken_2   not_taken
  <int> <lgl> <dbl>     <dbl>     <dbl>       <dbl>
1     0  TRUE     1 1.0068493 0.9931973 0.006802721
2     0 FALSE     1 0.9931973 0.9931973 0.006802721
3     1  TRUE     1 1.0250000 0.9756098 0.024390244
4     1 FALSE     1 0.9756098 0.9756098 0.024390244
5     2  TRUE     1 1.0082645 0.9918033 0.008196721
6     2 FALSE     1 0.9918033 0.9918033 0.008196721

However, this result is incorrect: the second ifelse always resolves to its first condition.

If we drop the first ifelse statement, the correct result is produced:

tmp_df %>%
  group_by(class) %>%
  mutate(taken_2 = ifelse(logi, min(taken), 1),
         not_taken = val - taken_2)

Source: local data frame [6 x 6]
Groups: class [3]

  class  logi   val     taken   taken_2   not_taken
  <int> <lgl> <dbl>     <dbl>     <dbl>       <dbl>
1     0  TRUE     1 1.0068493 0.9931973 0.006802721
2     0 FALSE     1 0.9931973 1.0000000 0.000000000 # correct!
3     1  TRUE     1 1.0250000 0.9756098 0.024390244
4     1 FALSE     1 0.9756098 1.0000000 0.000000000 # correct!
5     2  TRUE     1 1.0082645 0.9918033 0.008196721
6     2 FALSE     1 0.9918033 1.0000000 0.000000000 # correct!

Looking at other snippets that successfully perform similar operations, the problem seems to be isolated to mutate combined with a nested ifelse:

tmp_df %>%
  group_by(class) %>%
  mutate(taken_2 = ifelse(n() != 3, 0,
                          ifelse(logi, min(taken), 1)),
         not_taken = val - taken_2)

tmp_df_2 <- tmp_df %>% filter(row_number() <= 2)
(tmp_df_2$taken_2 <- ifelse(c(0, 0), 0,
                            ifelse(tmp_df_2$logi, min(tmp_df_2$taken), 1)))

## but the following does not work (checks problem is not to do with grouping)
# tmp_df_2 %>%
#   mutate(taken_2 = ifelse(n() != 2, 0,
#                           ifelse(logi, min(taken), 1)),
#          not_taken = val - taken_2)

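Why is this happening, and ~~how can I get the expected behaviour~~? A workaround is to split the nested ifelse logic into multiple inline mutates:

tmp_df %>%
  group_by(class) %>%
  mutate(taken_2 = ifelse(n() != 2, 0, 1),
         taken_3 = taken_2 * ifelse(logi, min(taken), 1),
         not_taken = val - taken_3)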


Others have run into a similar problem with nested ifelse, but I don't know whether it has the same root cause: ifelse using dplyr results in NAs for some records

Answer 1

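You are a victim of ifelse vector recycling. The key is this line:

mutate(taken_2 = ifelse(n() != 2, 0, ifelse(logi, min(taken), 1)))

Since n() != 2 has length 1 (for each group), ifelse returns a result of length 1: it considers only the first logi, and that single value is then repeated/recycled across the group.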
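The shape rule can be checked in base R without dplyr. The following is only a minimal sketch, not code from the thread: logi and taken copy the class == 0 rows of tmp_df, and n is an assumed stand-in for what n() evaluates to inside that group.

```r
# Values copied from the class == 0 rows of tmp_df.
logi  <- c(TRUE, FALSE)
taken <- c(1.00684931506849, 0.993197278911565)
n     <- length(logi)  # hypothetical stand-in for dplyr's n() in one group

# Length-1 test: ifelse() returns a length-1 result, so only the first
# element of the inner ifelse() (the logi == TRUE case) survives.
short <- ifelse(n != 2, 0, ifelse(logi, min(taken), 1))

# Group-length test: one result per row, as intended.
full <- ifelse(rep(n != 2, n), 0, ifelse(logi, min(taken), 1))

print(length(short))  # length of the collapsed result
print(length(full))   # length of the per-row result
```

Inside mutate, the single value in short is what gets recycled to every row of the group, which matches the question's output where taken_2 equals min(taken) on both rows.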

You should use if together with if_else instead:

mutate(taken_2 = if (n() != 2) 0 else if_else(logi, min(taken), 1))

I suggest never using ifelse. Take it from someone who nearly caused a mistake worth millions of dollars because of this bug.

Answer 2

From ?ifelse:

‘ifelse’ returns a value with the same shape as ‘test’

Since n() != 2 is a length-one vector (and FALSE for every group here), the outer ifelse returns only the first element of the second ifelse. That length-one result is then recycled to fit the shape of the group. One solution is to feed a vector of the group's length to the first ifelse:

tmp_df %>%
  group_by(class) %>%
  mutate(taken_2 = ifelse(rep(n() != 2, n()), 0,
                          ifelse(logi, min(taken), 1)),
         not_taken = val - taken_2)

# Source: local data frame [6 x 6]
# Groups: class [3]

#   class  logi   val     taken   taken_2   not_taken
#   <int> <lgl> <dbl>     <dbl>     <dbl>       <dbl>
# 1     0  TRUE     1 1.0068493 0.9931973 0.006802721
# 2     0 FALSE     1 0.9931973 1.0000000 0.000000000
# 3     1  TRUE     1 1.0250000 0.9756098 0.024390244
# 4     1 FALSE     1 0.9756098 1.0000000 0.000000000
# 5     2  TRUE     1 1.0082645 0.9918033 0.008196721
# 6     2 FALSE     1 0.9918033 1.0000000 0.000000000


Why does nested ifelse create incorrect results in dplyr 0.5.0 mutate?
