How to use case_when instead of if_else [error in my code?]

Date: 2019-09-05 14:45:31

Tags: r dplyr vectorization tidyverse case-when

I'm trying to understand why I can't use dplyr::case_when instead of dplyr::if_else. I'm probably missing something. Let me explain:

I have the following operation, which works well:

df %>%
  mutate(
    keep = if_else(
      assembly_level != "Complete Genome" | genome_rep != "Full", FALSE,
      ifelse(
        version_status == "suppressed", FALSE,
        if_else(
          refseq_category %in% c("reference genome", "representative genome"), TRUE,
          if_else(rpseudo > 0.4, FALSE, TRUE)
        )
      )
    )
  )

But when I try to use case_when this way:

df %>%
  mutate(
    keep = case_when(
      assembly_level != "Complete Genome" | genome_rep != "Full" ~ FALSE, 
      version_status == "suppressed" ~ FALSE,
      refseq_category %in% c("reference genome", "representative genome") ~ TRUE, 
      rpseudo > 0.4 ~ FALSE,
      TRUE ~ TRUE
    )
  )

I get different results.

I suspect the problem is simply in how I'm using the function.

If you need the data, it is public and can be downloaded here: ftp://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/assembly_summary_refseq.txt

Thanks in advance.

1 Answer

Answer 0 (score: 1)

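There are NAs in the data. Store the output of the if_else version in df1 and the output of the case_when version in df2. The only difference between df1$keep and df2$keep is that df1$keep contains a few NAs, and in exactly those positions case_when returns an actual value. Check: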

table(df1$keep, useNA = "always")
# FALSE   TRUE   <NA> 
#156616  10386     79 

table(df2$keep, useNA = "always")
# FALSE   TRUE   <NA> 
#156647  10434      0 

If you do the arithmetic:

(156647 - 156616) + (10434 - 10386)  # exactly the number of NAs in df1$keep
#[1] 79

Also, if you drop those NA positions and compare the remaining values of df1 and df2, they are identical.

all(df1$keep[!is.na(df1$keep)] == df2$keep[!is.na(df1$keep)])
#[1] TRUE
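The same two checks (which rows differ, and whether the NAs account for the whole shift in the FALSE/TRUE counts) can be sketched on small vectors. This is a minimal base-R sketch; keep_if_else and keep_case_when are hypothetical stand-ins with invented values, not the NCBI results.

```r
# Hypothetical toy vectors standing in for df1$keep (if_else) and
# df2$keep (case_when); the values are invented for illustration.
keep_if_else   <- c(FALSE, NA, TRUE, NA, FALSE)
keep_case_when <- c(FALSE, TRUE, TRUE, FALSE, FALSE)

# Rows where if_else gave NA but case_when gave a real value
diff_rows <- which(is.na(keep_if_else) & !is.na(keep_case_when))
print(diff_rows)

# Outside those rows the two results should be identical
# (assumes diff_rows is non-empty: x[-integer(0)] would drop everything)
same_elsewhere <- identical(keep_if_else[-diff_rows], keep_case_when[-diff_rows])

# The NAs are redistributed into the FALSE and TRUE counts,
# the same bookkeeping that gives 79 on the real data
na_count <- sum(is.na(keep_if_else))
shifted  <- (sum(!keep_case_when) - sum(!keep_if_else, na.rm = TRUE)) +
            (sum(keep_case_when)  - sum(keep_if_else,  na.rm = TRUE))
```

Here diff_rows is 2 and 4, and na_count equals shifted, mirroring the table comparison above.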

if_else and case_when handle NA differently. Consider this simplified example to see how.

library(dplyr)
df <- data.frame(a = c(1:3, NA, 4:7), b = c(NA, letters[1:7]))

Now let's create some arbitrary conditions to test with. Using if_else:

df %>%
  mutate(res = if_else(a > 3, "Yes", 
                   if_else(b == "c", "No", 
                           if_else(a > 5, "Maybe", "Done"))))

#   a    b  res
#1  1 <NA> <NA>
#2  2    a Done
#3  3    b Done
#4 NA    c <NA>
#5  4    d  Yes
#6  5    e  Yes
#7  6    f  Yes
#8  7    g  Yes
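Base R's ifelse, which the question's code mixes in with if_else, propagates NA the same way. A minimal sketch using the a and b columns from the example as plain vectors, so it runs without dplyr:

```r
# Same columns as the example data frame, as plain vectors
a <- c(1:3, NA, 4:7)
b <- c(NA, letters[1:7])

# Nested base ifelse: an NA condition at any level yields NA for that row
res_ifelse <- ifelse(a > 3, "Yes",
                     ifelse(b == "c", "No",
                            ifelse(a > 5, "Maybe", "Done")))
print(res_ifelse)
# NA in rows 1 and 4, "Done" in rows 2-3, "Yes" in rows 5-8
```

Row 1 fails at b == "c" (b is NA) and row 4 fails at a > 3 (a is NA), exactly as in the if_else output.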

With case_when, however, the output is:

df %>%
   mutate(res = case_when(a > 3 ~ "Yes", 
                          b == "c"~"No", 
                          a > 5 ~ "Maybe", 
                          TRUE ~ "Done"))

#   a    b  res
#1  1 <NA> Done
#2  2    a Done
#3  3    b Done
#4 NA    c   No
#5  4    d  Yes
#6  5    e  Yes
#7  6    f  Yes
#8  7    g  Yes
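The case_when result can be reproduced with nested ifelse if every condition is made NA-safe first, i.e. an NA condition is turned into FALSE. A minimal base-R sketch; na_to_false is a hypothetical helper name. Inside a dplyr pipeline, coalesce(a > 3, FALSE) would serve the same purpose.

```r
a <- c(1:3, NA, 4:7)
b <- c(NA, letters[1:7])

# Hypothetical helper: NA becomes FALSE, TRUE/FALSE are left unchanged
# (FALSE & NA is FALSE in R, so the NA never survives)
na_to_false <- function(x) !is.na(x) & x

res_safe <- ifelse(na_to_false(a > 3), "Yes",
                   ifelse(na_to_false(b == "c"), "No",
                          ifelse(na_to_false(a > 5), "Maybe", "Done")))
print(res_safe)
# "Done" "Done" "Done" "No" "Yes" "Yes" "Yes" "Yes", the case_when output
```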

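So, as you can see, as soon as if_else meets an NA in a condition, it returns NA for that row. case_when, on the other hand, treats an NA condition as FALSE: it moves on to the next condition until one is satisfied, and if none is, the row gets the value of the final TRUE ~ catch-all.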
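Both evaluation rules can be put side by side in one small function. This is a sketch under stated assumptions: first_match is a hypothetical helper, not part of dplyr or base R, and it only handles scalar character values. With propagate_na = FALSE it follows the case_when rule (NA counts as FALSE, first TRUE wins); with propagate_na = TRUE an NA condition on a row that has not matched yet stops evaluation and leaves NA, like the nested if_else.

```r
# Hypothetical helper: checks conditions in order and assigns the value
# of the first TRUE one.
# propagate_na = FALSE: an NA condition counts as FALSE (case_when style)
# propagate_na = TRUE:  an NA condition leaves NA for that row and stops
#                       further checks (nested if_else / ifelse style)
first_match <- function(conditions, values, default, propagate_na = FALSE) {
  n <- length(conditions[[1]])
  out <- rep(NA_character_, n)
  done <- rep(FALSE, n)
  for (i in seq_along(conditions)) {
    cond <- conditions[[i]]
    if (propagate_na) {
      stuck <- is.na(cond) & !done
      done <- done | stuck          # out stays NA for these rows
    }
    hit <- !is.na(cond) & cond & !done
    out[hit] <- values[i]
    done <- done | hit
  }
  out[!done] <- default           # rows no condition claimed
  out
}

a <- c(1:3, NA, 4:7)
b <- c(NA, letters[1:7])
conds <- list(a > 3, b == "c", a > 5)
vals  <- c("Yes", "No", "Maybe")

res_cw <- first_match(conds, vals, "Done")                       # case_when-like
res_ie <- first_match(conds, vals, "Done", propagate_na = TRUE)  # if_else-like
```

res_cw matches the case_when output and res_ie matches the if_else output above, so the only thing that differs between the two is what happens to a row when a condition is NA.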

Data

library(readr)
library(dplyr)

set.seed(1234)  # makes the simulated rpseudo values reproducible
read_tsv(
  "ftp://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/assembly_summary_refseq.txt",
  comment = "#",
  col_names = c(
    "assembly", "bioproject", "biosample",
    "wgs_master", "refseq_category", "taxid",
    "species_taxid", "organism_name", "infraspecific_name",
    "isolate", "version_status", "assembly_level",
    "release_type", "genome_rep", "seq_rel_date",
    "asm_name", "submitter", "gbrs_paired_asm",
    "paired_asm_comp", "ftp_path", "excluded_from_refseq", "relation_to_type_material"
  )
) %>%
  select(assembly, refseq_category,
         assembly_level, genome_rep,
         version_status, release_type) %>%
  mutate(
    rpseudo = runif(nrow(.), 0, 1)  # rpseudo is not in the file, so it is simulated here
  ) -> df