Combining dplyr's mutate function with a lookup across the whole table

Time: 2019-04-11 11:11:26

Tags: r tidyverse

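I'm new to R, and to the tidyverse in particular. I'm trying to write a script that rewrites a list of taxa. We already have plenty of approaches built on for loops and if statements, and I'd like to simplify them with the tidyverse, but I'm stuck on how to do it.

What I have is a table that looks like this (heavily simplified):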

library(tidyverse)

taxon_list <- tibble(name = c("cockroach", "cockroach2", "grasshopper", "spider", "lobster", "insect", "crustacea", "arachnid"),
                Id = c(445, 448, 446, 778, 543, 200, 400, 300),
                parent_ID = c(200, 200, 200, 300, 400, 200, 400, 300),
                rank = c("genus", "genus", "genus", "genus", "genus", "order", "order", "order")
                )


+-------------+-----+-----------+----------+
|    name     | Id  | parent_ID |   rank   |
+=============+=====+===========+==========+
| cockroach   | 445 | 200       | genus    |
| cockroach2  | 448 | 200       | genus    |
| grasshopper | 446 | 200       | genus    |
| spider      | 778 | 300       | genus    |
| lobster     | 543 | 400       | genus    |
| insect      | 200 | 200       | order    |
| crustacea   | 400 | 400       | order    |
| arachnid    | 300 | 300       | order    |
+-------------+-----+-----------+----------+

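Now I want to rearrange it so that I get a new column holding the order that matches each row's parent_ID (that is, where a row's parent_ID equals some row's Id, write that row's name into the order column). The final result should look like this: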

+-------------+------------+------+-----------+
|    name     |    order   |  Id  | parent_ID |
+=============+============+======+===========+
| cockroach   |  insect    |  445 |       200 |
| cockroach2  |  insect    |  448 |       200 |
| grasshopper |  insect    |  446 |       200 |
| spider      |  arachnid  |  778 |       300 |
| lobster     |  crustacea |  543 |       400 |
+-------------+------------+------+-----------+

I tried combining mutate with an ifelse statement, but that only adds NA to the whole order column.

The tibble is named taxon_list:

taxon_list %>%
   mutate(order = ifelse(parent_ID == Id, name, NA))

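I know this can't work, because it only compares values within the same row and never searches the whole dataset for the matching row (which is what I previously did with for loops over all rows). Could someone point me in the right direction?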

1 Answer:

Answer 0: (score: 0)

One way is to filter each rank type into 2 separate data frames, use select to subset the columns, and then merge the 2.

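library(tidyverse)

df <- tibble(name = c("cockroach", "cockroach2", "grasshopper", "spider", "lobster", "insect", "crustacea", "arachnid"),
             Id = c(445, 448, 446, 778, 543, 200, 400, 300),
             parent_ID = c(200, 200, 200, 300, 400, 200, 400, 300),
             rank = c("genus", "genus", "genus", "genus", "genus", "order", "order", "order"))

# Lookup table: one row per order, with its name stored in a column called `order`.
# In this data the order rows have parent_ID equal to Id, so parent_ID works as the key.
df_order <- df %>%
  filter(rank == "order") %>%
  select(order = name, parent_ID)

# Keep only the genus rows and attach the matching order name via parent_ID
df_genus <- df %>%
  filter(rank == "genus") %>%
  select(name, Id, parent_ID) %>%
  merge(df_order, by = "parent_ID")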

Result:

  parent_ID        name  Id     order
1       200   cockroach 445    insect
2       200  cockroach2 448    insect
3       200 grasshopper 446    insect
4       300      spider 778  arachnid
5       400     lobster 543 crustacea
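
If you would rather stay inside a single mutate call, as the question asks, a lookup with base R's match() may also work. The following is only a minimal sketch under a few assumptions: every parent_ID appears exactly once in Id, and the names taxa and taxa_with_order are made up for this illustration. match(parent_ID, Id) gives, for each row, the position of the row whose Id equals that parent_ID, and indexing name by those positions pulls the parent's name from anywhere in the table.

```r
library(dplyr)
library(tibble)

# Same example data as above; `taxa` is a hypothetical name for this sketch
taxa <- tibble(
  name      = c("cockroach", "cockroach2", "grasshopper", "spider",
                "lobster", "insect", "crustacea", "arachnid"),
  Id        = c(445, 448, 446, 778, 543, 200, 400, 300),
  parent_ID = c(200, 200, 200, 300, 400, 200, 400, 300),
  rank      = c("genus", "genus", "genus", "genus", "genus",
                "order", "order", "order")
)

taxa_with_order <- taxa %>%
  # For each row, find the position of the row whose Id equals this parent_ID,
  # then take the name at that position (NA if no Id matches)
  mutate(order = name[match(parent_ID, Id)]) %>%
  # Keep only the genus rows, as in the desired output
  filter(rank == "genus") %>%
  # Arrange the columns as in the desired output
  select(name, order, Id, parent_ID)

print(taxa_with_order)
```

Unlike merge, this keeps the original row order and returns a tibble. It only looks one level up the hierarchy, so a deeper taxonomy (genus, family, order) would need the lookup repeated or a join per level.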