Mutate repeats the first row value

Time: 2017-08-14 07:41:15

Tags: r dataframe dplyr tidyverse mutate

I have a dataset with taxonomic assignments, and I want to extract the genus into a new column.

library(tidyverse)
library(magrittr)
library(stringr)

df <- structure(list(
  C043 = c(18361L, 59646L, 27575L, 163L, 863L, 3319L, 0L, 6L),
  C057 = c(20020L, 97610L, 13427L, 1L, 161L, 237L, 2L, 105L),
  taxonomy = structure(c(3L, 2L, 1L, 6L, 4L, 4L, 5L, 2L), .Label = c(
    "k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacteriales;f__Enterobacteriaceae;g__Enterobacter;NA",
    "k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacteriales;f__Enterobacteriaceae;g__Enterobacter;s__cloacae",
    "k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacteriales;f__Enterobacteriaceae;g__Escherichia;s__coli",
    "k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacteriales;f__Enterobacteriaceae;g__Klebsiella;s__",
    "k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Pseudomonas;s__",
    "k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Pseudomonas;s__stutzeri"
  ), class = "factor")
), .Names = c("C043", "C057", "taxonomy"),
row.names = c(1L, 2L, 3L, 4L, 5L, 6L, 8L, 10L), class = "data.frame")

So this is my function (it works):

extract_genus <- function(str){
  genus <- str_split(str, pattern = ";")[[1]][6]
  genus %<>% str_sub(start = 4) #%>% as.character
  return(genus)
}

But when I apply it in mutate (with or without as.character), it repeats the first row's value in the new column.

df %>% mutate(genus = extract_genus(taxonomy))

   C043  C057  taxonomy  genus
1 18361 20020  k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacteriales;f__Enterobacteriaceae;g__Escherichia;s__coli  Escherichia
2 59646 97610  k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacteriales;f__Enterobacteriaceae;g__Enterobacter;s__cloacae  Escherichia
3 27575 13427  k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacteriales;f__Enterobacteriaceae;g__Enterobacter;NA  Escherichia
4   163     1  k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Pseudomonas;s__stutzeri  Escherichia
5   863   161  k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacteriales;f__Enterobacteriaceae;g__Klebsiella;s__  Escherichia

When I use sapply (though I would rather not; I want a solution that uses a dplyr pipe), it works fine.

df_group_gen$genus <- sapply(df_group_gen$taxonomy, extract_genus)

   C043  C057  taxonomy  genus
1 18361 20020  k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacteriales;f__Enterobacteriaceae;g__Escherichia;s__coli  Escherichia
2 59646 97610  k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacteriales;f__Enterobacteriaceae;g__Enterobacter;s__cloacae  Enterobacter
3 27575 13427  k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacteriales;f__Enterobacteriaceae;g__Enterobacter;NA  Enterobacter
4   163     1  k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Pseudomonas;s__stutzeri  Pseudomonas
5   863   161  k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacteriales;f__Enterobacteriaceae;g__Klebsiella;s__  Klebsiella

Why doesn't mutate compute this the way we would expect? I found this question, but it has no general answer, only code specific to that case.

Thanks :)

1 answer:

Answer 0: (score: 2)

You can Vectorize your function so that mutate applies it to every row:

ex_gen <- Vectorize(extract_genus, vectorize.args='str')

df %>% mutate(genus=ex_gen(taxonomy))
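As for why the original call repeats one value: `str_split()` returns a list with one element per input string, and `[[1]]` keeps only the first of them, so the function returns a single value that `mutate` then recycles down the column. A minimal sketch of that behaviour, assuming two shortened, made-up taxonomy strings (`taxa` is illustrative, not the question's data):

```r
library(stringr)

# Same logic as the question's extract_genus(), written without magrittr's %<>%
extract_genus <- function(str) {
  # [[1]] keeps only the split of the FIRST input string
  genus <- str_split(str, pattern = ";")[[1]][6]
  str_sub(genus, start = 4)
}

# Hypothetical, shortened taxonomy strings for illustration
taxa <- c(
  "k__Bacteria;p__X;c__X;o__X;f__X;g__Escherichia;s__coli",
  "k__Bacteria;p__X;c__X;o__X;f__X;g__Klebsiella;s__"
)

# One value for the whole vector; mutate() recycles it to every row
length(extract_genus(taxa))  # 1

# Vectorize() wraps the function in mapply(), so it is called once per element
ex_gen <- Vectorize(extract_genus, vectorize.args = "str")
length(ex_gen(taxa))         # 2
```

This is also why `sapply` appeared to work in the question: it calls the function once per element, just as the vectorized wrapper does.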

Alternatively, you can use rowwise so that mutate is evaluated one row at a time:

df %>% 
    rowwise() %>% 
    mutate(genus = extract_genus(taxonomy))
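A third option, not part of the original answer, is to rewrite the function so that it is vectorized itself and needs neither `Vectorize` nor `rowwise`. The sketch below is one possible way to do that; the name `extract_genus_vec` is hypothetical, and it assumes the genus is always the sixth `;`-separated field with a three-character prefix such as `g__`:

```r
library(stringr)

# Hypothetical vectorized variant of the question's extract_genus()
extract_genus_vec <- function(x) {
  # str_split() returns a list holding one character vector per input string
  parts <- str_split(as.character(x), pattern = ";")
  # Take the 6th field of EVERY element (NA if a string has fewer fields)
  genus <- vapply(parts, function(p) p[6], character(1))
  # Drop the 3-character rank prefix, e.g. "g__"
  str_sub(genus, start = 4)
}

# Intended usage with the question's data frame (requires dplyr):
# df %>% mutate(genus = extract_genus_vec(taxonomy))
```

Because the function returns one value per input element, `mutate` can call it once on the whole column, which tends to be faster than row-by-row evaluation on larger tables.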