Using transform() and colsplit()

Posted: 2017-12-29 02:19:35

Tags: r dataframe

I have a df:

Name    Letter
 1      A;B;C;D;E
 2      A;B;C;
 3      A;
 4      A;B;C;D;E

I use the following code to make a new df in which each letter is split into its own column:

library(reshape2)

new_df = transform(df, taxa = colsplit(Letter, split = ";", names = c("A", "B", "C", "D", "E"))) 

When I do this, I get a new df that looks like this:

Name    .A   .B   .C   .D   .E
  1     A    B    C    D    E
  2     A    B    C    C    C
  3     A    A    A    A    A
  4     A    B    C    D    E

How can I make it so that missing letters are not replaced by the previous letter, but by a specific designator such as "unclassified", so that

Name    .A   .B   .C   .D   .E        
   2     A    B    C    C    C

becomes:

Name    .A   .B   .C       .D       .E
   2     A    B    C    unclass  unclass

2 answers:

Answer 0 (score: 2)

We can use the cSplit function from the splitstackshape package. Afterwards, replace the NA values with "unclass".

library(splitstackshape)

df2 <- cSplit(df, "Letter", sep = ";", type.convert = FALSE)

df2[is.na(df2)] <- "unclass"

df2
#    Name Letter_1 Letter_2 Letter_3 Letter_4 Letter_5
# 1:    1        A        B        C        D        E
# 2:    2        A        B        C  unclass  unclass
# 3:    3        A  unclass  unclass  unclass  unclass
# 4:    4        A        B        C        D        E

Data

df <- read.table(text = "Name    Letter
 1      A;B;C;D;E
 2      A;B;C;
 3      A;
 4      A;B;C;D;E",
                 header = TRUE, stringsAsFactors = FALSE)
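For comparison, here is a minimal base-R sketch of the same positional padding without splitstackshape. It is not part of the answer above; it assumes the pieces should be left-aligned by position exactly as cSplit does, and the Letter_1, Letter_2, ... column names and the pad helper are illustrative choices made to mirror the output shown.

```r
# Same input as the answer's data block
df <- data.frame(
  Name   = 1:4,
  Letter = c("A;B;C;D;E", "A;B;C;", "A;", "A;B;C;D;E"),
  stringsAsFactors = FALSE
)

# Split on a literal ";"; a trailing ";" does not produce an empty last piece
parts <- strsplit(df$Letter, ";", fixed = TRUE)
n <- max(lengths(parts))

# Hypothetical helper: pad each row to n pieces with the placeholder
pad <- function(p) c(p, rep("unclass", n - length(p)))

# vapply gives one column per row of df, so transpose back to one row per row
mat <- t(vapply(parts, pad, character(n)))
colnames(mat) <- paste0("Letter_", seq_len(n))

df2 <- data.frame(Name = df$Name, mat, stringsAsFactors = FALSE)
df2
```

Because the padding is purely positional, a string such as "A;B;C;E" would still put E under Letter_4, just as cSplit does.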

Answer 1 (score: 1)

For a tidyverse-style approach, I offer:

library(tidyr)
library(dplyr)
library(purrr)
library(tibble)

df <- tribble(
  ~name, ~letter,
  1, "A;B;C;D;E",
  2, "A;B;C;E",
  3, "A;",
  4, "A;B;C;D;E",
  5, "D;A;C"
)

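df %>%
  mutate(letter = strsplit(letter, ";")) %>%  # list-column of letters per row
  unnest %>%                                  # one row per (name, letter) pair
  spread(letter, -name) %>%                   # one column per letter; absent letters become NA
  imap_dfr(~case_when(                        # .x = column values, .y = column name
    .y == "name" ~ as.character(.x),          # keep the id column (as character)
    is.na(.x) ~ "unclass",                    # missing letter -> placeholder
    TRUE ~ .y                                 # present letter -> its column name
  ))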

# # A tibble: 5 x 6
#   name  A     B       C       D       E      
#   <chr> <chr> <chr>   <chr>   <chr>   <chr>  
# 1 1     A     B       C       D       E      
# 2 2     A     B       C       unclass E      
# 3 3     A     unclass unclass unclass unclass
# 4 4     A     B       C       D       E      
# 5 5     A     unclass C       D       unclass
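
NB: The main benefit here is that column positions are respected when there are gaps in the sequence or when the sequence is out of order. See the changed values for A;B;C;E (name == 2) and D;A;C (name == 5).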
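The same name-based matching can be sketched in base R. This is not from the answer; it assumes the full set of possible letters is known in advance (the vector expected and the helper match_row below are hypothetical names). Each letter is tested with %in%, which is a set-membership check, so gaps and ordering in the input string do not shift columns.

```r
# Assumption: the complete set of possible letters is known up front
expected <- c("A", "B", "C", "D", "E")

# Same input as the answer's tribble
df <- data.frame(
  name   = 1:5,
  letter = c("A;B;C;D;E", "A;B;C;E", "A;", "A;B;C;D;E", "D;A;C"),
  stringsAsFactors = FALSE
)

parts <- strsplit(df$letter, ";", fixed = TRUE)

# Hypothetical helper: a letter goes in its own column if present, else "unclass"
match_row <- function(p) ifelse(expected %in% p, expected, "unclass")

# vapply gives one column per row of df, so transpose back to one row per row
mat <- t(vapply(parts, match_row, character(length(expected))))
colnames(mat) <- expected

result <- data.frame(name = df$name, mat, stringsAsFactors = FALSE)
result
```

Unlike the spread() pipeline, this keeps name as an integer column and always produces exactly the columns listed in expected, even if some letter never occurs in the data.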