I have a df:
Name Letter
1 A;B;C;D;E
2 A;B;C;
3 A;
4 A;B;C;D;E
I use the following code to build a df in which each letter is split into its own column:
library(reshape2)
new_df = transform(df, taxa = colsplit(Letter, split = ";", names = c("A", "B", "C", "D", "E")))
When I do this, I get a new df that looks like:
Name .A .B .C .D .E
1 A B C D E
2 A B C C C
3 A A A A A
4 A B C D E
How can I make it so that missing letters are not replaced by the preceding letter, but by a specific designator such as "unclassified"? So that
Name .A .B .C .D .E
2 A B C C C
becomes:
Name .A .B .C .D .E
2 A B C unclass unclass
Answer 0 (score: 2)
We can use the cSplit function from the splitstackshape package. After that, replace the NA values with "unclass".
library(splitstackshape)
df2 <- cSplit(df, "Letter", sep = ";", type.convert = FALSE)
df2[is.na(df2)] <- "unclass"
df2
# Name Letter_1 Letter_2 Letter_3 Letter_4 Letter_5
# 1: 1 A B C D E
# 2: 2 A B C unclass unclass
# 3: 3 A unclass unclass unclass unclass
# 4: 4 A B C D E
Data
df <- read.table(text = "Name Letter
1 A;B;C;D;E
2 A;B;C;
3 A;
4 A;B;C;D;E",
header = TRUE, stringsAsFactors = FALSE)
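For comparison, the same positional padding can be sketched in base R without extra packages. This is only an illustrative alternative, not part of the answer above: the names parts, n_cols, padded and result are made up for the example, and it assumes at least one row has more than one letter. It relies on strsplit dropping the empty piece after a trailing ";".

```r
# Same data as above, built directly so the block is self-contained.
df <- data.frame(Name = 1:4,
                 Letter = c("A;B;C;D;E", "A;B;C;", "A;", "A;B;C;D;E"),
                 stringsAsFactors = FALSE)

# Split each string on ";" (a trailing ";" does not create an empty piece).
parts <- strsplit(df$Letter, ";", fixed = TRUE)

# Width of the widest row decides how many columns are needed.
n_cols <- max(lengths(parts))

# Pad every row on the right with "unclass" up to n_cols entries.
padded <- t(vapply(parts,
                   function(p) c(p, rep("unclass", n_cols - length(p))),
                   character(n_cols)))
colnames(padded) <- paste0("Letter_", seq_len(n_cols))

# Combine the padded letter columns with the Name column.
result <- cbind(df["Name"], as.data.frame(padded, stringsAsFactors = FALSE))
result
```

Like cSplit, this fills by position, so a gap in the middle of a sequence (for example A;B;C;E) would shift E into the fourth column rather than leave it under a fifth.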
Answer 1 (score: 1)
For a tidyverse-style approach, I offer:
library(tidyr)
library(dplyr)
library(purrr)
library(tibble)
df <- tribble(
~name, ~letter,
1, "A;B;C;D;E",
2, "A;B;C;E",
3, "A;",
4, "A;B;C;D;E",
5, "D;A;C"
)
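df %>%
  # turn each string into a list of letters, then one row per letter
  mutate(letter = strsplit(letter, ";")) %>%
  unnest(letter) %>%
  # one column per distinct letter
  spread(letter, -name) %>%
  # keep name as is, fill gaps with "unclass", otherwise use the column's letter
  imap_dfr(~case_when(
    .y == "name" ~ as.character(.x),
    is.na(.x) ~ "unclass",
    TRUE ~ .y
  ))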
# # A tibble: 5 x 6
# name A B C D E
# <chr> <chr> <chr> <chr> <chr> <chr>
# 1 1 A B C D E
# 2 2 A B C unclass E
# 3 3 A unclass unclass unclass unclass
# 4 4 A B C D E
# 5 5 A unclass C D unclass
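NB: the main benefit here is that column positions are respected when there are gaps in the sequence or when the sequence is out of order; see the changed values A;B;C;E for name == 2 and D;A;C for name == 5.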
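Since spread has been superseded in tidyr, the same value-based layout might be written with pivot_wider instead. This is a hedged sketch rather than the answerer's code: it assumes tidyr 1.1.0 or later (for a scalar values_fill), and the names letters_df, value and wide are invented for the example.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Same example data as in the answer, under a hypothetical name.
letters_df <- tribble(
  ~name, ~letter,
  1, "A;B;C;D;E",
  2, "A;B;C;E",
  3, "A;",
  4, "A;B;C;D;E",
  5, "D;A;C"
)

wide <- letters_df %>%
  # one row per letter
  separate_rows(letter, sep = ";") %>%
  # drop any empty piece a trailing ";" may leave behind
  filter(letter != "") %>%
  # copy the letter so it can serve as both column name and cell value
  mutate(value = letter) %>%
  # one column per distinct letter; absent letters become "unclass"
  pivot_wider(names_from = letter, values_from = value,
              values_fill = "unclass")
wide
```

Because each letter goes into the column that carries its own name, a gap (name == 2) or an out-of-order sequence (name == 5) does not shift values between columns.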