Question

df <- data.frame(category = c("X", "Y"), sequence = c("AAT.G", "CCG-T"), stringsAsFactors = FALSE)

df
 category sequence
1        X     AAT.G
2        Y     CCG-T

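I want to split the column sequence into 5 columns, one per character. I tried to do this with tidyr::separate, but internally it uses stringi::stri_split_regex, which does not accept an empty string as the separator, even though the sep argument is supposed to take a regular expression.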

library(tidyr)
separate(df, sequence, into = paste0("V", 1:5), sep="")

Error: Values not split into 5 pieces at 1, 2
In addition: Warning messages:
1: In stringi::stri_split_regex(value, sep, n_max) :
  empty search patterns are not supported
2: In stringi::stri_split_regex(value, sep, n_max) :
  empty search patterns are not supported

The expected output is:

  category V1 V2 V3 V4 V5
1        X  A  A  T  .  G
2        Y  C  C  G  -  T
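For comparison, base R's strsplit() does accept split = "" and returns each string as a vector of single characters, so the same reshaping can be done without tidyr. This is a minimal sketch that assumes every value in sequence has the same length (five characters here). With rows of unequal length, do.call(rbind, ...) would recycle the shorter vectors and warn instead of padding with NA.

```r
df <- data.frame(category = c("X", "Y"), sequence = c("AAT.G", "CCG-T"),
                 stringsAsFactors = FALSE)

# split = "" splits each string into its individual characters
chars <- strsplit(df$sequence, split = "")

# Stack the per-row character vectors into a matrix (assumes equal lengths)
mat <- do.call(rbind, chars)
colnames(mat) <- paste0("V", seq_len(ncol(mat)))

# Rebuild a data frame: category first, then one column per character
result <- data.frame(category = df$category, mat, stringsAsFactors = FALSE)
result
```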

Answer 1

You can do this with extract from tidyr:

library(tidyr)
extract(df, sequence, into=paste0('V', 1:5), '(.)(.)(.)(.)(.)')
#  category V1 V2 V3 V4 V5
#1        X  A  A  T  .  G
#2        Y  C  C  G  -  T
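The pattern '(.)(.)(.)(.)(.)' hard-codes one capture group per output column. If the sequence length might change, the pattern can be built with base R's strrep(). The following sketch assumes all sequences share a single length and stops otherwise.

```r
library(tidyr)

df <- data.frame(category = c("X", "Y"), sequence = c("AAT.G", "CCG-T"),
                 stringsAsFactors = FALSE)

# Assumption: every sequence has the same number of characters
n <- unique(nchar(df$sequence))
stopifnot(length(n) == 1)

# Repeat the one-character capture group n times: "(.)(.)(.)(.)(.)"
pattern <- strrep("(.)", n)

out <- extract(df, sequence, into = paste0("V", seq_len(n)), regex = pattern)
out
```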

Or use gsub to insert a delimiter between the characters, and pass that delimiter as the sep argument of separate:

library(dplyr)
library(tidyr)
df %>%
  mutate(sequence = gsub('(?<=.)(?=.)', ',', sequence, perl = TRUE)) %>%
  separate(sequence, into = paste0('V', 1:5), sep = ",")
#  category V1 V2 V3 V4 V5
#1        X  A  A  T  .  G
#2        Y  C  C  G  -  T
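To see what the gsub step produces before separate runs, the intermediate strings can be inspected on their own. This sketch uses the same lookaround pattern. One caveat: the inserted delimiter (",") must not already occur in the data. Otherwise separate would find extra pieces.

```r
x <- c("AAT.G", "CCG-T")

# (?<=.) requires a character before the position and (?=.) one after it,
# so the zero-width match falls between every pair of adjacent characters.
# perl = TRUE is needed because R's default regex engine has no lookbehind.
delimited <- gsub("(?<=.)(?=.)", ",", x, perl = TRUE)
delimited

# The delimited strings can then be split on the inserted comma
pieces <- strsplit(delimited, ",", fixed = TRUE)
pieces
```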

Or you can use cSplit from the splitstackshape package.

Answer 2

sep can also be an integer vector of character positions to split at. sep = 1:4 would be enough, but 1:5 also works and looks a little nicer.

df %>% separate(sequence, into = paste0("V", 1:5), sep = 1:5)

This gives:

  category V1 V2 V3 V4 V5
1        X  A  A  T  .  G
2        Y  C  C  G  -  T
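In tidyr 1.3.0 and later, separate() is superseded. Splitting at fixed character positions is instead handled by separate_wider_position(), which takes a named vector of field widths. The sketch below assumes tidyr >= 1.3.0 and sequences that are exactly five characters long. By default, strings that are too short or too long raise an error, and this is controlled by the too_few and too_many arguments.

```r
library(tidyr)

df <- data.frame(category = c("X", "Y"), sequence = c("AAT.G", "CCG-T"),
                 stringsAsFactors = FALSE)

# One width of 1 per output column; the names become the new column names
widths <- setNames(rep(1, 5), paste0("V", 1:5))

out <- separate_wider_position(df, sequence, widths = widths)
out
```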

Split a column into multiple columns with tidyr: separating with sep = ""
