How do I use str_split_fixed to split a data frame column with multiple delimiters?

Time: 2016-06-16 14:50:16

Tags: r stringr

How can I split a column whose values are separated by multiple delimiters into separate columns in a data frame?

read.table(text = " Chr  Nm1 Nm2 Nm3
    chr10_100064111-100064134+Nfif   20  20 20
    chr10_100064115-100064138-Kitl   30 19 40
    chr10_100076865-100076888+Tert   60 440 18
    chr10_100079974-100079997-Itg    50 11 23                
    chr10_100466221-100466244+Tmtc3  55 24 53", header = TRUE)


Desired output:

              Chr              gene   Nm1 Nm2 Nm3
    chr10_100064111-100064134 Nfif   20  20 20
    chr10_100064115-100064138 Kitl   30 19 40
    chr10_100076865-100076888 Tert   60 440 18
    chr10_100079974-100079997 Itg    50 11 23 12                
    chr10_100466221-100466244 Tmtc3  55 24 53 12

I have tried:

library(stringr)
df2 <- str_split_fixed(df1$name, "\\+", 2)

I would like to know how to include both the + and - delimiters.

3 Answers:

Answer 0: (score: 5)

If you are trying to split one column into several, tidyr::separate is handy:

library(tidyr)

dat %>% separate(Chr, into = paste0('Chr', 1:3), sep = '[+-]')

#              Chr1      Chr2  Chr3 Nm1 Nm2 Nm3
# 1 chr10_100064111 100064134  Nfif  20  20  20
# 2 chr10_100064115 100064138  Kitl  30  19  40
# 3 chr10_100076865 100076888  Tert  60 440  18
# 4 chr10_100079974 100079997   Itg  50  11  23
# 5 chr10_100466221 100466244 Tmtc3  55  24  53
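The output above has three pieces, while the question appears to want only two columns (the coordinate range kept together in Chr, plus gene). A minimal sketch of one way to get that with tidyr::extract, assuming the gene name is always the last piece and never contains + or -; the object names dat and res and the regular expression are my own choices, not part of the original answer:

```r
library(tidyr)

# Two rows of the question's sample data, one with each delimiter
dat <- read.table(text = "Chr Nm1 Nm2 Nm3
chr10_100064111-100064134+Nfif 20 20 20
chr10_100064115-100064138-Kitl 30 19 40",
  header = TRUE, stringsAsFactors = FALSE)

# Capture everything before the LAST + or - as Chr,
# and everything after it as gene
res <- tidyr::extract(dat, Chr, into = c("Chr", "gene"),
                      regex = "^(.*)[+-]([^+-]+)$")

print(res)
```

Because the first group is greedy, the split happens at the last delimiter, so the - inside the coordinate range is left untouched.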

Answer 1: (score: 2)

Here is a way to do this in base R using strsplit and sapply:

# split Chr into a list
tempList <- strsplit(as.character(df$Chr), split="[+-]")
# replace Chr with desired values
df$Chr <- sapply(tempList, function(i) paste(i[[1]], i[[2]], sep="-"))
# get Gene variable
df$gene <- sapply(tempList, "[[", 3)


Answer 2: (score: 1)

This should work:

str_split_fixed(a, "[-+]", 2)
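Note that with n = 2 the string is cut only at the first delimiter, so the second piece still contains the gene name and its delimiter (for example "100064134+Nfif"). A hedged sketch of how the same call could be extended to reach the layout asked for in the question, assuming every value has exactly three pieces; the names df1, parts and df2 are illustrative only:

```r
library(stringr)

# Two rows of the question's sample data, one with each delimiter
df1 <- data.frame(
  Chr = c("chr10_100064111-100064134+Nfif",
          "chr10_100064115-100064138-Kitl"),
  Nm1 = c(20, 30),
  stringsAsFactors = FALSE
)

# n = 3 keeps all three pieces: start, end, gene
parts <- str_split_fixed(df1$Chr, "[-+]", 3)

# Rejoin the two coordinate pieces and keep the gene separately
df2 <- data.frame(
  Chr  = paste(parts[, 1], parts[, 2], sep = "-"),
  gene = parts[, 3],
  df1[, -1, drop = FALSE],
  stringsAsFactors = FALSE
)

print(df2)
```

str_split_fixed returns a character matrix, so the pieces are addressed by column index rather than by name.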