Using strsplit output as new colnames

Date: 2016-03-16 07:45:56

Tags: r

I want to create new colnames for my data frame MirAligner, each consisting of the part of the original colname before the first _. This is what I tried:

unlist(strsplit(as.character(colnames(MirAligner)),'_',fixed=TRUE))

Column names

head(colnames(MirAligner))
[1] "na-008_S52_L003_R1_001.mir.fa.gz"  "na-014_S99_L005_R1_001.mir.fa.gz"  "na015_S114_L005_R1_001.mir.fa.gz"
[4] "na-015_S50_L003_R1_001.mir.fa.gz"  "na-018_S147_L007_R1_001.mir.fa.gz" "na020_S162_L007_R1_001.mir.fa.gz"

Expected output:

na-008 na-014 na015

3 answers:

Answer 0 (score: 5)

We can use sub to remove everything from the first _ onward:

sub('_.*', '', str1)
#[1] "na-014" "na015"  "na-015" "na-018" "na020" 

Data

str1 <- c("na-014_S99_L005_R1_001.mir.fa.gz", 
          "na015_S114_L005_R1_001.mir.fa.gz", 
          "na-015_S50_L003_R1_001.mir.fa.gz",  
          "na-018_S147_L007_R1_001.mir.fa.gz", 
          "na020_S162_L007_R1_001.mir.fa.gz")
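
To tie this back to the question, the same call can be applied to the column names directly and assigned back. The following is a minimal sketch; MirAlignerDemo is a hypothetical stand-in for the asker's MirAligner data frame, built here with placeholder values only so the example runs on its own.

```r
# Hypothetical stand-in for the MirAligner data frame (values are placeholders)
MirAlignerDemo <- data.frame(matrix(0, nrow = 2, ncol = 2))
colnames(MirAlignerDemo) <- c("na-008_S52_L003_R1_001.mir.fa.gz",
                              "na-014_S99_L005_R1_001.mir.fa.gz")

# Drop everything from the first underscore onward and assign the result back
colnames(MirAlignerDemo) <- sub("_.*", "", colnames(MirAlignerDemo))

print(colnames(MirAlignerDemo))
```

Because sub replaces only the first match and "_.*" runs from the first underscore to the end of the string, a single replacement per name is enough.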

Answer 1 (score: 3)

# try5 is not defined in this answer; presumably it holds the original column names
gsub("^(.*?)_.*", "\\1", try5)
#[1] "na-008" "na-014" "na015" 

Answer 2 (score: 3)

Using strsplit within sapply

#myColNames <- colnames(MirAligner)
myColNames <- c("na-008_S52_L003_R1_001.mir.fa.gz", "na-014_S99_L005_R1_001.mir.fa.gz")

sapply(strsplit(myColNames, "_", fixed = TRUE), "[[", 1)
#output
# [1] "na-008" "na-014"
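
For contrast with the attempt in the question: unlist flattens every piece of every name into one vector, so the result no longer lines up with the columns. A small sketch of the difference follows; the vapply call is a swapped-in, type-checked variant of the sapply line above, not part of the original answer, and the variable names are illustrative.

```r
demoNames <- c("na-008_S52_L003_R1_001.mir.fa.gz",
               "na-014_S99_L005_R1_001.mir.fa.gz")

# strsplit returns a list with one character vector per input string
parts <- strsplit(demoNames, "_", fixed = TRUE)

# unlist() keeps all pieces, so the length no longer matches the input
allPieces <- unlist(parts)
print(length(allPieces))

# Taking element 1 of each list entry gives one prefix per name;
# vapply also checks that each result is a single string
firstPieces <- vapply(parts, `[[`, character(1), 1)
print(firstPieces)
```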

Or using read.table

read.table(text = myColNames, sep = "_", stringsAsFactors = FALSE)[, "V1"]
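
As a quick sanity check, the approaches from the three answers can be compared on the same input. This is only a sketch on the two sample names used above; checkNames is an illustrative variable name, and agreement here does not guarantee agreement on every possible input (read.table, for instance, also interprets quotes and comment characters).

```r
checkNames <- c("na-008_S52_L003_R1_001.mir.fa.gz",
                "na-014_S99_L005_R1_001.mir.fa.gz")

viaSub      <- sub("_.*", "", checkNames)
viaGsub     <- gsub("^(.*?)_.*", "\\1", checkNames)
viaStrsplit <- sapply(strsplit(checkNames, "_", fixed = TRUE), "[[", 1)
viaTable    <- read.table(text = checkNames, sep = "_",
                          stringsAsFactors = FALSE)[, "V1"]

# TRUE when all four approaches agree on these inputs
allAgree <- identical(viaSub, viaGsub) &&
  identical(viaSub, viaStrsplit) &&
  identical(viaSub, viaTable)
print(allAgree)
```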