I have a character vector in R in which every element is a string. Let's use this example:
my.files <- c("AWCallibration#NoneBino-3", "AWExperiment1#NoneBino-1", "AWExperiment2#NonemonL-2")
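I want to extract several pieces of information from these strings:
- the two-character prefix ("AW")
- whether the file comes from calibration ("Callibration") or from data collection and, if the latter, which condition was used ("Experiment1" or "Experiment2")
- the label that follows "#None" ("Bino" or "monL")
- the trailing number ("1" or "2")

I first tried strsplit, but it seems to work only when there is a regular delimiter (for example "_"). substring looked like a better fit for my needs, but it did not work either, because the splits do not fall at fixed positions ("Experiment1" is 11 characters long, "Callibration" is 12).
I suspect regular expressions are the answer, but I do not know how to account for the differing lengths between the split points.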
Answer 0 (score: 5)
You can extract the pieces one at a time:
first <- substr(my.files, 1, 2)
# [1] "AW" "AW" "AW"
second <- sub("^..(.*)#.*", "\\1", my.files)
# [1] "Callibration" "Experiment1" "Experiment2"
third <- sub("^.*#None(.*)-\\d+$", "\\1", my.files)
# [1] "Bino" "Bino" "monL"
fourth <- sub(".*-(\\d+)$", "\\1", my.files)
# [1] "3" "1" "2"
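The four separate calls above can also be folded into one pass with base R's regexec() and regmatches(). This is a minimal sketch and not part of the original answer; the combined pattern and the column names (first to fourth, borrowed from the variables above) are assumptions chosen for illustration.

```r
my.files <- c("AWCallibration#NoneBino-3",
              "AWExperiment1#NoneBino-1",
              "AWExperiment2#NonemonL-2")

# One pattern, four capture groups: the two leading characters,
# everything up to "#None", everything up to the last "-", trailing digits.
pat <- "^(..)(.*)#None(.*)-([0-9]+)$"

# regexec() records where the match and each group sit;
# regmatches() turns those positions into text.
m <- regmatches(my.files, regexec(pat, my.files))

# Each list element is c(full match, group 1, ..., group 4),
# so drop the full match and stack the rest into a matrix.
parts <- t(sapply(m, function(x) x[-1]))
colnames(parts) <- c("first", "second", "third", "fourth")
parts
```

Keeping everything in one pattern means the field boundaries are defined in a single place, at the cost of a regular expression that is a little harder to read.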
Or in a single command:
strsplit(my.files, "(?<=^..)(?=[A-Z])|#None|-", perl = TRUE)
# [[1]]
# [1] "AW" "Callibration" "Bino" "3"
#
# [[2]]
# [1] "AW" "Experiment1" "Bino" "1"
#
# [[3]]
# [1] "AW" "Experiment2" "monL" "2"
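If a table is more convenient than a list, the strsplit() result can be stacked into a data frame. The sketch below is an illustration rather than part of the original answer; the column names and the integer conversion of the last field are assumptions.

```r
my.files <- c("AWCallibration#NoneBino-3",
              "AWExperiment1#NoneBino-1",
              "AWExperiment2#NonemonL-2")

# Same split as above: after the first two characters (when an uppercase
# letter follows), at "#None", and at "-".
pieces <- strsplit(my.files, "(?<=^..)(?=[A-Z])|#None|-", perl = TRUE)

# rbind() only yields a clean table if every string produced four pieces.
stopifnot(all(lengths(pieces) == 4))

fields <- as.data.frame(do.call(rbind, pieces), stringsAsFactors = FALSE)
names(fields) <- c("first", "second", "third", "fourth")

# The last field is numeric in the example data, so store it as integer.
fields$fourth <- as.integer(fields$fourth)
fields
```

The length check is there because a file name that does not follow the expected layout would otherwise be silently recycled by rbind().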
Answer 1 (score: 5)
Here are a few different solutions:
gsubfn::strapplyc: Try this:
library(gsubfn)
pat <- "(..)(.*)#None(.*)-(.*)"
strapplyc(my.files, pat, simplify = rbind)
giving:
     [,1] [,2]           [,3]   [,4]
[1,] "AW" "Callibration" "Bino" "3"
[2,] "AW" "Experiment1"  "Bino" "1"
[3,] "AW" "Experiment2"  "monL" "2"
Note that the development version of the gsubfn package has a read.pattern command that can use the pat defined above, like this:
read.pattern(text = my.files, pattern = pat, as.is = TRUE)
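For readers who prefer to stay in base R, utils::strcapture() (available since R 3.4.0) follows a similar pattern-to-columns idea. This is a sketch and not part of the original answer; the prototype data frame and its column names are assumptions made for illustration.

```r
my.files <- c("AWCallibration#NoneBino-3",
              "AWExperiment1#NoneBino-1",
              "AWExperiment2#NonemonL-2")

# Same pattern as in the strapplyc() example.
pat <- "(..)(.*)#None(.*)-(.*)"

# The prototype supplies one column per capture group, with its name and type.
proto <- data.frame(first = character(), second = character(),
                    third = character(), fourth = integer(),
                    stringsAsFactors = FALSE)

# strcapture() returns a data frame with the captures coerced to those types.
res <- strcapture(pat, my.files, proto)
res
```

Declaring the types up front means the last field arrives as an integer without a separate conversion step.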
sub / strsplit: Here is another solution. It inserts a minus sign after the second character and then splits on either a minus sign or #None:
my.files2 <- sub("(..)", "\\1-", my.files)
do.call(rbind, strsplit(my.files2, "-|#None"))
giving:
     [,1] [,2]           [,3]   [,4]
[1,] "AW" "Callibration" "Bino" "3"
[2,] "AW" "Experiment1"  "Bino" "1"
[3,] "AW" "Experiment2"  "monL" "2"
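gsub / read.table: Here we use gsub to insert a minus sign after the first two characters and, in the same call, replace #None with a minus sign. Then we read the result with read.table, using a minus sign as sep: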
withMinus <- gsub("^(..)|#None", "\\1-", my.files)
read.table(text = withMinus, sep = "-", as.is = TRUE)
  V1           V2   V3 V4
1 AW Callibration Bino  3
2 AW  Experiment1 Bino  1
3 AW  Experiment2 monL  2
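A small variation on the read.table() approach names the columns as they are read. This is a sketch with assumed column names, not part of the original answer. It also sets comment.char = "", because read.table() treats "#" as a comment character by default. That default is harmless here only because gsub() has already replaced every "#None".

```r
my.files <- c("AWCallibration#NoneBino-3",
              "AWExperiment1#NoneBino-1",
              "AWExperiment2#NonemonL-2")

# Insert "-" after the first two characters and turn "#None" into "-".
withMinus <- gsub("^(..)|#None", "\\1-", my.files)

# Read the "-"-separated fields; comment.char = "" disables "#" comments
# in case a stray "#" survives in other file names.
fields <- read.table(text = withMinus, sep = "-", as.is = TRUE,
                     comment.char = "",
                     col.names = c("first", "second", "third", "fourth"))
str(fields)
```

Because read.table() infers column types, the last column comes back as integer while the others stay character.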