I have a character vector (it is a column in a data frame):
[1] "Tue 12-14 (w1-6, CLB 6)" "Mon 18-20 (w1-6, ColomboThC)" "Thu 14-16 (w1-6,7-9,10-12, CLB 8)"
[4] "Fri 13 (w2-9,10-13, Law 388)" "Fri 14 (w2-9,10-13, Sqhouse206)" "Fri 15 (w2-9,10-13, Sqhouse115)"
[7] "Thu 17 (w2-9,10-13, Block G16)" "Thu 18 (w2-9,10-13, Block G16)" "Mon 10 (w2-9,10-13, AinswthG01)"
[10] "Mon 11 (w2-9,10-13, Sqhouse203)" "Mon 12 (w2-9,10-13, Sqhouse206)" "Mon 13 (w2-9,10-13, BUS 114)"
[13] "Mon 16 (w2-9,10-13, Gold G03)" "Mon 17 (w2-9,10-13, Quad G047)" "Mon 20 (w2-9,10-13, Col LG02)"
[16] "Tue 17 (w2-9,10-13, Quad 1001)" "Tue 18 (w2-9,10-13, Quad 1001)" "Tue 19 (w2-9,10-13, Quad 1001)"
[19] "Tue 20 (w2-9,10-13)" "Wed 10 (w2-9,10-13, Quad 1046)" "Wed 11 (w2-9,10-13, Quad 1046)"
[22] "Wed 12 (w2-9,10-13, Quad 1046)" "Wed 13 (w2-9,10-13, Quad G046)"
I want to extract parts of each string based on a pattern, so the expected output for the first element of the vector would be:
"Tue" "12-14" "1-6" "CLB 6"
An example of the output for the third element:
"Thu" "14-16" c("1-6","7-9","10-12") "CLB 8"
where c("1-6","7-9","10-12") is a list.
(Note that each part will be appended as a new column in my data frame.)
I am thinking of using gsub to extract each part of the string. Are there other functions I could use?
Any suggestions are much appreciated :)
Answer 0: (score: 2)
We can try the tidyverse:
library(tidyverse)
str_split_fixed(vec, pattern = " ", n = 3) %>%
as.data.frame() %>%
mutate(V3 = str_sub(V3,3,-2)) %>%
separate(V3, c("V3", "V4"), sep = ", ")
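The code works as follows: vec is split on spaces into 3 columns (n = 3 keeps everything after the second space together) and coerced to a data frame. str_sub then drops the leading "(w" and the trailing ")" from the third column, and separate splits that column into two on ", ". An example of the output: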
V1 V2 V3 V4
1 Tue 12-14 1-6 CLB 6
2 Mon 18-20 1-6 ColomboThC
3 Thu 14-16 1-6,7-9,10-12 CLB 8
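The question also asks for the week ranges as a list, while V3 above is still a single comma-separated string. A minimal sketch of one way to get there with base R's strsplit, assuming the pipeline's result has been stored in a data frame (called res here, a hypothetical name; it is rebuilt by hand from the first three rows so the snippet runs on its own):

```r
# Hypothetical stand-in for the pipeline's result (first three rows only)
res <- data.frame(
  V1 = c("Tue", "Mon", "Thu"),
  V2 = c("12-14", "18-20", "14-16"),
  V3 = c("1-6", "1-6", "1-6,7-9,10-12"),
  V4 = c("CLB 6", "ColomboThC", "CLB 8"),
  stringsAsFactors = FALSE
)

# strsplit() returns a list with one character vector per row,
# which can be stored in the data frame as a list column
res$V3 <- strsplit(res$V3, ",", fixed = TRUE)

res$V3[[3]]
# [1] "1-6"   "7-9"   "10-12"
```

fixed = TRUE treats the comma as a literal string rather than a regular expression, which is all that is needed here.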
Answer 1: (score: 1)
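Using the input x shown in the Note at the end, the following uses only a few simple, separate steps and no packages. Each of the first three sub calls replaces one delimiter with ";", the fourth removes the trailing ")", and read.table then splits the result on ";":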
y <- x
y <- sub(" ", ";", y)
y <- sub(" ..", ";", y)
y <- sub(", ", ";", y)
y <- sub(".$", "", y)
DF <- read.table(text = y, sep = ";", as.is = TRUE, fill = TRUE)
DF[[3]] <- strsplit(DF[[3]], ",")
giving:
> DF
V1 V2 V3 V4
1 Tue 12-14 1-6 CLB 6
2 Mon 18-20 1-6 ColomboThC
3 Thu 14-16 1-6, 7-9, 10-12 CLB 8
4 Fri 13 2-9, 10-13 Law 388
5 Fri 14 2-9, 10-13 Sqhouse206
6 Fri 15 2-9, 10-13 Sqhouse115
7 Thu 17 2-9, 10-13 Block G16
8 Thu 18 2-9, 10-13 Block G16
9 Mon 10 2-9, 10-13 AinswthG01
10 Mon 11 2-9, 10-13 Sqhouse203
11 Mon 12 2-9, 10-13 Sqhouse206
12 Mon 13 2-9, 10-13 BUS 114
13 Mon 16 2-9, 10-13 Gold G03
14 Mon 17 2-9, 10-13 Quad G047
15 Mon 20 2-9, 10-13 Col LG02
16 Tue 17 2-9, 10-13 Quad 1001
17 Tue 18 2-9, 10-13 Quad 1001
18 Tue 19 2-9, 10-13 Quad 1001
19 Tue 20 2-9, 10-13
20 Wed 10 2-9, 10-13 Quad 1046
21 Wed 11 2-9, 10-13 Quad 1046
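The first 4 lines of code (the assignment y <- x and the three sub calls that insert ";") can be replaced with the single line below, in which case the whole solution shrinks to 4 lines of code.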
y <- Reduce(function(x, pat) sub(pat, ";", x), init = x, c(" ", " ..", ", "))
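To see what the Reduce call does, here is a small sketch that traces the fold on a single element. The names s, pats and steps are introduced only for this illustration, and accumulate = TRUE is added so that every intermediate string is kept:

```r
# One element of the input, to trace the fold step by step
s <- "Thu 14-16 (w1-6,7-9,10-12, CLB 8)"
pats <- c(" ", " ..", ", ")

# Each step replaces the first match of the next pattern with ";".
# accumulate = TRUE returns the initial value plus every intermediate result.
steps <- Reduce(function(acc, pat) sub(pat, ";", acc), pats, s, accumulate = TRUE)
print(steps)
# [1] "Thu 14-16 (w1-6,7-9,10-12, CLB 8)"
# [2] "Thu;14-16 (w1-6,7-9,10-12, CLB 8)"
# [3] "Thu;14-16;1-6,7-9,10-12, CLB 8)"
# [4] "Thu;14-16;1-6,7-9,10-12;CLB 8)"

# The trailing ")" still has to be removed separately
final <- sub(".$", "", steps[length(steps)])
```

The pattern " .." matches the space together with the two characters "(w" that follow it, which is why the "w" disappears in the third step.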
Note: the input x in reproducible form is:
x <- c("Tue 12-14 (w1-6, CLB 6)", "Mon 18-20 (w1-6, ColomboThC)",
"Thu 14-16 (w1-6,7-9,10-12, CLB 8)", "Fri 13 (w2-9,10-13, Law 388)",
"Fri 14 (w2-9,10-13, Sqhouse206)", "Fri 15 (w2-9,10-13, Sqhouse115)",
"Thu 17 (w2-9,10-13, Block G16)", "Thu 18 (w2-9,10-13, Block G16)",
"Mon 10 (w2-9,10-13, AinswthG01)", "Mon 11 (w2-9,10-13, Sqhouse203)",
"Mon 12 (w2-9,10-13, Sqhouse206)", "Mon 13 (w2-9,10-13, BUS 114)",
"Mon 16 (w2-9,10-13, Gold G03)", "Mon 17 (w2-9,10-13, Quad G047)",
"Mon 20 (w2-9,10-13, Col LG02)", "Tue 17 (w2-9,10-13, Quad 1001)",
"Tue 18 (w2-9,10-13, Quad 1001)", "Tue 19 (w2-9,10-13, Quad 1001)",
"Tue 20 (w2-9,10-13)", "Wed 10 (w2-9,10-13, Quad 1046)", "Wed 11 (w2-9,10-13, Quad 1046)",
"Wed 12 (w2-9,10-13, Quad 1046)", "Wed 13 (w2-9,10-13, Quad G046)")
Answer 2: (score: 0)
Base R functions can do this, so you do not need to load any other packages. The tidyverse is great, but it is a fairly large dependency:
x=c("Thu 18 (w2-9,10-13, Block G16)","Mon 18-20 (w1-6, ColomboThC)")
do.call(rbind,lapply(x,function(i){
y=strsplit(i,' \\(')[[1]]
y[2]=gsub('\\)','',y[2])
out1=strsplit(y[1],' ')[[1]]
out2=strsplit(y[2],', ')[[1]]
listpart=grepl('-',out2)
do.call(cbind,c(out1,list(out2[listpart]),out2[!listpart]))
}))
The output will be (note that the week ranges keep their leading "w" and are not split further):
[,1] [,2] [,3] [,4]
[1,] "Thu" "18" "w2-9,10-13" "Block G16"
[2,] "Mon" "18-20" "w1-6" "ColomboThC"
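As a variation on the base R idea, here is a minimal sketch that uses regexec and regmatches to capture all parts in one pattern, strips the leading "w", splits the week ranges into a list column, and tolerates a missing location such as "Tue 20 (w2-9,10-13)". The pattern, the column names (day, time, room, weeks) and the assumption that every element follows the "day time (wWEEKS, location)" layout are mine, not part of the original answers; an element that does not match would be dropped silently by rbind, so check the row count on real data.

```r
x <- c("Tue 12-14 (w1-6, CLB 6)",
       "Thu 14-16 (w1-6,7-9,10-12, CLB 8)",
       "Tue 20 (w2-9,10-13)")

# Groups: 1 = day, 2 = time, 3 = weeks (without "w"),
#         4 = optional ", location", 5 = location only
pat <- "^([^ ]+) ([^ ]+) \\(w([0-9,-]*[0-9])(, ([^)]*))?\\)$"

# One character vector per element: full match followed by the 5 groups
m <- regmatches(x, regexec(pat, x))
parts <- do.call(rbind, m)

DF <- data.frame(day  = parts[, 2],
                 time = parts[, 3],
                 room = parts[, 6],
                 stringsAsFactors = FALSE)

# Week ranges as a list column, one character vector per row
DF$weeks <- strsplit(parts[, 4], ",", fixed = TRUE)

DF$weeks[[2]]
# [1] "1-6"   "7-9"   "10-12"
```

An unmatched optional group comes back as an empty string, so the room for the third element is "" rather than NA.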