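YARQ (Yet Another Regex Question).
How can I split the following into two columns, so that the last column contains the last word of each sentence and the first column contains everything else?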
x <- c("This is a test",
"Testing 1,2,3 Hello",
"Foo Bar",
"Random 214274(%*(^(* Sample",
"Some Hyphenated-Thing"
)
So that I end up with:
col1                    col2
This is a               test
Testing 1,2,3           Hello
Foo                     Bar
Random 214274(%*(^(*    Sample
Some                    Hyphenated-Thing
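Answer 0 (score: 9)
This looks like a job for a lookahead. We split on a space that is followed only by non-space characters up to the end of the string, which is the last space in each element.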
split <- strsplit(x, " (?=[^ ]+$)", perl=TRUE)
matrix(unlist(split), ncol=2, byrow=TRUE)
[,1] [,2]
[1,] "This is a" "test"
[2,] "Testing 1,2,3" "Hello"
[3,] "Foo" "Bar"
[4,] "Random 214274(%*(^(*" "Sample"
[5,] "Some" "Hyphenated-Thing"
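As a rough sketch of what the lookahead is doing, the same pattern can be passed to regexpr to locate the last space, and the two pieces can then be cut out with substr. The "Single" element, the has_split helper and the col1/col2 data frame are illustrative additions, not part of the answer above; a one-word element is assumed to belong entirely in col2, and empty strings are not handled.

```r
x <- c("This is a test", "Foo Bar", "Single")

# The lookahead (?=[^ ]+$) is zero-width, so only the space itself is matched.
# regexpr() returns -1 for elements with no space to split on.
pos <- as.integer(regexpr(" (?=[^ ]+$)", x, perl = TRUE))

has_split <- pos > 0  # hypothetical helper flag for this sketch
col1 <- ifelse(has_split, substr(x, 1, pos - 1), "")
col2 <- ifelse(has_split, substr(x, pos + 1, nchar(x)), x)

result <- data.frame(col1 = col1, col2 = col2, stringsAsFactors = FALSE)
print(result)
```

Unlike matrix(unlist(split), ncol=2, byrow=TRUE), this variant does not rely on every element splitting into exactly two pieces.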
Answer 1 (score: 4)
Using strsplit with lapply:
do.call(rbind,
lapply(
strsplit(x," "),
function(y)
cbind(paste(head(y,length(y)-1),collapse=" "),tail(y,1))
)
)
Using sapply:
t(
sapply(
strsplit(x," "),
function(y) cbind(paste(head(y,length(y)-1),collapse=" "),tail(y,1))
)
)
Both give:
[,1] [,2]
[1,] "This is a" "test"
[2,] "Testing 1,2,3" "Hello"
[3,] "Foo" "Bar"
[4,] "Random 214274(%*(^(*" "Sample"
[5,] "Some" "Hyphenated-Thing"
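Answer 2 (score: 1)
Assuming a "word" is alphanumeric (here the last word is one or more letters \\w or digits \\d; note that \\w already matches digits and the underscore, and you can add more classes as needed):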
col_one = gsub("(.*)(\\b[\\w\\d]+)$", "\\1", x, perl=TRUE)
col_two = gsub("(.*)(\\b[\\w\\d]+)$", "\\2", x, perl=TRUE)
Output (shown for the first four elements of x; note the trailing space left in col_one):
> col_one
[1] "This is a " "Testing 1,2,3 " "Foo "
[4] "Random 214274(%*(^(* "
> col_two
[1] "test" "Hello" "Bar" "Sample"
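One caveat with a word-character pattern is that \\b also matches next to a hyphen, so "Hyphenated-Thing" would be cut at the hyphen. The sketch below first shows that, then a whitespace-based variant that treats the last run of non-space characters as the last word. This variant is an alternative technique rather than the method above, and the "Single" element and the last_by_word name are illustrative additions.

```r
x <- c("This is a test", "Some Hyphenated-Thing", "Single")

# Word-character pattern: the hyphenated word is split at the hyphen.
last_by_word <- sub("(.*)\\b(\\w+)$", "\\2", x[2], perl = TRUE)
print(last_by_word)

# Whitespace-based variant: drop the last run of non-space characters
# (plus the whitespace before it) for col_one, and everything up to the
# last whitespace character for col_two.
col_one <- sub("\\s*\\S+$", "", x, perl = TRUE)
col_two <- sub("^.*\\s", "", x, perl = TRUE)

print(cbind(col_one, col_two))
```

Here col_one has no trailing space, and a one-word element ends up entirely in col_two.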
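Answer 3 (score: 0)
This may not be what you need, but in case anyone is wondering how to do this in Python:
line = "This is a test"  # example input
# col1:
print(line.split(" ")[:-1])
# col2:
print(line.split(" ")[-1])
Note that col1 is printed as a list; you can print it as a string like this:
# col1:
print(" ".join(line.split(" ")[:-1]))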