Transforming string data in R into a specific format

Posted: 2017-08-24 03:56:38

Tags: r regex string

I have a long string of text:

test<-"ABC (0033 - test), CCPM (0431 - CCPM), FGC (0432 -YYY)"

I want to transform it so that each entry is followed by a comma and the number inside its parentheses:

ABC (0033 - test),0033
CCPM (0431 - CCPM),0431
FGC (0432 -YYY),0432

How can I do this?

1 answer:

Answer 0 (score: 0)

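We can use gsub: capture the digits (\\1) and the rest of the parenthesised group up to and including the closing parenthesis (\\2), then write both back followed by a comma and the digits again: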

gsub("(\\d+)([^)]+\\))", "\\1\\2,\\1", test)
[1] "ABC (0033 - test),0033, CCPM (0431 - CCPM),0431, FGC (0432 -YYY),0432"

If we need to print each entry on its own line:

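# ",?\\s*" also consumes the space after each comma, so the
# second and third lines do not start with a leading blank
cat(gsub("(\\d+)([^)]+\\)),?\\s*", "\\1\\2,\\1\n", test))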
# ABC (0033 - test),0033
# CCPM (0431 - CCPM),0431
# FGC (0432 -YYY),0432
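If the lines are needed as values rather than just printed output, the same gsub result can be split on the inserted newlines. This is a minimal sketch; the variable names one_per_line and res are illustrative only, and it assumes commas appear only between entries, never inside the parentheses.

```r
test <- "ABC (0033 - test), CCPM (0431 - CCPM), FGC (0432 -YYY)"

# Append ",<digits>" and a newline after each closing parenthesis, and
# consume the ", " separator so the next entry starts at column one
one_per_line <- gsub("(\\d+)([^)]+\\)),?\\s*", "\\1\\2,\\1\n", test)

# strsplit() drops the empty piece after the trailing "\n",
# leaving exactly one element per entry
res <- strsplit(one_per_line, "\n", fixed = TRUE)[[1]]
writeLines(res)
```

writeLines() prints one element per line, which gives the same three lines as the cat() call above.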

If we need a character vector with three separate elements:

unname(sapply(strsplit(test, ", ")[[1]], function(x) 
    paste(x, regmatches(x, regexpr("\\d+", x)), sep=",")))
#[1] "ABC (0033 - test),0033"  "CCPM (0431 - CCPM),0431" "FGC (0432 -YYY),0432"
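Because sub() is already vectorised, the sapply()/regexpr() loop can also be replaced by a single call. This is an alternative sketch, not the answerer's method; it assumes each entry has the form "NAME (digits ...)" and that commas appear only as separators. An entry that does not match the pattern is returned unchanged by sub(), so it would be appended to itself rather than dropped.

```r
test <- "ABC (0033 - test), CCPM (0431 - CCPM), FGC (0432 -YYY)"

# Split on a comma followed by optional whitespace
parts <- strsplit(test, ",\\s*")[[1]]

# Keep only the run of digits right after the first "(" in each entry
codes <- sub("^[^(]*\\((\\d+).*$", "\\1", parts)

res <- paste(parts, codes, sep = ",")
print(res)
```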