R strsplit: split on a character unless a specific character follows

Time: 2016-03-25 03:26:19

Tags: regex r string split

Suppose I have a character vector like this:

split_these = c("File Location:C:\\Documents", "File Location:Pete's Computer", "File Location:")

I want to split each element of this vector on ":", unless the ":" is followed by "\". What I would like to get back is something like:

#preferred solution
"File Location" "C:\\Documents"
"File Location" "Pete's Computer"
"File Location" ""

#less preferred but still great
"File Location" "C:\\Documents"
"File Location" "Pete's Computer"
"File Location" 

I tried the following:

strsplit(split_these, ":")
[[1]]
[1] "File Location" "C"             "\\Documents"  

[[2]]
[1] "File Location"   "Pete's Computer"

[[3]]
[1] "File Location"

strsplit(split_these, ":[^\\]")
[[1]]
[1] "File Location" ":\\Documents" 

[[2]]
[1] "File Location"  "ete's Computer"

[[3]]
[1] "File Location:"

2 answers:

Answer 0 (score: 3)

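I suggest using PCRE with a negative lookahead assertion. Also note that you need to double-escape the backslash, because it is a metacharacter in both R string literals and regex syntax.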

strsplit(split_these, ":(?!\\\\)", perl = TRUE)
## [[1]]
## [1] "File Location" "C:\\Documents"
##
## [[2]]
## [1] "File Location"   "Pete's Computer"
##
## [[3]]
## [1] "File Location"
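
To see why the pattern needs four backslashes, it can help to print the string as the regex engine receives it, and to compare what the lookahead matches with what the character class from the question matches. This is a minimal sketch in base R; the names `pattern`, `consumed` and `lookahead` are illustrative and not part of the answer.

```r
split_these <- c("File Location:C:\\Documents",
                 "File Location:Pete's Computer",
                 "File Location:")

# The R literal ":(?!\\\\)" holds 7 characters: : ( ? ! \ \ )
pattern <- ":(?!\\\\)"
cat(pattern, "\n")   # prints the regex as PCRE sees it: :(?!\\)
nchar(pattern)       # 7

# A character class such as [^\\] consumes the character after the colon,
# so that character is lost from the second field.
consumed <- regmatches(split_these[2], regexpr(":[^\\]", split_these[2]))

# The lookahead (?!...) is zero-width: only the colon itself is matched.
lookahead <- regmatches(split_these[2],
                        regexpr(pattern, split_these[2], perl = TRUE))

consumed    # ":P"
lookahead   # ":"
```

Because the lookahead also succeeds at the end of the string, a trailing ":" is still treated as a delimiter, which is why the third element splits cleanly.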

If you want to flatten the list into a single character vector:

do.call(c, strsplit(split_these, ":(?!\\\\)", perl = TRUE))
## [1] "File Location" "C:\\Documents" "File Location" "Pete's Computer" "File Location"

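I came up with a hack to get the trailing empty-string field. Since strsplit() always omits a final empty field, we can simply append the delimiter to the end of every input string. If the original string had no trailing delimiter, the newly created empty field is omitted and the result is unchanged. If the original string did have a trailing delimiter, we get the empty field we want:

do.call(c, strsplit(paste0(split_these, ":"), ":(?!\\\\)", perl = TRUE))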
## [1] "File Location" "C:\\Documents" "File Location" "Pete's Computer" "File Location" ""
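
If one row per input is preferred over a flat vector, the same appended-delimiter trick can feed `rbind`. This is a minimal sketch in base R, assuming every input contains exactly one splitting colon so that each element yields two fields; the names `parts` and `m` are illustrative.

```r
split_these <- c("File Location:C:\\Documents",
                 "File Location:Pete's Computer",
                 "File Location:")

# Append the delimiter so a trailing empty field survives, then split
# on every colon that is not followed by a backslash.
parts <- strsplit(paste0(split_these, ":"), ":(?!\\\\)", perl = TRUE)

# Each element now has two fields, so the list binds into a 3 x 2 matrix.
m <- do.call(rbind, parts)
m
```

If some inputs could produce a different number of fields, `rbind` would recycle values, so checking `lengths(parts)` first would be a sensible guard.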

Answer 1 (score: 0)

Applying read.dcf to each element of split_these gives a named character vector, which can then be converted to a data.frame:

v <- drop(do.call("cbind", lapply(split_these, function(x) read.dcf(textConnection(x)))))

giving:

> v
    File Location     File Location     File Location 
  "C:\\Documents" "Pete's Computer"                "" 

> stack(v)[2:1]

giving:

            ind          values
1 File Location   C:\\Documents
2 File Location Pete's Computer
3 File Location
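
The one-liner above leaves its text connections open, which R may later report as warnings. A hedged variant, assuming `read.dcf` accepts these single-line records as shown in the answer, closes each connection explicitly. `read_field` and `df` are hypothetical names introduced here for illustration.

```r
split_these <- c("File Location:C:\\Documents",
                 "File Location:Pete's Computer",
                 "File Location:")

# Hypothetical helper: parse one "key:value" record and always
# close the text connection, even if read.dcf() fails.
read_field <- function(x) {
  con <- textConnection(x)
  on.exit(close(con))
  read.dcf(con)
}

# Each call returns a 1 x 1 matrix; bind them and drop to a named vector.
v <- drop(do.call(cbind, lapply(split_these, read_field)))

# Build the data.frame directly from the names and values.
df <- data.frame(key = names(v), value = unname(v),
                 stringsAsFactors = FALSE)
df
```

Building the data.frame from `names(v)` and `unname(v)` keeps both columns as plain character vectors, whereas `stack()` returns the key column as a factor.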