Regex to split a string while preserving quoted sections

Posted: 2011-01-24 10:03:08

Tags: c# regex split

I need to split the string below using spaces as delimiters, but any spaces inside quotes must be preserved.

research library "not available" author:"Bernard Shaw"

The desired result is:

research
library
"not available"
author:"Bernard Shaw"

I want to do this in C#. I have this regex, taken from another post on SO: @"(?<="")|\w[\w\s]*(?="")|\w+|""[\w\s]*""", which splits the string into

research
library
"not available"
author
"Bernard Shaw"

Unfortunately, that does not meet my exact requirement: author:"Bernard Shaw" should stay a single item.

I am looking for any regex that solves this.

Any help is appreciated.

2 answers:

Answer 0 (score: 27)

The following should work, as long as the quoted strings contain no escaped quotes:

string[] splitArray = Regex.Split(subjectString, "(?<=^[^\"]*(?:\"[^\"]*\"[^\"]*)*) (?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

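This regex splits on a space character only if it is both preceded and followed by an even number of quotes, i.e. only when the space lies outside every quoted section.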
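A minimal, self-contained sketch that runs the pattern above on the question's sample input. The class name QuoteAwareSplit and the Main harness are hypothetical, not part of the answer; the pattern string is copied unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper class around the answer's Regex.Split call.
public static class QuoteAwareSplit
{
    // Split on a space only when an even number of double quotes precede it
    // (lookbehind back to ^) and an even number follow it (lookahead to $).
    // .NET allows the variable-length lookbehind this pattern relies on.
    private const string Pattern =
        "(?<=^[^\"]*(?:\"[^\"]*\"[^\"]*)*) (?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    public static string[] Split(string subjectString)
    {
        return Regex.Split(subjectString, Pattern);
    }

    public static void Main()
    {
        string input = "research library \"not available\" author:\"Bernard Shaw\"";
        foreach (string part in Split(input))
        {
            // Prints: research / library / "not available" / author:"Bernard Shaw"
            Console.WriteLine(part);
        }
    }
}
```

Two things worth knowing: consecutive spaces outside quotes produce empty entries (Regex.Split returns an empty string between adjacent matches), and because every candidate space rescans the string in both directions, the cost grows roughly quadratically with input length, which is usually fine for short inputs like search queries.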

The regex explained, without all those escaped quotes:

(?<=      # Assert that it's possible to match this before the current position (positive lookbehind):
 ^        # The start of the string
 [^"]*    # Any number of non-quote characters
 (?:      # Match the following group...
  "[^"]*  # a quote, followed by any number of non-quote characters
  "[^"]*  # the same
 )*       # ...zero or more times (so 0, 2, 4, ... quotes will match)
)         # End of lookbehind assertion.
[ ]       # Match a space
(?=       # Assert that it's possible to match this after the current position (positive lookahead):
 (?:      # Match the following group...
  [^"]*"  # see above
  [^"]*"  # see above
 )*       # ...zero or more times.
 [^"]*    # Match any number of non-quote characters
 $        # Match the end of the string
)         # End of lookahead assertion
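The commented form above is not just documentation: .NET can compile it directly with RegexOptions.IgnorePatternWhitespace, which ignores unescaped whitespace and treats # to end of line as a comment, while whitespace inside a character class stays literal, so [ ] still matches one space. A sketch under those assumptions (the class name VerboseQuoteSplit is hypothetical); in a C# verbatim string each literal quote is written as "".

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical class: the same pattern as the answer, kept in commented form.
public static class VerboseQuoteSplit
{
    // "" inside a verbatim string is a single literal double quote.
    private const string Pattern = @"
        (?<=        # positive lookbehind:
         ^          #   start of the string
         [^""]*     #   any number of non-quote characters
         (?:        #   then the following group...
          ""[^""]*  #     a quote, then non-quote characters
          ""[^""]*  #     the same
         )*         #   ...zero or more times (0, 2, 4, ... quotes)
        )           # end of lookbehind
        [ ]         # the space to split on
        (?=         # positive lookahead:
         (?:        #   the following group...
          [^""]*""  #     non-quote characters, then a quote
          [^""]*""  #     the same
         )*         #   ...zero or more times
         [^""]*     #   any number of non-quote characters
         $          #   end of the string
        )           # end of lookahead";

    // IgnorePatternWhitespace strips the layout and comments before matching.
    public static string[] Split(string input) =>
        Regex.Split(input, Pattern, RegexOptions.IgnorePatternWhitespace);

    public static void Main()
    {
        string input = "research library \"not available\" author:\"Bernard Shaw\"";
        foreach (string part in Split(input))
        {
            Console.WriteLine(part);
        }
    }
}
```

Keeping the pattern in this form costs nothing at match time and makes later changes (for example, splitting on tabs too) easier to review than the single-line, backslash-escaped version.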

Answer 1 (score: 3)

Here you go:

C#:

Regex.Matches(subject, @"([^\s]*""[^""]+""[^\s]*)|\w+")

Regex:

([^\s]*"[^"]+"[^\s]*)|\w+
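Unlike the first answer, this one extracts tokens with Regex.Matches rather than splitting. A minimal sketch that collects the match values into an array (the class name QuoteAwareMatches and its Tokens helper are hypothetical):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the answer's Regex.Matches call.
public static class QuoteAwareMatches
{
    // First alternative: a run of non-whitespace containing one quoted
    // section (which may itself contain spaces). Second: a run of word characters.
    private const string Pattern = @"([^\s]*""[^""]+""[^\s]*)|\w+";

    public static string[] Tokens(string subject) =>
        Regex.Matches(subject, Pattern)
             .Cast<Match>()          // MatchCollection -> IEnumerable<Match>
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        string subject = "research library \"not available\" author:\"Bernard Shaw\"";
        foreach (string token in Tokens(subject))
        {
            Console.WriteLine(token);
        }
    }
}
```

Note the \w+ fallback: an unquoted token containing non-word characters is broken up, so author:name yields author and name, and the colon is dropped. That is fine for the question's input but differs from the split-based answer.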