Regex - how to match multiple properly quoted substrings

Time: 2015-09-23 15:52:47

Tags: c# regex

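I am trying to use a regular expression to extract quote-wrapped strings from a (C#) string that is a comma-separated list of such strings. I need to extract every properly quoted substring and ignore the ones that are missing a quote.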

For example, given this string

"animal,dog,cat","ecoli, verification,"streptococcus"

I need to extract "animal,dog,cat" and "streptococcus".

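I have tried various regex solutions from this forum, but they all seem to either find only the first substring, or incorrectly match "ecoli, verification," and ignore "streptococcus".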
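The second failure mode can be reproduced with the simplest quote-to-quote pattern, `"[^"]*"`. This is only a sketch: the question does not say which patterns were actually tried, so the pattern and the names `NaiveQuoteDemo` and `NaiveMatches` are assumptions made for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class NaiveQuoteDemo
{
    // Hypothetical helper: returns every match of the naive pattern "[^"]*",
    // i.e. a quote, any run of non-quote characters, and the next quote.
    public static string[] NaiveMatches(string input) =>
        Regex.Matches(input, "\"[^\"]*\"")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        string input = "\"animal,dog,cat\",\"ecoli, verification,\"streptococcus\"";
        foreach (string s in NaiveMatches(input))
            Console.WriteLine(s);
        // Prints:
        // "animal,dog,cat"
        // "ecoli, verification,"
    }
}
```

The stray quote after `verification,` is consumed as a closing quote, which leaves `streptococcus"` with no opening quote to pair with.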

Can this be solved?

TIA

2 Answers:

Answer 0: (score: 2)

Try this:

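using System;
using System.Text.RegularExpressions;

string input = "\"animal,dog,cat\",\"ecoli, verification,\"streptococcus\"";
// Group 1: one or more non-quote characters, then one character that is not
// a comma, so the closing quote is never directly preceded by a comma.
string pattern = "\"([^\"]+?[^,])\"";

var matches = Regex.Matches(input, pattern);

foreach (Match m in matches)
    Console.WriteLine(m.Groups[1].Value); // contents without the quotes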

P.S. But I agree with the commenters: fix the source.

Answer 1: (score: 1)

I suggest:

"(?>[^",]*(?>,[^",]+)*)"

Explanation:

"        # Match a starting quote
(?>      # Start an atomic (non-capturing) group to avoid catastrophic backtracking:
 [^",]*  # - any number of characters except commas or quotes
 (?>     # - optionally followed by another (atomic) group:
  ,      #   - which starts with a comma
  [^",]+ #   - and contains at least one character besides comma or quotes.
 )*      # - (as said above, that group is optional but may occur many times)
)        # End of the outer atomic group
"        # Match a closing quote

Test it live on regex101.com
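For completeness, here is a minimal sketch of how the pattern above might be used from C#. The pattern is unchanged and is written as a verbatim string, so each literal quote is doubled. The names `QuotedItems` and `Extract` are hypothetical, and stripping the surrounding quotes with `Trim` is my own choice, since the pattern has no capturing group.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class QuotedItems
{
    // The pattern from this answer as a C# verbatim string
    // (each literal quote is written as "").
    private static readonly Regex Quoted =
        new Regex(@"""(?>[^"",]*(?>,[^"",]+)*)""");

    // Hypothetical helper: returns each properly quoted item
    // with its surrounding quotes removed.
    public static string[] Extract(string input) =>
        Quoted.Matches(input)
              .Cast<Match>()
              .Select(m => m.Value.Trim('"'))
              .ToArray();

    public static void Main()
    {
        string input = "\"animal,dog,cat\",\"ecoli, verification,\"streptococcus\"";
        foreach (string item in Extract(input))
            Console.WriteLine(item);
        // Prints:
        // animal,dog,cat
        // streptococcus
    }
}
```

The attempt that starts at `"ecoli` fails because the comma after `verification` is followed directly by a quote, so the inner group cannot consume it and the closing quote is never reached. The scan then resumes and matches `"streptococcus"`.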