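I am trying to use a regular expression to extract quote-wrapped strings from a (C#) string that is a comma-separated list of such strings. I need to extract every properly quoted substring and ignore the ones that are missing a quote.
For example, given this string:
"animal,dog,cat","ecoli, verification,"streptococcus"
I need to extract "animal,dog,cat" and "streptococcus".
I have tried various regex solutions from this forum, but they all seem either to find only the first substring, or to incorrectly match "ecoli, verification," and skip "streptococcus".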
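To make the second failure mode concrete, here is a minimal sketch. The post does not say which patterns were tried, so the naive pattern `"[^"]*"` and the names `NaiveQuoteMatch` and `Find` are assumptions chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NaiveQuoteMatch
{
    // Assumed naive pattern: a quote, any run of non-quote characters, a quote.
    public static List<string> Find(string input)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(input, "\"[^\"]*\""))
        {
            found.Add(m.Value); // the match text, quotes included
        }
        return found;
    }

    public static void Main()
    {
        string input = "\"animal,dog,cat\",\"ecoli, verification,\"streptococcus\"";
        // Prints "animal,dog,cat" and then "ecoli, verification," (quotes included).
        // streptococcus is never matched: its opening quote has already been
        // consumed as the closing quote of the second match.
        foreach (string s in Find(input))
        {
            Console.WriteLine(s);
        }
    }
}
```

The quote that should open "streptococcus" is taken as the closing quote of the broken item, so a working pattern has to reject a quoted run that ends in a comma.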
Can this be done?
TIA
Answer 0 (score: 2)
Try this:
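using System;
using System.Text.RegularExpressions;

string input = "\"animal,dog,cat\",\"ecoli, verification,\"streptococcus\"";
// A quote, a lazy run of non-quote characters, one character that is not a comma, then a closing quote.
string pattern = "\"([^\"]+?[^,])\"";
var matches = Regex.Matches(input, pattern);
foreach (Match m in matches)
    Console.WriteLine(m.Groups[1].Value); // group 1 = the text between the quotes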
P.S. But I agree with the commenters: fix the source.
Answer 1 (score: 1)
I suggest:
"(?>[^",]*(?>,[^",]+)*)"
Explanation:
"                # Match a starting quote
(?>              # Open an atomic (non-capturing) group to avoid catastrophic backtracking:
  [^",]*         # - any number of characters except commas or quotes
  (?>            # - optionally followed by another (atomic) group:
    ,            # - which starts with a comma
    [^",]+       # - and contains at least one character besides commas or quotes.
  )*             # - (as said above, that group is optional but may occur many times)
)                # End of the outer atomic group
"                # Match a closing quote