Regex for quoted CSV values that contain special characters

Posted: 2015-03-23 09:10:01

Tags: asp.net regex parsing csv c#-4.0

I've tried, and this is the best I could come up with. I need a fresh pair of eyes, or some help getting this to work.

Expression:

\"[a-zA-Z\s0-9\.\']*\"

Input string:

BOL,"AWBH0876356","HMM","H0010","BEANR","BEANR","AEJEA","BHBAH","","","T","S","","","F","N","","FCL/FCL","BE","","","","","","","","SUNNYLAND DISTRIBUTION NV","EVERDONGENLAAN 12 2300 TURNHOUT","","","INTERNATIONAL AGENCIES CO LTD","BUILDING 406, ROAD 4308, BLOCK 343, MANAMA BAHRAIN","","INTERNATIONAL AGENCIES CO LTD","BUILDING 406, ROAD 4308, BLOCK 343, MANAMA BAHRAIN","","","","","","","N/A","770000",""SHIPPER'S LOAD & COUNT, SAID TO BE:" 1X20'DC CONTAINER S.T.C 1650 CARTON OF JUICES FREIGHT PREPAID","1650","CARTONS","CTN","","","1","1","2.2","21615.0","23815.0","0","0","0","0","","",""

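I need to skip the first word (BOL) and the commas, and that part works. I'm still stuck on matching values that contain special characters (' and ").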

For example, this value is not matched correctly:

""SHIPPER'S LOAD & COUNT, SAID TO BE:" 1X20'DC CONTAINER S.T.C 1650 CARTON OF JUICES FREIGHT PREPAID"

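A minimal Python sketch reproduces the failure. It assumes Python's re module treats this pattern the same way .NET does, which holds for plain character classes and quantifiers like these. The expression is copied verbatim into a raw string, where \" and \' are read as escaped literal quotes. The names problem and address are just labels for two values taken from the input line above.

```python
import re

# The question's expression, copied verbatim; in a raw string,
# \" and \' are simply escaped literal quotes for the regex engine.
ORIGINAL = re.compile(r'\"[a-zA-Z\s0-9\.\']*\"')

problem = ('""SHIPPER\'S LOAD & COUNT, SAID TO BE:" 1X20\'DC CONTAINER '
           'S.T.C 1650 CARTON OF JUICES FREIGHT PREPAID"')
address = '"BUILDING 406, ROAD 4308, BLOCK 343, MANAMA BAHRAIN"'

# The class has no room for &, :, commas or quotes, so the shipper value
# is split into fragments instead of being matched as one value.
print(ORIGINAL.findall(problem))
# A comma inside a value stops the match too, so nothing is found here.
print(ORIGINAL.findall(address))
```

The first call yields only an empty "" and the tail of the shipper text, and the second yields nothing. Widening the character class alone does not fix this, because a quote is allowed inside the value but also ends it.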
2 answers:

Answer 0 (score: 2)

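The current problem with your regex (and with the string you are parsing) is that the regex does not accept quotes inside a value, and your string has them. You could allow quotes inside a value but require that the closing quote be followed only by a comma or the end of the string. A positive lookahead expresses that: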

".*?"(?=,|$)

The lazy ".*?" matches the value, and (?=,|$) ensures that it is followed by a comma or by the end of the string (represented by $).

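Here is a sketch of the lookahead pattern applied with Python's re module instead of .NET's Regex. That swap is an assumption made to keep the example short and runnable, and the pattern itself is unchanged. The sample is a shortened excerpt of the question's input line. quoted_fields is a hypothetical helper that strips the outer quotes from each match.

```python
import re

# Answer 0's pattern: a lazy quoted run whose closing quote must be
# followed by a comma or the end of the line (positive lookahead).
FIELD = re.compile(r'".*?"(?=,|$)')


def quoted_fields(line: str) -> list[str]:
    """Return the quoted values of one CSV line, outer quotes removed."""
    return [m[1:-1] for m in FIELD.findall(line)]


# Shortened excerpt of the question's input line.
sample = ('BOL,"AWBH0876356","HMM",'
          '"BUILDING 406, ROAD 4308, BLOCK 343, MANAMA BAHRAIN","","770000",'
          '""SHIPPER\'S LOAD & COUNT, SAID TO BE:" 1X20\'DC CONTAINER S.T.C '
          '1650 CARTON OF JUICES FREIGHT PREPAID","1650",""')

for value in quoted_fields(sample):
    print(repr(value))
```

Because BOL is unquoted, it is never matched. The address keeps its internal commas, and the shipper value keeps its embedded quotes, since only the outer pair is stripped. Empty "" fields come back as empty strings.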
Note that the regex above will not work correctly if a value contains a quote character followed by a comma.

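When that happens, what I usually do is count the matches. If the count is higher than I expect, I set the raw line aside so I can review those lines one by one. That takes some manual intervention, but it's better than ending up with lots of errors!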
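The match-count check can be sketched in Python as follows. split_or_quarantine is a hypothetical name, and the expected field count is an assumption the caller must supply. The bad line is made up for illustration: one of its values contains a quote followed by a comma, the case the caveat above describes.

```python
import re
from collections.abc import Iterable, Iterator

FIELD = re.compile(r'".*?"(?=,|$)')  # Answer 0's lookahead pattern


def split_or_quarantine(lines: Iterable[str], expected: int,
                        quarantine: list[str]) -> Iterator[list[str]]:
    """Yield the quoted values of each line whose field count equals
    `expected`; append every other raw line to `quarantine` for review."""
    for line in lines:
        fields = [m[1:-1] for m in FIELD.findall(line)]
        if len(fields) == expected:
            yield fields
        else:
            quarantine.append(line)


good = 'BOL,"A","B","C"'
bad = 'BOL,"A","note "a","b" ok","C"'  # one value contains "," inside it
held: list[str] = []
rows = list(split_or_quarantine([good, bad], expected=3, quarantine=held))
print(rows)  # only the good line is split
print(held)  # the bad line, kept whole for manual review
```

This version compares with != rather than only checking for "more than expected", so lines with too few matches are set aside as well.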

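If all the problems come from a single column, you can change your script so that it merges the values from column i through column j (where i is the column where the first problem occurs and j is the column of the next one) until the line has the right number of values.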
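Here is one possible reading of the merge step. merge_columns is a hypothetical helper, not code from the answer. When a line has too many fields and the broken column i is known, it glues fields i..j back together, choosing j so the row ends up with the expected count. It assumes the fragments came from one value that the pattern split at a "," sequence, so joining them with "," restores the original text.

```python
import re

FIELD = re.compile(r'".*?"(?=,|$)')  # Answer 0's lookahead pattern


def merge_columns(fields: list[str], expected: int, i: int) -> list[str]:
    """Hypothetical repair: merge fields[i..j] into one value, with j chosen
    so that the row ends up with `expected` fields."""
    extra = len(fields) - expected
    if extra <= 0:
        return fields  # nothing to merge
    j = i + extra  # index of the last fragment that belongs to column i
    # Splitting removed the '","' between the fragments; put it back.
    merged = '","'.join(fields[i:j + 1])
    return fields[:i] + [merged] + fields[j + 1:]


line = 'BOL,"A","note "a","b" ok","C"'
fields = [m[1:-1] for m in FIELD.findall(line)]
print(fields)
print(merge_columns(fields, expected=3, i=1))
```

This only works when the broken column is known in advance, which matches the answer's premise that all problems come from a single column.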

Answer 1 (score: 0)

(?:^|(?<=,))"(?!,")(.+?)"(?=,"|$)

Try this, and see the demo:
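Here is a sketch of the same pattern in Python's re module. Python is assumed here for a runnable example, and re accepts the lookbehind (?<=,) because it is fixed-width. The first sample is shortened from the question's input. The second is a made-up row added to check how empty fields behave.

```python
import re

# Answer 1's pattern; group 1 captures the value between the quotes:
#   (?:^|(?<=,))  opening quote at line start or right after a comma
#   "(?!,")       an opening quote not itself followed by ,"
#   (.+?)         the value, lazily (at least one character)
#   "(?=,"|$)     a closing quote followed by ," or the end of the line
FIELD = re.compile(r'(?:^|(?<=,))"(?!,")(.+?)"(?=,"|$)')

sample = ('BOL,"AWBH0876356","770000",'
          '""SHIPPER\'S LOAD & COUNT, SAID TO BE:" 1X20\'DC CONTAINER S.T.C '
          '1650 CARTON OF JUICES FREIGHT PREPAID","1650"')

# With one capturing group, findall returns only the captured values.
for value in FIELD.findall(sample):
    print(repr(value))

# (.+?) needs at least one character, so an empty "" field is not
# returned on its own; here it is absorbed into the following value.
print(FIELD.findall('BOL,"A","","B",""'))
```

In this sketch the shipper value comes back with its embedded quotes intact. Empty fields are a weak spot, though. The question's input has many empty "" fields, so the returned values will not line up one-to-one with the columns unless the pattern is adjusted.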

https://regex101.com/r/tJ2mW5/4