我试过,这是我能想到的最好的,需要一个新的眼光或帮助完成这项工作。
表达式:
\"[a-zA-Z\s0-9\.\']*\"
输入字符串数据:
BOL,"AWBH0876356","HMM","H0010","BEANR","BEANR","AEJEA","BHBAH","","","T","S","","","F","N","","FCL/FCL","BE","","","","","","","","SUNNYLAND DISTRIBUTION NV","EVERDONGENLAAN 12 2300 TURNHOUT","","","INTERNATIONAL AGENCIES CO LTD","BUILDING 406, ROAD 4308, BLOCK 343, MANAMA BAHRAIN","","INTERNATIONAL AGENCIES CO LTD","BUILDING 406, ROAD 4308, BLOCK 343, MANAMA BAHRAIN","","","","","","","N/A","770000",""SHIPPER'S LOAD & COUNT, SAID TO BE:" 1X20'DC CONTAINER S.T.C 1650 CARTON OF JUICES FREIGHT PREPAID","1650","CARTONS","CTN","","","1","1","2.2","21615.0","23815.0","0","0","0","0","","",""
我需要忽略第一个单词(BOL
)和逗号,这是有效的,但我仍然坚持使用具有特殊字符('
,"
)的匹配。
匹配是一个问题,例如:
""SHIPPER'S LOAD & COUNT, SAID TO BE:" 1X20'DC CONTAINER S.T.C 1650 CARTON OF JUICES FREIGHT PREPAID"
答案 0 :(得分:2)
你的正则表达式(以及要解析的字符串)的当前问题是它不接受值内有引号并且字符串有。也许,您可以指定内部可以有引号,但是,结束引号应该只在逗号之前或在字符串的结尾处,并且您可以使用正面看起来:
".*?"(?=,|$)
".*?"
匹配值,(?=,|$)
确保其后面有逗号或字符串结尾(由$
表示)。
请注意,如果您的字符串的值包含引号字符后跟逗号,则上述正则表达式无法正常工作。
在这种情况下,我通常做的是计算匹配数。如果这超过了我期望的数字,我会单独放置原始行,这样我就可以逐个查看它们(这将涉及一些人工干预,但这比最终出现大量错误更好!)。
如果所有问题都来自单个列,那么您可以更改脚本以便它可以合并'从 i 到 j 的值( i 是第一个问题发生的列号, j 是下一个问题的列号),直到有适当数量的值。
答案 1 :(得分:0)