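I have been given a very large CSV data file that I need to import into a MySQL database. Unfortunately, the CSV file has a text footer after every 50 rows of data, like this: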
0,,,,,," of 2,401",,,,
10,,,,,," of 2,401",,,,
999,,,,,," of 2,401",,,,
"1,000",,,,,," of 2,401",,,,
"2,396",,,,,," of 2,401",,,,
...etc
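As you can see, the pattern changes once the number reaches 1,000 (they start wrapping the page # in double quotes). This is beyond my understanding of RegEx. I need a regular expression that identifies all of these lines so that I can delete them.
Answer 0 (score: 0)
Try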
(\d+|"(\d+,\d+)+"),+" of (\d+|(\d+,\d+)+)",+(\n|$)
It will match all of the following cases:
0,,,,,," of 2,401",,,,
10,,,,,," of 2,401",,,,
999,,,,,," of 2,401",,,,
"1,000",,,,,," of 2,401",,,,
"2,396",,,,,," of 2,401",,,,
10,,,,,," of 2,401,000",,,,
"1,999,822",,,,,," of 2,401,000",,,,
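If you would rather strip the footers with a script before importing, here is a minimal Python sketch. It uses the same pattern, with two changes that are my own assumptions rather than part of the pattern as posted: a leading `^` so that only whole lines match, and a plain `$` in place of `(\n|$)` because each line is checked with its line ending removed. The names `strip_footers` and `clean_file`, and the UTF-8 encoding, are hypothetical choices for illustration.

```python
import re

# Same pattern as above, anchored to the whole line (the ^ and the plain $
# are additions for this sketch).
FOOTER = re.compile(r'^(\d+|"(\d+,\d+)+"),+" of (\d+|(\d+,\d+)+)",+$')


def strip_footers(lines):
    """Yield only the lines that do not look like a page footer."""
    for line in lines:
        # Compare without the trailing line ending, but yield the line unchanged.
        if not FOOTER.match(line.rstrip("\r\n")):
            yield line


def clean_file(src_path, dst_path, encoding="utf-8"):
    """Copy src_path to dst_path, skipping footer lines.

    Reads line by line, so a very large file is never held in memory at once.
    The encoding is an assumption; adjust it to match the actual file.
    """
    with open(src_path, encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding=encoding, newline="") as dst:
        dst.writelines(strip_footers(src))
```

Calling `clean_file` with the original file and a new output path writes a copy without the footer lines, and that copy can then be imported into MySQL. It is worth comparing the line counts of both files on a sample first, in case some real data row happens to have the same shape as a footer.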