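Regex for CSV with quotes and backslashes

Time: 2013-07-19 19:25:24

Tags: java regex csv

I'm trying to find a regular expression that works on a CSV file (with double quotes around the values), where a value can contain any character. The expression I'm using right now is (in Java, so the backslashes are escaped):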

",(?=(([^\"\\\\]|\\\\.)*\"([^\"\\\\]|\\\\.)*\")*([^\"\\\\]|\\\\.)*$)"

The entries that give me trouble are ones such as "random_value"" or "random_value\".

Additional information:

"000000000000000","","","","email@yahoo.com","random_value""
"000000000000000","","","","email2@yahoo.com","random_value\"
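For illustration, here is a minimal sketch of how the lookahead expression above behaves when it is passed to String.split. That it is used with split is an assumption (the question does not say), and the class name LookaheadSplit and the short sample strings are made up for the example. The lookahead only accepts a comma when the rest of the line contains an even number of unescaped quotes, so a field whose closing quote is missing or doubled stops every comma before it from being treated as a separator.

```java
import java.util.Arrays;

class LookaheadSplit {
    // The expression from the question, unchanged: a comma followed by an
    // even number of unescaped double quotes up to the end of the input.
    static final String SEPARATOR =
        ",(?=(([^\"\\\\]|\\\\.)*\"([^\"\\\\]|\\\\.)*\")*([^\"\\\\]|\\\\.)*$)";

    // Splits one line; the surrounding quotes stay on each field.
    static String[] split(String line) {
        return line.split(SEPARATOR);
    }

    public static void main(String[] args) {
        // Well-formed line: "a","b,c","d\"e"
        String good = "\"a\",\"b,c\",\"d\\\"e\"";
        System.out.println(Arrays.toString(split(good)));

        // Unbalanced lines: "0","x"" and "0","x\"
        String doubled = "\"0\",\"x\"\"";
        String escaped = "\"0\",\"x\\\"";
        System.out.println(split(doubled).length);
        System.out.println(split(escaped).length);
    }
}
```

With balanced quotes the line splits into three fields and the comma inside "b,c" is left alone. Both unbalanced lines come back as a single unsplit string, which is consistent with the trouble described above.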

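2 Answers:

Answer 0: (score: 0)

Description

Assuming we clean up your source text so that it includes proper closing quotes, this expression will:

  • Match all quoted, comma-delimited text
  • Capture the leading comma, the opening and closing quotes, and the enclosed text into group 0
  • Trim the leading and closing quotes and place the value into capture group 1
  • Allow values to contain escaped quote sequences such as \" or ""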

(?:^|,)"((?<=")(?:[^"]*|\\"|"")*?)"(?=[,\r\n]|\Z)

Example

Live demo: http://www.rubular.com/r/NSSxdHWcDM

Sample text

"1000000000000000","","","","email1@yahoo.com","1random_value"""
"2000000000000000","","","","email2@yahoo.com","2random_value\""

Capture groups

[0][0] = "1000000000000000"
[0][1] = 1000000000000000

[1][0] = ,""
[1][1] = 

[2][0] = ,""
[2][1] = 

[3][0] = ,""
[3][1] = 

[4][0] = ,"email1@yahoo.com"
[4][1] = email1@yahoo.com

[5][0] = ,"1random_value"""
[5][1] = 1random_value""

[6][0] = "2000000000000000"
[6][1] = 2000000000000000

[7][0] = ,""
[7][1] = 

[8][0] = ,""
[8][1] = 

[9][0] = ,""
[9][1] = 

[10][0] = ,"email2@yahoo.com"
[10][1] = email2@yahoo.com

[11][0] = ,"2random_value\""
[11][1] = 2random_value\"
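The linked demo runs on Rubular, which uses Ruby's regex engine. As a rough sketch of how the same expression might be applied in Java (the class name QuotedCsvRegex and the helper fields are made up for this example), it can be compiled with Pattern.MULTILINE so that ^ also matches at the start of the second line, with group 1 collected from each match:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedCsvRegex {
    // The expression from this answer, escaped for a Java string literal.
    // MULTILINE is added here (an assumption for Java) so that ^ matches
    // at the start of every line, as it does in the Ruby demo.
    private static final Pattern FIELD = Pattern.compile(
        "(?:^|,)\"((?<=\")(?:[^\"]*|\\\\\"|\"\")*?)\"(?=[,\\r\\n]|\\Z)",
        Pattern.MULTILINE);

    // Returns capture group 1 (the value without its outer quotes)
    // for every quoted field found in the text.
    static List<String> fields(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(text);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // The two sample lines from this answer.
        String text =
            "\"1000000000000000\",\"\",\"\",\"\",\"email1@yahoo.com\",\"1random_value\"\"\"\n"
          + "\"2000000000000000\",\"\",\"\",\"\",\"email2@yahoo.com\",\"2random_value\\\"\"";
        for (String field : fields(text)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

Escape sequences are returned as they appear in the input, so a value such as 1random_value"" still needs its doubled quotes collapsed afterwards if the unescaped text is wanted.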

Answer 1: (score: 0)

Using JavaCSV

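import com.csvreader.CsvReader; // from the JavaCSV library

// readRecord() and get() throw IOException, so the enclosing method must declare or handle it.
String str = "\"000000000000000\",\"\",\"\",\"\",\"email2@yahoo.com\",\"random_value\\\"\"";
CsvReader reader = CsvReader.parse(str);
reader.readRecord(); // advance to the first (and only) record
for (int i = 0; i < reader.getColumnCount(); i++)
    System.out.printf("Scol[%d]: [%s]%n", i, reader.get(i));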

Output:

Scol[0]: [000000000000000]
Scol[1]: []
Scol[2]: []
Scol[3]: []
Scol[4]: [email2@yahoo.com]
Scol[5]: [random_value\"]