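CSVReader - wrong escape char when using "

Time: 2016-07-12 10:18:28

Tags: java opencsv

I am using OpenCSV.

I have a CSVReader that tries to parse a CSV file. The file uses " as the quote character, , as the separator, and " as the escape character.

Note that the CSV contains cells like these: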

"ballet 24"" classes"
"\"  

which actually represent these values:

ballet 24" classes
\

Example:

"9/6/2014","3170168","123652278","Computer","2329043290","Bing and Yahoo! search","22951990789","voice lesson","Broad","0.00","0","1","3.00","0.00","0.00","0.00","7","0","",""
"9/6/2014","3170168","123652278","Smartphone","2329043291","Bing and Yahoo! search","22951990795","ballet class","Broad","0.00","0","1","1.00","0.00","0.00","0.00","0","0","",""
"9/6/2014","3170168","123652278","Smartphone","2329043291","Bing and Yahoo! search","22951990797","ballet 24"" classes","Broad","0.00","0","1","1.00","0.00","0.00","0.00","0","0","",""
"9/6/2014","3170168","123652278","Smartphone","2329043291","Bing and Yahoo! search","22951990797","ballet classes","Broad","0.00","0","1","1.00","0.00","0.00","0.00","0","0","",""
"9/6/2014","3170168","123652278","Computer","2329043291","Bing and Yahoo! search","22951990817","\","Broad","0.00","0","1","1.00","0.00","0.00","0.00","5","0","",""
"9/6/2014","3170168","123652278","Computer","2329043293","Bing and Yahoo! search","22951990850","zumba classes","Broad","0.00","0","1","7.00","0.00","0.00","0.00","5","0","",""
"9/6/2014","3170168","123652278","Smartphone","2329043293","Bing and Yahoo! search","22951990850","zumba classes","Broad","0.00","0","4","1.00","0.00","0.00","0.00","5","0","",""
"9/6/2014","3170168","123652278","Computer","2329043293","Bing and Yahoo! search","22951990874","zumba lessons","Broad","0.00","0","1","2.00","0.00","0.00","0.00","0","0","",""

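My problem is that I cannot pass " as the escape character to the CSVReader constructor (that is, make it the same as the quote character). If I do, CSVReader goes crazy and reads the whole CSV line as a single CSV cell.

Has anyone else run into this bug, and how did you solve it?

3 Answers:

Answer 0: (score: 3)

If you use CSVReader's default settings, it works.

See this open bug of theirs: sourceforge.net/p/opencsv/bugs/83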

  

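Actually, it works correctly, just not the way you think. Its defaults are a comma for the separator, a quotation mark for the quote character, and a backslash for the escape character. However, it understands two consecutive quote characters as an escaped quote character. So if you just go with the defaults, it will work fine.

By default it accepts a doubled double quote as an escaped double quote, but your "true" escape character still has to be something else.

The following works: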

CSVReader reader = new CSVReader(new FileReader(App.class.getClassLoader().getResource("csv.csv").getFile()), ',','"','-');
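  • comma as the separator
  • double quote as the quote character
  • dash (or any other character) as the escape character

At first I used '\' as the escape character, but then your field "\" would have to be changed so that the escape character itself is escaped.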
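
To make the doubled-quote convention concrete, here is a minimal sketch in plain Java. It is not OpenCSV's implementation; the class DoubledQuoteSketch and its parseLine method are hypothetical names made up for illustration. It assumes a single-line record, a comma separator, and no separate escape character, so a backslash is ordinary data and only "" inside a quoted field is special:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class, not part of OpenCSV.
class DoubledQuoteSketch {

    // Splits one CSV line on commas. Inside a quoted field, two consecutive
    // quote characters produce one literal quote; a backslash is plain data.
    public static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote: keep one literal quote
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // a lone quote closes the field
                    }
                } else {
                    current.append(c);       // any other character, including '\'
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote of a field
            } else if (c == ',') {
                fields.add(current.toString()); // separator outside quotes ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field on the line
        return fields;
    }

    public static void main(String[] args) {
        // A shortened line holding the two problem cells from the question.
        String line = "\"ballet 24\"\" classes\",\"\\\",\"Broad\"";
        for (String field : parseLine(line)) {
            System.out.println(field);
        }
    }
}
```

Running main prints ballet 24" classes, then \, then Broad, one per line. With a backslash configured as the escape character, the "\" cell would not come out this way, which is why the escape character has to be something that never occurs in the data.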

Answer 1: (score: 1)

CSVReader is not fully RFC4180 compliant. Use their newer CSV parser (RFC4180Parser):

RFC4180Parser rfc4180Parser = new RFC4180ParserBuilder().build();
CSVReaderBuilder csvReaderBuilder = new CSVReaderBuilder(
    new FileReader("input.csv"));

CSVReader reader = csvReaderBuilder
    .withCSVParser(rfc4180Parser)
    .build();

To parse a single CSV-formatted line held in a String:

// The field is enclosed in quotes here, as it is in the CSV file
String test = "\"ballet 24\"\" classes\"";
String[] columns = new RFC4180Parser().parseLine(test);

To use the reader (an alternative is reader.readNext()):

for (String[] line : reader.readAll()) {
  for (String s : line) {
    System.out.println(s);
  }
}

For more details, see http://opencsv.sourceforge.net/#rfc4180parser

Code taken from GeekPrompt

Answer 2: (score: 0)

This cannot be done with CSVReader. PySpark can read the file:

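from pyspark.sql import SparkSession

# Reuses the active Spark session if there is one, otherwise creates it
spark = SparkSession.builder.getOrCreate()
# escape="\"" makes the quote character its own escape character
df = spark.read.csv("csv.csv", multiLine=True, header=False, encoding="utf-8", escape="\"")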