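I am using OpenCSV. I have a CSVReader that is trying to parse a CSV file. The file uses " as the quote character, , as the separator, and " as the escape character.
Note that the CSV contains cells such as: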
"ballet 24"" classes"
"\"
which actually represent these values:
ballet 24" classes
\
Example:
"9/6/2014","3170168","123652278","Computer","2329043290","Bing and Yahoo! search","22951990789","voice lesson","Broad","0.00","0","1","3.00","0.00","0.00","0.00","7","0","",""
"9/6/2014","3170168","123652278","Smartphone","2329043291","Bing and Yahoo! search","22951990795","ballet class","Broad","0.00","0","1","1.00","0.00","0.00","0.00","0","0","",""
"9/6/2014","3170168","123652278","Smartphone","2329043291","Bing and Yahoo! search","22951990797","ballet 24"" classes","Broad","0.00","0","1","1.00","0.00","0.00","0.00","0","0","",""
"9/6/2014","3170168","123652278","Smartphone","2329043291","Bing and Yahoo! search","22951990797","ballet classes","Broad","0.00","0","1","1.00","0.00","0.00","0.00","0","0","",""
"9/6/2014","3170168","123652278","Computer","2329043291","Bing and Yahoo! search","22951990817","\","Broad","0.00","0","1","1.00","0.00","0.00","0.00","5","0","",""
"9/6/2014","3170168","123652278","Computer","2329043293","Bing and Yahoo! search","22951990850","zumba classes","Broad","0.00","0","1","7.00","0.00","0.00","0.00","5","0","",""
"9/6/2014","3170168","123652278","Smartphone","2329043293","Bing and Yahoo! search","22951990850","zumba classes","Broad","0.00","0","4","1.00","0.00","0.00","0.00","5","0","",""
"9/6/2014","3170168","123652278","Computer","2329043293","Bing and Yahoo! search","22951990874","zumba lessons","Broad","0.00","0","1","2.00","0.00","0.00","0.00","0","0","",""
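My problem is that I cannot pass " to the CSVReader constructor as the escape character (that is, make it the same as the quote character). If I do, CSVReader goes crazy and reads the entire CSV line as a single cell.
Has anyone else run into this bug, and how did you solve it?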
Answer 0: (score: 3)
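It works if you use CSVReader's default settings.
See this open bug of theirs: sourceforge.net/p/opencsv/bugs/83:
Actually, it works properly, just not the way you think. Its defaults are a comma for the separator, a double quote for the quote character, and a backslash for the escape character. However, it understands two consecutive quote characters as an escaped quote character. So if you just go with the defaults, it will work properly.
By default it handles a double quote escaped by a double quote, but your "true" escape character must still be something else.
The following works: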
CSVReader reader = new CSVReader(new FileReader(App.class.getClassLoader().getResource("csv.csv").getFile()), ',','"','-');
At first I had '\' as the escape character, but then your field "\" would have to be modified to escape the escape character.
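To make the point about the escape character concrete, here is a minimal sketch of a line splitter with a configurable escape character. It is a simplified model written for illustration only, not OpenCSV's actual implementation; the class name `EscapeCharSketch` and the method `splitLine` are hypothetical. It assumes `"` as the quote character and `,` as the separator, treats a doubled quote inside a quoted cell as a literal quote, and lets the escape character escape a following quote or escape character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of quote/escape handling (not OpenCSV code).
final class EscapeCharSketch {

    static List<String> splitLine(String line, char escape) {
        List<String> cells = new ArrayList<>();
        StringBuilder cell = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            boolean hasNext = i + 1 < line.length();
            char next = hasNext ? line.charAt(i + 1) : '\0';
            if (inQuotes && c == escape && hasNext && (next == '"' || next == escape)) {
                // Escape character followed by a quote or another escape: keep the next char literally.
                cell.append(next);
                i++;
            } else if (c == '"') {
                if (inQuotes && hasNext && next == '"') {
                    // Doubled quote inside a quoted cell is one literal quote.
                    cell.append('"');
                    i++;
                } else {
                    inQuotes = !inQuotes;
                }
            } else if (c == ',' && !inQuotes) {
                // Separator outside quotes ends the current cell.
                cells.add(cell.toString());
                cell.setLength(0);
            } else {
                cell.append(c);
            }
        }
        cells.add(cell.toString());
        return cells;
    }

    public static void main(String[] args) {
        // The raw line is: "x","\","Broad"
        String line = "\"x\",\"\\\",\"Broad\"";
        // Backslash as escape: \" is read as an escaped quote, so the cell does not close.
        System.out.println(splitLine(line, '\\').size()); // prints 2
        // An escape character that never occurs in the data leaves the backslash alone.
        System.out.println(splitLine(line, '-').size());  // prints 3
    }
}
```

In this model, with backslash as the escape character the `"\"` cell swallows the following separator, which is why the field would need to be written as `"\\"`; with an unused character such as `-`, the same line splits into three cells.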
Answer 1: (score: 1)
CSVReader is not fully RFC4180 compliant. Use its newer CSV parser (RFC4180Parser):
RFC4180Parser rfc4180Parser = new RFC4180ParserBuilder().build();
CSVReaderBuilder csvReaderBuilder = new CSVReaderBuilder(
new FileReader("input.csv"));
CSVReader reader = csvReaderBuilder
.withCSVParser(rfc4180Parser)
.build();
To parse a single CSV-formatted line held in a string:
String test = "ballet 24\"\" classes";
String[] columns = new RFC4180Parser().parseLine(test);
To read with the reader (the alternative is reader.readNext()):
for (String[] line : reader.readAll()) {
for (String s : line) {
System.out.println(s);
}
}
For more details, see http://opencsv.sourceforge.net/#rfc4180parser.
Code taken from GeekPrompt.
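For readers who want to see the RFC 4180 quoting rule in isolation, the following is a minimal, stdlib-only sketch. It is not the RFC4180Parser implementation; the class name `Rfc4180QuoteSketch` and the method `splitLine` are hypothetical. It assumes a single-line record, `"` as the quote character, and `,` as the separator. The only special sequence is a doubled quote inside a quoted cell, and a backslash is an ordinary character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of RFC 4180-style quoting (not library code).
final class Rfc4180QuoteSketch {

    static List<String> splitLine(String line) {
        List<String> cells = new ArrayList<>();
        StringBuilder cell = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        // "" inside a quoted cell stands for one literal quote.
                        cell.append('"');
                        i++;
                    } else {
                        // A lone quote closes the quoted cell.
                        inQuotes = false;
                    }
                } else {
                    // Everything else, including a backslash, is literal.
                    cell.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                cells.add(cell.toString());
                cell.setLength(0);
            } else {
                cell.append(c);
            }
        }
        cells.add(cell.toString());
        return cells;
    }

    public static void main(String[] args) {
        // The raw line is: "ballet 24"" classes","\","Broad"
        String line = "\"ballet 24\"\" classes\",\"\\\",\"Broad\"";
        System.out.println(splitLine(line)); // prints [ballet 24" classes, \, Broad]
    }
}
```

Under these assumptions, both problem cells from the question come out as the intended values, because no separate escape character is involved.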
Answer 2: (score: 0)
This cannot be done with CSVReader. Using PySpark instead:
from pyspark.sql.session import SparkSession
spark = SparkSession(sc)
rdd = spark.read.csv("csv.csv", multiLine=True, header="False", encoding='utf-8', escape= "\"")