public static List<List<String>> parseCSV(String contents,Boolean skipHeaders) {
List<List<String>> allFields = new List<List<String>>();
// replace instances where a double quote begins a field containing a comma
// in this case you get a double quote followed by a doubled double quote
// do this for beginning and end of a field
contents = contents.replaceAll(',"""',',"DBLQT').replaceall('""",','DBLQT",');
// now replace all remaining double quotes - we do this so that we can reconstruct
// fields with commas inside assuming they begin and end with a double quote
contents = contents.replaceAll('""','DBLQT');
// we are not attempting to handle fields with a newline inside of them
// so, split on newline to get the spreadsheet rows
List<String> lines = new List<String>();
try {
lines = contents.split('\n');
} catch (System.ListException e) {
System.debug('Limits exceeded?' + e.getMessage());
}
Integer num = 0;
for(String line : lines) {
// check for blank CSV lines (only commas)
if (line.replaceAll(',','').trim().length() == 0) break;
List<String> fields = line.split(',');
List<String> cleanFields = new List<String>();
String compositeField;
Boolean makeCompositeField = false;
for(String field : fields) {
if (field.startsWith('"') && field.endsWith('"')) {
cleanFields.add(field.replaceAll('DBLQT','"'));
} else if (field.startsWith('"')) {
makeCompositeField = true;
compositeField = field;
} else if (field.endsWith('"')) {
compositeField += ',' + field;
cleanFields.add(compositeField.replaceAll('DBLQT','"'));
makeCompositeField = false;
} else if (makeCompositeField) {
compositeField += ',' + field;
} else {
cleanFields.add(field.replaceAll('DBLQT','"'));
}
}
allFields.add(cleanFields);
}
if(skipHeaders)allFields.remove(0);
return allFields;
}
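I am using this code to parse a CSV file, but I found that it fails when every field in the CSV is enclosed in double quotes.
For example, I have a record like this: "a","b","c","d,e,f","g"
After parsing, I want to get these fields: a b c d,e,f g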
Answer 0 (score: 0)
From what I can see, the first thing you do is split each line of the CSV file on commas, using this line:
List<String> fields = line.split(',');
When you do this to your own example ("a","b","c","d,e,f","g"), the list of strings you get is:
fields = ("a" | "b" | "c" | "d | e | f" | "g"), where the bars separate the list elements
The problem is that once you have split on commas, it becomes much harder to tell the commas that belong to a field (because they appear inside quotes) from the commas that separate your fields.
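To make the problem concrete, here is a small Java sketch of that naive comma split. The class name `CommaSplitDemo` and the helper `splitOnCommas` are illustrative names, not part of the question's code.

```java
import java.util.Arrays;
import java.util.List;

public class CommaSplitDemo {
    // Hypothetical helper: splits on every comma and ignores quotes entirely.
    static List<String> splitOnCommas(String line) {
        return Arrays.asList(line.split(","));
    }

    public static void main(String[] args) {
        String line = "\"a\",\"b\",\"c\",\"d,e,f\",\"g\"";
        List<String> fields = splitOnCommas(line);
        // The quoted field "d,e,f" is cut at its inner commas,
        // so the line yields more pieces than it has fields.
        System.out.println(fields.size() + " pieces: " + fields);
    }
}
```

The pieces also keep their surrounding quote characters, which is why the parsed values do not come out as plain a, b, c.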
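I suggest splitting the line on quotes instead, which gives you something like this:
fields = (a | , | b | , | c | , | d,e,f | , | g)
Then filter out any elements of the list that contain only commas and/or whitespace, which finally gives you:
fields = (a | b | c | d,e,f | g)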
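The split-then-filter idea described above could be sketched in Java roughly as follows. `QuoteSplitDemo` and `splitOnQuotes` are hypothetical names, and the sketch assumes every field is quoted and contains no escaped quotes.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteSplitDemo {
    // Hypothetical helper: split on double quotes, then drop the separator pieces.
    static List<String> splitOnQuotes(String line) {
        List<String> fields = new ArrayList<>();
        for (String piece : line.split("\"")) {
            // A piece that is empty or holds only commas and/or whitespace
            // sits between two quoted fields, so it is not data.
            if (piece.replace(",", "").trim().isEmpty()) continue;
            fields.add(piece);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitOnQuotes("\"a\",\"b\",\"c\",\"d,e,f\",\"g\""));
    }
}
```

One caveat with this filter: a field that is legitimately empty, or that consists only of commas, is discarded as well, so it only suits data where every field has visible content.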
(Edit)
Are you using Java? Either way, here is some Java code that does what you are trying to do:
import java.lang.*;
import java.util.*;
public class HelloWorld
{
public static ArrayList<ArrayList<String>> parseCSV(String contents,Boolean skipHeaders) {
ArrayList<ArrayList<String>> allFields = new ArrayList<ArrayList<String>>();
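// separating the file into lines; the result is copied into an ArrayList
// because the list returned by Arrays.asList is fixed-size and does not support remove()
List<String> lines = new ArrayList<String>(Arrays.asList(contents.split("\n")));
// ignoring header, if needed
if (skipHeaders && !lines.isEmpty()) lines.remove(0);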
// for each line
for(String line : lines) {
List<String> fields = Arrays.asList(line.split("\""));
ArrayList<String> cleanFields = new ArrayList<String>();
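// Splitting on quotes yields alternating pieces: those at even positions
// (0, 2, 4, ...) lie outside the quotes (an empty string or a separating comma),
// and those at odd positions are the field contents.
boolean outsideQuotes = false;
for(String field : fields) {
// flip the flag on every piece and skip the ones outside quotes
outsideQuotes = !outsideQuotes;
if (outsideQuotes) continue;
cleanFields.add(field);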
}
allFields.add(cleanFields);
}
return allFields;
}
public static void main(String[] args)
{
// example of input file:
// Line 1: "a","b","c","d,e,f","g"
// Line 2: "a1","b1","c1","d1,e1,f1","g1"
ArrayList<ArrayList<String>> strings = HelloWorld.parseCSV("\"a\",\"b\",\"c\",\"d,e,f\",\"g\"\n\"a1\",\"b1\",\"c1\",\"d1,e1,f1\",\"g1\"",false);
System.out.println("Result:");
for (ArrayList<String> list : strings) {
System.out.println(" New List:");
for (String str : list) {
System.out.println(" - " + str);
}
}
}
}
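The split-on-quotes code above assumes that every field is quoted and that no field contains an escaped quote. If the file mixes quoted and unquoted fields, or uses doubled quotes ("") for a literal quote as the question's code expects, one alternative is to scan each line character by character. The following is only a sketch under those assumptions. `CsvLineScanner` and `parseLine` are hypothetical names, and like the question's code it does not handle newlines inside a field.

```java
import java.util.ArrayList;
import java.util.List;

public class CsvLineScanner {
    // Hypothetical helper: parses one CSV line while tracking whether
    // the current position is inside a quoted field.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote means a literal quote
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // closing quote of the field
                    }
                } else {
                    current.append(c);       // commas in here belong to the field
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote of a field
            } else if (c == ',') {
                fields.add(current.toString()); // separator: finish the field
                current.setLength(0);
            } else {
                current.append(c);           // character of an unquoted field
            }
        }
        fields.add(current.toString());      // the last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("\"a\",b,\"d,e,f\",\"say \"\"hi\"\"\",\"\""));
    }
}
```

Because the separator is only recognised outside quotes, empty fields are preserved rather than filtered away.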