Question

public static List<List<String>> parseCSV(String contents,Boolean skipHeaders) {
List<List<String>> allFields = new List<List<String>>();

// replace instances where a double quote begins a field containing a comma
// in this case you get a double quote followed by a doubled double quote
// do this for beginning and end of a field
contents = contents.replaceAll(',"""',',"DBLQT').replaceAll('""",','DBLQT",');
// now replace all remaining double quotes - we do this so that we can reconstruct
// fields with commas inside assuming they begin and end with a double quote
contents = contents.replaceAll('""','DBLQT');
// we are not attempting to handle fields with a newline inside of them
// so, split on newline to get the spreadsheet rows
List<String> lines = new List<String>();
try {
    lines = contents.split('\n');
} catch (System.ListException e) {
    System.debug('Limits exceeded?' + e.getMessage());
}
Integer num = 0;
for(String line : lines) {
    // check for blank CSV lines (only commas)
    if (line.replaceAll(',','').trim().length() == 0) break;

    List<String> fields = line.split(',');  
    List<String> cleanFields = new List<String>();
    String compositeField;
    Boolean makeCompositeField = false;
    for(String field : fields) {
        if (field.startsWith('"') && field.endsWith('"')) {
            cleanFields.add(field.replaceAll('DBLQT','"'));
        } else if (field.startsWith('"')) {
            makeCompositeField = true;
            compositeField = field;
        } else if (field.endsWith('"')) {
            compositeField += ',' + field;
            cleanFields.add(compositeField.replaceAll('DBLQT','"'));
            makeCompositeField = false;
        } else if (makeCompositeField) {
            compositeField +=  ',' + field;
        } else {
            cleanFields.add(field.replaceAll('DBLQT','"'));
        }
    }

    allFields.add(cleanFields);

}


if(skipHeaders)allFields.remove(0);

return allFields;       
}

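I use this code to parse CSV files, but I found that it does not parse correctly when every field in the CSV is wrapped in double quotes.

For example, I have a record like this: "a","b","c","d,e,f","g"

After parsing, I want to get these fields: a b c d,e,f g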

Answer 1

From what I can see, the first thing you do is split each line of the CSV file on commas, using this line:

List<String> fields = line.split(',');

When you do that to your own example ("a","b","c","d,e,f","g"), the list of strings you get is:

fields = ("a" | "b" | "c" | "d | e | f" | "g"), where the bars separate the list elements

The problem is that once you split on commas first, it becomes much harder to tell the commas that belong to a field (because they appear inside quotes) apart from the commas that separate fields.
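To make the problem concrete, here is a minimal Java sketch of the comma-first split applied to the example record. The class and method names (`NaiveCommaSplit`, `splitOnCommas`) are made up for illustration and are not part of the question's code.

```java
import java.util.Arrays;
import java.util.List;

class NaiveCommaSplit {

    // Splits purely on commas and ignores quotes entirely.
    static List<String> splitOnCommas(String line) {
        return Arrays.asList(line.split(","));
    }

    public static void main(String[] args) {
        String line = "\"a\",\"b\",\"c\",\"d,e,f\",\"g\"";
        // The quoted field "d,e,f" is broken into three pieces: "d  e  f"
        for (String piece : splitOnCommas(line)) {
            System.out.println("[" + piece + "]");
        }
    }
}
```

The record has five fields, but the split yields seven pieces, and the quotes are still attached to each piece.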

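I suggest splitting the line on quotes instead, which gives you something like this:

fields = (a | , | b | , | c | , | d,e,f | , | g)

and then filtering out any list elements that contain only commas and/or whitespace, which leaves you with:

fields = (a | b | c | d,e,f | g)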
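The split-then-filter idea can be sketched in Java roughly as follows. This is only an illustration under the assumption that every field is quoted and no field contains an escaped quote; `QuoteSplitDemo` and `splitOnQuotes` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteSplitDemo {

    // Splits a line on double quotes and keeps only the pieces that carry data.
    static List<String> splitOnQuotes(String line) {
        List<String> fields = new ArrayList<>();
        for (String piece : line.split("\"")) {
            // Skip pieces made up only of commas and/or whitespace.
            // This also skips the empty piece before the first quote.
            if (piece.replace(",", "").trim().isEmpty()) {
                continue;
            }
            fields.add(piece);
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"a\",\"b\",\"c\",\"d,e,f\",\"g\"";
        for (String field : splitOnQuotes(line)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

One caveat of filtering by content: a quoted field that is empty, or that holds nothing but commas or spaces, is dropped along with the separators.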

(Edit)

Are you using Java? In any case, here is some Java code that does what you are trying to do:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HelloWorld {

    public static ArrayList<ArrayList<String>> parseCSV(String contents, Boolean skipHeaders) {
        ArrayList<ArrayList<String>> allFields = new ArrayList<ArrayList<String>>();

        // separating the file in lines; copy into an ArrayList because
        // Arrays.asList returns a fixed-size list that does not support remove()
        List<String> lines = new ArrayList<String>(Arrays.asList(contents.split("\n")));

        // ignoring header, if needed
        if (skipHeaders) lines.remove(0);

        // for each line
        for (String line : lines) {
            List<String> fields = Arrays.asList(line.split("\""));
            ArrayList<String> cleanFields = new ArrayList<String>();
            Boolean isComma = false;
            for (String field : fields) {
                // ignore elements that don't have useful data
                // (every other element after splitting by quotes)
                isComma = !isComma;
                if (isComma) continue;
                cleanFields.add(field);
            }
            allFields.add(cleanFields);
        }
        return allFields;
    }

    public static void main(String[] args) {
        // example of input file:
        // Line 1: "a","b","c","d,e,f","g"
        // Line 2: "a1","b1","c1","d1,e1,f1","g1"
        ArrayList<ArrayList<String>> strings = HelloWorld.parseCSV(
            "\"a\",\"b\",\"c\",\"d,e,f\",\"g\"\n\"a1\",\"b1\",\"c1\",\"d1,e1,f1\",\"g1\"", false);

        System.out.println("Result:");
        for (ArrayList<String> list : strings) {
            System.out.println(" New List:");
            for (String str : list) {
                System.out.println(" - " + str);
            }
        }
    }
}
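Splitting on quotes relies on every field being quoted and on no field containing a doubled double quote, which the code in the question tries to handle with its DBLQT placeholder. If your files mix quoted and unquoted fields, one alternative is to scan each line character by character and track whether you are inside quotes. The following is a rough sketch of that technique, not a complete CSV parser: `CsvLineScanner` and `parseLine` are hypothetical names, and like the question's code it does not handle newlines inside a field.

```java
import java.util.ArrayList;
import java.util.List;

class CsvLineScanner {

    // Parses one CSV line, honouring quoted fields and doubled quotes ("").
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote: keep one literal quote
                        i++;                 // and skip the second one
                    } else {
                        inQuotes = false;    // closing quote of the field
                    }
                } else {
                    current.append(c);       // commas in here belong to the field
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote of a field
            } else if (c == ',') {
                fields.add(current.toString()); // separator: finish the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field on the line
        return fields;
    }

    public static void main(String[] args) {
        for (String field : parseLine("\"a\",\"b\",\"c\",\"d,e,f\",\"g\"")) {
            System.out.println("[" + field + "]");
        }
    }
}
```

Because the separators are recognised by position rather than by content, empty fields survive, and the same loop translates fairly directly to Apex string methods.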

Apex: parsing a CSV in which every record contains double quotes
