Question

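So I'm working on parsing a .csv file. I took the advice of another thread somewhere on StackOverflow and downloaded SuperCSV. I finally got pretty much everything working, but now I've run into a bug that seems difficult to fix.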

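The problem occurs because the last two columns of data may or may not be populated. Here's an example of the .csv file, with the first row missing the last column and the second row entirely complete: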

2012:07:25,11:48:20,922,"uLog.exe","",Key Pressed,1246,341,-1.00,-1.00,1.00,Shift
2012:07:25,11:48:21,094,"uLog.exe","",Key Pressed,1246,341,-1.00,-1.00,1.00,B,Shift

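From my understanding of the Super CSV Javadoc, there is no way to populate a Java Bean with CsvBeanReader if there is a variable number of columns. This seems really dumb, because I feel these missing columns should be allowed to be null or some other default value when the bean is initialized.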

For reference, here's the complete code of the parser:

public class ULogParser {

String uLogFileLocation;
String screenRecorderFileLocation;

private static final CellProcessor[] cellProcessor = new CellProcessor[] {
    new ParseDate("yyyy:MM:dd"),
    new ParseDate("HH:mm:ss"),
    new ParseDate("SSS"),
    new StrMinMax(0, 100),
    new StrMinMax(0, 100),
    new StrMinMax(0, 100),
    new ParseInt(),
    new ParseInt(),
    new ParseDouble(),
    new ParseDouble(),
    new ParseDouble(),
    new StrMinMax(0, 100),
    new StrMinMax(0, 100),
};

public String[] header = {"Date", "Time", "Msec", "Application", "Window", "Message", "X", "Y", "RelDist", "TotalDist", "Rate", "Extra1", "Extra2"}; 

public ULogParser(String uLogFileLocation, String screenRecorderFileLocation)
{
    this.uLogFileLocation = uLogFileLocation;
    this.screenRecorderFileLocation = screenRecorderFileLocation;
}

public void parse()
{
    try {
        ICsvBeanReader reader = new CsvBeanReader(new BufferedReader(new FileReader(uLogFileLocation)), CsvPreference.STANDARD_PREFERENCE);
        reader.getCSVHeader(false); //parse past the header
        Entry entry;
        entry = reader.read(Entry.class, header, cellProcessor);
        System.out.println(entry.getApplication()); // the field is private, so go through the getter
    } catch (FileNotFoundException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}

public void sendToDB()
{
    Query query = new Query();
}
}

And the code for the Entry class:

public class Entry
{
private Date Date;
private Date Time;
private Date Msec;
private String Application;
private String Window;
private String Message;
private int X;
private int Y;
private double RelDist;
private double TotalDist;
private double Rate;
private String Extra1;
private String Extra2;

public Date getDate() { return Date; }
public Date getTime() { return Time; }
public Date getMsec() { return Msec; }
public String getApplication() { return Application; }
public String getWindow() { return Window; }
public String getMessage() { return Message; }
public int getX() { return X; }
public int getY() { return Y; }
public double getRelDist() { return RelDist; }
public double getTotalDist() { return TotalDist; }
public double getRate() { return Rate; }
public String getExtra1() { return Extra1; }
public String getExtra2() { return Extra2; }

public void setDate(Date Date) { this.Date = Date; }
public void setTime(Date Time) { this.Time = Time; }
public void setMsec(Date Msec) { this.Msec = Msec; }
public void setApplication(String Application) { this.Application = Application; }
public void setWindow(String Window) { this.Window = Window; }
public void setMessage(String Message) { this.Message = Message; }
public void setX(int X) { this.X = X; }
public void setY(int Y) { this.Y = Y; }
public void setRelDist(double RelDist) { this.RelDist = RelDist; }
public void setTotalDist(double TotalDist) { this.TotalDist = TotalDist; }
public void setRate(double Rate) { this.Rate = Rate; }
public void setExtra1(String Extra1) { this.Extra1 = Extra1; }
public void setExtra2(String Extra2) { this.Extra2 = Extra2; }

public Entry(){}
}

The exception I'm getting (note that this is a different row from the example above; it is missing the last two columns):

Exception in thread "main" The value array (size 12)  must match the processors array (size 13): You are probably reading a CSV line with a different number of columns than the number of cellprocessors specified context: Line: 2 Column: 0 Raw line:
[2012:07:25, 11:48:05, 740, uLog.exe,  , Logging started, -1, -1, -1.00, -1.00, -1.00, ]
 offending processor: null
    at org.supercsv.util.Util.processStringList(Unknown Source)
    at org.supercsv.io.CsvBeanReader.read(Unknown Source)
    at processing.ULogParser.parse(ULogParser.java:59)
    at ui.ParseImplicitData.main(ParseImplicitData.java:15)

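Yes, writing all those getters and setters was a pain in the ass. Also, I apologize: I probably don't follow perfect conventions in my use of SuperCSV (such as which CellProcessor to use if you just want the unmodified String), but you get the idea. Also, this code is obviously not complete. For now, I'm just trying to successfully retrieve one line of data.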

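At this point, I'm wondering whether CsvBeanReader can be used for my purposes at all. If not, I'm a little disappointed, since CsvListReader (I would post a hyperlink, but StackOverflow isn't letting me, which is also dumb) is about as easy as not using the API at all and just calling Scanner.next().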

Any help would be appreciated. Thanks in advance!

Answer 1

Edit: Update for Super CSV 2.0.0-beta-1

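Please note that the API changed in Super CSV 2.0.0-beta-1 (the code example is based on 1.52). The getCSVHeader() method on all readers is now getHeader() (to be consistent with writeHeader on the writers).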

Also, SuperCSVException has been renamed to SuperCsvException.

Edit: Update for Super CSV 2.1.0

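Since version 2.1.0, it is possible to execute the cell processors after reading a line of CSV, using the new executeProcessors() method. See the example on the project website for more information. Note that this is only relevant for CsvListReader, as it is the only reader that allows variable column length.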
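To make the "read first, process afterwards" idea concrete, here is a minimal plain-Java sketch. It is only an analogue and does not use the Super CSV API: the class ReadThenProcess, the processorsFor helper and the naive split on commas (no quote handling) are all assumptions made for illustration, with java.util.function.Function standing in for a cell processor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class ReadThenProcess {

    /** Hypothetical stand-in for a cell processor array: one conversion per column. */
    static List<Function<String, Object>> processorsFor(int columnCount) {
        List<Function<String, Object>> processors = new ArrayList<>();
        processors.add(s -> s);              // message: keep the raw String
        processors.add(Integer::valueOf);    // x: parse as an integer
        if (columnCount == 3) {
            processors.add(s -> s);          // optional trailing column is present
        } else if (columnCount != 2) {
            throw new IllegalArgumentException("unexpected number of columns: " + columnCount);
        }
        return processors;
    }

    /** Step 1 reads the raw columns; step 2 picks the processors once the count is known. */
    static List<Object> readThenProcess(String line) {
        String[] raw = line.split(",", -1);  // naive split: assumes no quoted commas
        List<Function<String, Object>> processors = processorsFor(raw.length);
        List<Object> values = new ArrayList<>();
        for (int i = 0; i < raw.length; i++) {
            values.add(processors.get(i).apply(raw[i]));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(readThenProcess("Logging started,-1"));
        System.out.println(readThenProcess("Logging started,-1,Shift"));
    }
}
```

The point of the split is that the conversion step only runs once the column count is known, which is what a fixed processor array cannot offer.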

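You're right - CsvBeanReader doesn't support CSV files with a variable number of columns. According to most CSV specifications (including RFC 4180), the number of columns must be the same on every row.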

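For this reason (as a Super CSV developer) I'm reluctant to add this functionality to Super CSV. If you can think of an elegant way to add it, feel free to make a suggestion on the project's SourceForge site. It would probably mean a new reader that extends CsvBeanReader: it would have to split the reading and the mapping/processing into two separate methods (as you can't do any processing or mapping to the bean's fields unless you know how many columns there are).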

Simple solution

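The simple solution (if you have control over the CSV file you're working with) is to add a blank column when writing the CSV file: the first row in your example would have a comma at the end, indicating that the last column is empty. That way your CSV file is valid (it has the same number of columns on every row) and you can use CsvBeanReader.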
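If the file is produced by your own code, the padding could be done along these lines. This is a rough sketch in plain Java, not part of Super CSV: PadCsvRows, countColumns and padRow are hypothetical names, the sample row is modelled on the raw line in the exception above, and the quote handling assumes a row never spans more than one line.

```java
public class PadCsvRows {

    /** Counts the columns in one CSV line, ignoring commas inside double quotes. */
    static int countColumns(String line) {
        int columns = 1;
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes; // an escaped quote ("") toggles twice, leaving the state unchanged
            } else if (c == ',' && !inQuotes) {
                columns++;
            }
        }
        return columns;
    }

    /** Appends empty trailing columns until the line has expectedColumns columns. */
    static String padRow(String line, int expectedColumns) {
        StringBuilder padded = new StringBuilder(line);
        for (int i = countColumns(line); i < expectedColumns; i++) {
            padded.append(',');
        }
        return padded.toString();
    }

    public static void main(String[] args) {
        // 12 columns, one short of the 13 expected by the header and processors
        String shortRow = "2012:07:25,11:48:05,740,uLog.exe, ,Logging started,-1,-1,-1.00,-1.00,-1.00,";
        System.out.println(padRow(shortRow, 13));
    }
}
```

Rows that already have the expected number of columns pass through unchanged, so the same call can be applied to every line before it is written.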

If that's not possible, then all is not lost!

Fancy solution

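As you've probably realised, CsvBeanReader uses the name mapping to associate each column in the CSV file with a field in your bean, and the CellProcessor array to process each column. In other words, you have to know how many columns there are (and what they represent) if you want to use it. CsvListReader, on the other hand, is very primitive and can read rows of varying length (because it doesn't need to process or map them).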

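So you can combine all the features of CsvBeanReader with those of CsvListReader (as shown in the following example) by reading the file with both readers in parallel: CsvListReader figures out how many columns there are, and CsvBeanReader does the processing/mapping.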

Note that this assumes it is only the birthDate column that might be absent (i.e. it won't work if you can't tell which column is missing).

package example;

import java.io.StringReader;
import java.util.Date;

import org.supercsv.cellprocessor.ParseDate;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.exception.SuperCSVException;
import org.supercsv.io.CsvBeanReader;
import org.supercsv.io.CsvListReader;
import org.supercsv.io.ICsvBeanReader;
import org.supercsv.io.ICsvListReader;
import org.supercsv.prefs.CsvPreference;

public class VariableColumns {

    private static final String INPUT = "name,birthDate,city\n"
            + "John,New York\n"
            + "Sally,22/03/1974,London\n"
            + "Jim,Sydney";

    // cell processors
    private static final CellProcessor[] NORMAL_PROCESSORS =
            new CellProcessor[] { null, new ParseDate("dd/MM/yyyy"), null };
    private static final CellProcessor[] NO_BIRTHDATE_PROCESSORS =
            new CellProcessor[] { null, null };

    // name mappings
    private static final String[] NORMAL_HEADER =
            new String[] { "name", "birthDate", "city" };
    private static final String[] NO_BIRTHDATE_HEADER =
            new String[] { "name", "city" };

    public static void main(String[] args) {

        // using bean reader and list reader together (to read the same file)
        final ICsvBeanReader beanReader = new CsvBeanReader(
                new StringReader(INPUT), CsvPreference.STANDARD_PREFERENCE);
        final ICsvListReader listReader = new CsvListReader(
                new StringReader(INPUT), CsvPreference.STANDARD_PREFERENCE);

        try {
            // skip over header
            beanReader.getCSVHeader(true);
            listReader.getCSVHeader(true);

            while (listReader.read() != null) {

                final String[] nameMapping;
                final CellProcessor[] processors;

                if (listReader.length() == NORMAL_HEADER.length) {
                    // all columns present - use normal header/processors
                    nameMapping = NORMAL_HEADER;
                    processors = NORMAL_PROCESSORS;
                } else if (listReader.length() == NO_BIRTHDATE_HEADER.length) {
                    // one less column - birth date must be missing
                    nameMapping = NO_BIRTHDATE_HEADER;
                    processors = NO_BIRTHDATE_PROCESSORS;
                } else {
                    throw new SuperCSVException(
                            "unexpected number of columns: " + listReader.length());
                }

                // can now use CsvBeanReader safely
                // (we know how many columns there are)
                Person person = beanReader.read(Person.class, nameMapping, processors);

                System.out.println(String.format(
                        "Person: name=%s, birthDate=%s, city=%s",
                        person.getName(), person.getBirthDate(), person.getCity()));
            }
        } catch (Exception e) {
            // handle exceptions here
            e.printStackTrace();
        } finally {
            // close readers here
        }
    }

    public static class Person {

        private String name;
        private Date birthDate;
        private String city;

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }

        public Date getBirthDate() { return birthDate; }
        public void setBirthDate(Date birthDate) { this.birthDate = birthDate; }

        public String getCity() { return city; }
        public void setCity(String city) { this.city = city; }
    }
}

I hope this helps.

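Oh, and is there any reason the fields in your Entry class don't follow the normal naming convention (camelCase)? If you update your header array to use camelCase, then your fields can be camelCase too.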
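As a rough illustration of why the casing of the header array is flexible, the sketch below derives a setter name the usual JavaBeans way. It assumes the bean reader resolves setters by prefixing the mapped name with "set" and upper-casing its first letter; SetterNames, setterName and setProperty are hypothetical names, and the reflection code is not taken from Super CSV.

```java
import java.lang.reflect.Method;

public class SetterNames {

    /** A cut-down bean with a camelCase field, as suggested above. */
    public static class Entry {
        private String application;
        public String getApplication() { return application; }
        public void setApplication(String application) { this.application = application; }
    }

    /** Builds "set" + the mapped name with its first letter upper-cased. */
    static String setterName(String mappedName) {
        return "set" + Character.toUpperCase(mappedName.charAt(0)) + mappedName.substring(1);
    }

    /** Looks up the public String setter for the mapped name and invokes it. */
    static void setProperty(Object bean, String mappedName, String value)
            throws ReflectiveOperationException {
        Method setter = bean.getClass().getMethod(setterName(mappedName), String.class);
        setter.invoke(bean, value);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Entry entry = new Entry();
        setProperty(entry, "application", "uLog.exe"); // camelCase mapping entry
        System.out.println(entry.getApplication());
    }
}
```

Under that assumption, "application" and "Application" both resolve to setApplication, so switching the header array to camelCase costs nothing and lets the fields follow the normal convention.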

Answer 2

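Well, SuperCSV is open source. If you want to add functionality, such as handling input with a variable number of trailing fields, you basically have two options: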

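Post a support request on the SourceForge site and hope the author agrees and has time to do it.
Download the source code, change it to your liking, and contribute your changes back to the project.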

That's how open source works.

Answer 3

With uniVocity-parsers you can map CSV files with a varying number of columns to Java beans. Using annotations:

class TestBean {

// if the value parsed in the quantity column is "?" or "-", it will be replaced by null.
@NullString(nulls = { "?", "-" })
// if a value resolves to null, it will be converted to the String "0".
@Parsed(defaultNullRead = "0")
private Integer quantity;   // The attribute type defines which conversion will be executed when processing the value.
// In this case, IntegerConversion will be used.
// The attribute name will be matched against the column header in the file automatically.

@Trim
@LowerCase
// the value for the comments attribute is in the column at index 4 (0 is the first column, so this means fifth column in the file)
@Parsed(index = 4)
private String comments;

// you can also explicitly give the name of a column in the file.
@Parsed(field = "amount")
private BigDecimal amount;

@Trim
@LowerCase
// values "no", "n" and "null" will be converted to false; values "yes" and "y" will be converted to true
@BooleanString(falseStrings = { "no", "n", "null" }, trueStrings = { "yes", "y" })
@Parsed
private Boolean pending;
...
}

To parse your CSV into a list of TestBean instances:

// BeanListProcessor converts each parsed row to an instance of a given class, then stores each instance into a list.
BeanListProcessor<TestBean> rowProcessor = new BeanListProcessor<TestBean>(TestBean.class);
CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.setRowProcessor(rowProcessor);
//Uses the first valid row of the CSV to assign names to each column
parserSettings.setHeaderExtractionEnabled(true);

CsvParser parser = new CsvParser(parserSettings);
parser.parse(new FileReader(yourFile));

// The BeanListProcessor provides a list of objects extracted from the input.
List<TestBean> beans = rowProcessor.getBeans();

Disclosure: I am the author of this library. It's open source and free (Apache V2.0 license).

Reading a CSV file with a variable number of columns using CsvBeanReader
