Univocity - parsing an irregular CSV

Date: 2017-07-06 14:44:58

Tags: csv univocity

I need to parse an irregular (though consistent) "csv" file. The content looks like this:

String field1;
String field2;
String field5;
List<Car> cars;

Ideally, I would like to use an approach similar to the one here.

I would ultimately like to end up with an object like this:

{
  "children": [
    {
      "name": "apple",
      "path": "hutchison",
      "state": 16,
      "url": "https://go.gl/",
      "id": "ZXNrb0luZGlh",
      "properties": [
        {
          "id": "type",
          "name": "type",
          "propertyType": "string",
          "value": "COMPANY"
        },
        {
          "id": "folderify",
          "name": "folderify",
          "propertyType": "boolean",
          "value": "true"
        },
        {
          "id": "name",
          "name": "name",
          "propertyType": "string",
          "value": "yum"
        }
      ],
      "content": [],
      "createdBy": "00ui3tqahuu6bwMhU0i6",
      "creationDate": 1497348318103,
      "modificationDate": 1499082955407,
      "modifiedBy": "00ui3tqahuu6bwMhU0i6"
    },
    {
      "name": "test",
      "path": "test",
      "state": 13,
      "url": "https://ODE/v0/test",
      "id": "YW5ib3Rlc3Q=",
      "properties": [
        {
          "id": "string",
          "name": "string",
          "propertyType": "string",
          "value": "string"
        }
      ],
      "content": [],
      "createdBy": "00u33355JQXgmzqO90i5",
      "creationDate": 1498463285568,
      "modificationDate": 1498463356176,
      "modifiedBy": "00u33355JQXgmzqO90i5"
    },
    {
      "name": "KE",
      "path": "KE",
      "state": 4,
      "url": "https://full.com/NODE/v0/DEKE",
      "id": "REVLRQ==",
      "properties": [
        {
          "id": "type",
          "name": "type",
          "propertyType": "string",
          "value": "COMPANY"
        },
        {
          "id": "folderify",
          "name": "folderify",
          "propertyType": "boolean",
          "value": "true"
        },
        {
          "id": "name",
          "name": "name",
          "propertyType": "string",
          "value": "KE"
        }
      ],
      "content": [],
      "createdBy": "00uy9bswhaVnUggF00i6",
      "creationDate": 1498805345347,
      "modificationDate": 1498805346371,
      "modifiedBy": "00uy9bswhaVnUggF00i6"
    }
  ],
  "name": "",
  "path": "",
  "state": 27,
  "url": "https://go.gl/ODE/v0",
  "id": "ROOT",
  "properties": [],
  "content": [],
  "createdBy": "INITIAL",
  "creationDate": 1497261853581,
  "modificationDate": 1498805345347,
  "modifiedBy": "00uy9bswhaVnUggF00i6"
}

I am currently running into the following problems:

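  • After adding some exploratory tests, I found that lines starting with a hash (#) are ignored. I don't want this. Is there any way to escape it?
  • My intention is to use a BeanListProcessor for the cars section and handle the other fields with a separate row processor, then combine the results into the object mentioned above. Am I missing any tricks here?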

1 answer:

Answer 0 (score: 1)

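Your first problem is with the #, which is treated as the comment character by default. To prevent lines starting with # from being treated as comments, do the following: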

parserSettings.getFormat().setComment('\0');

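As for the structure you are parsing, there is no way to handle it out of the box, but it is easy to do by leveraging the API. The following should work: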

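    import com.univocity.parsers.common.processor.BeanListProcessor;
    import com.univocity.parsers.csv.CsvParser;
    import com.univocity.parsers.csv.CsvParserSettings;
    import java.io.File;
    import java.util.ArrayList;

    CsvParserSettings settings = new CsvParserSettings();
    settings.getFormat().setComment('\0'); //prevent lines starting with # from being parsed as comments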

    //Creates a parser
    CsvParser parser = new CsvParser(settings);

    //Open the input
    parser.beginParsing(new File("/path/to/input.csv"), "UTF-8");

    //create BeanListProcessor for instances of Car, and initialize it.
    BeanListProcessor<Car> carProcessor = new BeanListProcessor<Car>(Car.class);
    carProcessor.processStarted(parser.getContext());

    String[] row;
    Parent parent = null;
    while ((row = parser.parseNext()) != null) { //read rows one by one.
        if (row[0].startsWith("Field1:")) {  // when Field1 is found, create your parent instance
            if (parent != null) { //if you already have a parent instance, cars have been read. Associate the list of cars to the instance
                parent.cars = new ArrayList<Car>(carProcessor.getBeans()); //copy the list of cars from the processor.
                carProcessor.getBeans().clear(); //clears the processor list
                //you probably want to do something with your parent bean here.
            }
            parent = new Parent(); //create a fresh parent instance
            parent.field1 = row[0]; //assign the fields as appropriate.
        } else if (row[0].startsWith("Field2:")) {
            parent.field2 = row[0]; //and so on
        } else if (row[0].startsWith("Field5:")) {
            parent.field5 = row[0];
        } else if (row[0].startsWith("#")){ //got a "Car" row, invoke the rowProcessed method of the carProcessor.
            carProcessor.rowProcessed(row, parser.getContext());
        }
    }

    //at the end, if there is a parent, get the cars parsed
    if (parent != null) {
        parent.cars = carProcessor.getBeans();
    }
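
The grouping logic in the loop above can be tried out without a CSV file or the library. What follows is a minimal sketch in plain Java, assuming the rows have already been split into `String[]` values. The `GroupingSketch` class, its simplified `Car` and `Parent` stand-ins, and the sample rows are all hypothetical and exist only for illustration; it is not univocity API. Unlike the loop above, it skips rows that appear before the first "Field1:" row instead of failing on a null parent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GroupingSketch {

    // Hypothetical stand-in for the Car bean: it just keeps the raw columns.
    static final class Car {
        final String[] columns;
        Car(String[] columns) { this.columns = columns; }
    }

    // Hypothetical stand-in for the target object described in the question.
    static final class Parent {
        String field1;
        String field2;
        String field5;
        List<Car> cars = new ArrayList<>();
    }

    // Groups rows into Parent records: a "Field1:" row opens a new record,
    // and "#" rows are attached to the record that is currently open.
    static List<Parent> group(List<String[]> rows) {
        List<Parent> result = new ArrayList<>();
        Parent current = null;
        for (String[] row : rows) {
            if (row.length == 0 || row[0] == null) {
                continue; // nothing to dispatch on
            }
            String first = row[0];
            if (first.startsWith("Field1:")) {
                current = new Parent();
                current.field1 = first;
                result.add(current);
            } else if (current == null) {
                continue; // ignore rows that appear before the first "Field1:" row
            } else if (first.startsWith("Field2:")) {
                current.field2 = first;
            } else if (first.startsWith("Field5:")) {
                current.field5 = first;
            } else if (first.startsWith("#")) {
                current.cars.add(new Car(row));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented sample rows, for illustration only.
        List<String[]> rows = Arrays.asList(
            new String[] {"Field1: a"},
            new String[] {"Field2: b"},
            new String[] {"#1", "x", "y"},
            new String[] {"#2", "p", "q"},
            new String[] {"Field1: c"},
            new String[] {"#3", "m", "n"});
        for (Parent p : group(rows)) {
            System.out.println(p.field1 + " has " + p.cars.size() + " car(s)");
        }
    }
}
```

Collecting each parent into a list as soon as its "Field1:" row is seen avoids the separate "flush the last parent" step after the loop, at the cost of keeping every record in memory.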

For the BeanListProcessor to work, you need to declare your class as:

public static final class Car {
    @Parsed(index = 0)
    String id;
    @Parsed(index = 1)
    String col1;
    @Parsed(index = 2)
    String col2;
    @Parsed(index = 3)
    String col3;
    @Parsed(index = 4)
    String col4;
    @Parsed(index = 5)
    String col5;
    @Parsed(index = 6)
    String col6;
}

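You can use headers, but that will make you write more code. If the headers are always the same, you can simply assume the positions are fixed.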
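
To give a rough idea of the extra code that a header-based approach involves, here is a minimal sketch in plain Java that resolves column positions from a header row and then looks values up by name. The `HeaderIndexSketch` class, its method names, and the sample header and row are hypothetical and are not part of univocity; the sketch assumes the header row has already been parsed into a `String[]`.

```java
import java.util.HashMap;
import java.util.Map;

public class HeaderIndexSketch {

    // Maps each trimmed header name to its column position.
    static Map<String, Integer> indexByHeader(String[] headerRow) {
        Map<String, Integer> positions = new HashMap<>();
        for (int i = 0; i < headerRow.length; i++) {
            if (headerRow[i] != null) {
                positions.put(headerRow[i].trim(), i);
            }
        }
        return positions;
    }

    // Returns the value under the given header, or null when the header
    // is unknown or the row is too short.
    static String valueOf(String[] row, Map<String, Integer> positions, String header) {
        Integer index = positions.get(header);
        return (index == null || index >= row.length) ? null : row[index];
    }

    public static void main(String[] args) {
        String[] header = {"Id", "Col 1", "Col 2"}; // invented header row
        String[] row = {"7", "red", "fast"};        // invented data row
        Map<String, Integer> positions = indexByHeader(header);
        System.out.println(valueOf(row, positions, "Col 2"));
    }
}
```

With fixed positions, none of this lookup is needed, which is why the index-based `@Parsed(index = ...)` mapping above is the shorter option when the layout never changes.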

Hope this helps.