Question

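I have a fixed-width flat file. To make matters worse, each line can be either a new record or a sub-record of the line above it, identified by the first character of the line: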

A0020SOME DESCRIPTION   MORE DESCRIPTION 922 2321      # Separate
A0021ANOTHER DESCRIPTIONMORE DESCRIPTION 23111442      # records
B0021ANOTHER DESCRIPTION   THIS TIME IN ANOTHER FORMAT # sub-record of record "0021"

I tried using Flatworm, which seems to be an excellent library for parsing fixed-width data. Unfortunately, its documentation states:

"Repeating segments are supported only for delimited files"

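(ibid., "Repeating segments").

I would rather not write a custom parser for this. Is it (1) possible to do this in Flatworm, or (2) is there a library that provides this kind of capability (multi-line records with multiple sub-records)?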

Answer 1

Have you looked at JRecordBind?

http://jrecordbind.org/

"JRecordBind supports hierarchical fixed length files: records of some type that are 'sons' of other record types."

Answer 2

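Check out Preon. Although Preon targets bitstream-compressed data, you may be able to twist its arm and use it for the file format you describe. A benefit of using Preon is that it also generates human-readable documentation.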

Answer 3

With uniVocity-parsers you can read not only fixed-width input but also master-detail rows (where one row is followed by its sub-rows).

Here is an example:

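import java.io.FileReader;
import java.util.List;

import com.univocity.parsers.common.ParsingContext;
import com.univocity.parsers.common.processor.MasterDetailListProcessor;
import com.univocity.parsers.common.processor.MasterDetailRecord;
import com.univocity.parsers.common.processor.ObjectRowListProcessor;
import com.univocity.parsers.common.processor.RowPlacement;
import com.univocity.parsers.fixed.FixedWidthFieldLengths;
import com.univocity.parsers.fixed.FixedWidthParser;
import com.univocity.parsers.fixed.FixedWidthParserSettings;

// 1st, use a RowProcessor for the "detail" rows.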
ObjectRowListProcessor detailProcessor = new ObjectRowListProcessor();

// 2nd, create a MasterDetailListProcessor to identify whether or not a row is a master row.
// The row placement argument indicates whether the master row occurs before or after its sequence of "detail" rows.
MasterDetailListProcessor masterRowProcessor = new MasterDetailListProcessor(RowPlacement.TOP, detailProcessor) {
    @Override
    protected boolean isMasterRecord(String[] row, ParsingContext context) {
        // Returns true if the parsed row is a master row. In the question's data,
        // top-level records start with "A" and their sub-records start with "B".
        return row[0].startsWith("A");
    }
};

// The field lengths below are illustrative; adjust them to the actual column widths of your file.
FixedWidthParserSettings parserSettings = new FixedWidthParserSettings(new FixedWidthFieldLengths(4, 5, 40, 40, 8));

// Set the RowProcessor to the masterRowProcessor.
parserSettings.setRowProcessor(masterRowProcessor);

FixedWidthParser parser = new FixedWidthParser(parserSettings);
parser.parse(new FileReader(yourFile));

// Here we get the MasterDetailRecord elements.
List<MasterDetailRecord> rows = masterRowProcessor.getRecords();
for (MasterDetailRecord masterRecord : rows) {
    // The master record has one master row and multiple detail rows.
    Object[] masterRow = masterRecord.getMasterRow();
    List<Object[]> detailRows = masterRecord.getDetailRows();
}

Disclosure: I am the author of this library. It is open source and free (Apache V2.0 license).
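
For readers who want to see what master-detail grouping amounts to, here is a minimal library-free sketch in plain Java. It is not how uniVocity-parsers is implemented; it only illustrates the idea using the question's sample lines. The class name MasterDetailGrouping, the Group type, and the assumption that the record id occupies the four characters after the type character are all hypothetical and chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MasterDetailGrouping {

    /** One top-level line plus the sub-record lines that follow it (hypothetical helper type). */
    public static final class Group {
        public final String master;
        public final List<String> details = new ArrayList<>();

        Group(String master) {
            this.master = master;
        }
    }

    /** Groups lines so that each 'A' line owns the 'B' lines directly below it. */
    public static List<Group> group(List<String> lines) {
        List<Group> groups = new ArrayList<>();
        Group current = null;
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            char type = line.charAt(0);
            if (type == 'A') {
                // A new top-level record starts a new group.
                current = new Group(line);
                groups.add(current);
            } else if (type == 'B') {
                // A sub-record belongs to the most recent top-level record.
                if (current == null) {
                    throw new IllegalArgumentException("Sub-record without a preceding record: " + line);
                }
                current.details.add(line);
            } else {
                throw new IllegalArgumentException("Unknown record type '" + type + "': " + line);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "A0020SOME DESCRIPTION   MORE DESCRIPTION 922 2321",
            "A0021ANOTHER DESCRIPTIONMORE DESCRIPTION 23111442",
            "B0021ANOTHER DESCRIPTION   THIS TIME IN ANOTHER FORMAT");
        for (Group g : group(lines)) {
            // Assumption: the record id is the four characters after the type character.
            System.out.println(g.master.substring(1, 5) + " has " + g.details.size() + " sub-record(s)");
        }
    }
}
```

Run against the three sample lines from the question, this prints "0020 has 0 sub-record(s)" and "0021 has 1 sub-record(s)". Splitting each line into its fixed-width fields is deliberately left out, since that is the part a library such as the ones mentioned in the answers handles.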

Parsing multi-line fixed-width files
