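Parsing a multi-line fixed-width file

Asked: 2010-09-15 19:51:52

Tags: java flat-file fixed-width

I have a fixed-width flat file. To make matters worse, each line can be either a new record or a sub-record of the line above it, identified by the first character of each line: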

A0020SOME DESCRIPTION   MORE DESCRIPTION 922 2321      # Separate
A0021ANOTHER DESCRIPTIONMORE DESCRIPTION 23111442      # records
B0021ANOTHER DESCRIPTION   THIS TIME IN ANOTHER FORMAT # sub-record of record "0021"

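I tried using Flatworm, which looks like an excellent library for parsing fixed-width data. Unfortunately, its documentation states:

"Repeating segments are supported only for delimited files"

(ibid., "Repeating segments").

I would rather not write a custom parser for this. Is it (1) possible to do this in Flatworm, or (2) is there a library that provides this kind of (multi-line, multi-sub-record) capability?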
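To make the structure concrete, the grouping described above can be sketched in plain Java: an "A" line opens a record, and a following "B" line with the same ID attaches to it. This is only an illustration of the layout, not a replacement for a library. The class name `RecordGroupingSketch`, the `Record` type, and the `group` method are hypothetical, and the assumption that columns 2-5 hold the record ID is inferred from the sample lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RecordGroupingSketch {

    /** One top-level record plus the sub-record lines that follow it. */
    static final class Record {
        final String id;
        final String masterLine;
        final List<String> subLines = new ArrayList<>();

        Record(String id, String masterLine) {
            this.id = id;
            this.masterLine = masterLine;
        }
    }

    /** An 'A' line opens a record; a 'B' line attaches to the most recent record with the same ID. */
    static List<Record> group(List<String> lines) {
        List<Record> records = new ArrayList<>();
        Record current = null;
        for (String line : lines) {
            if (line.length() < 5) {
                continue; // too short to carry a type marker and an ID
            }
            char type = line.charAt(0);
            String id = line.substring(1, 5); // assumption: the ID occupies columns 2-5
            if (type == 'A') {
                current = new Record(id, line);
                records.add(current);
            } else if (type == 'B' && current != null && current.id.equals(id)) {
                current.subLines.add(line);
            } else {
                throw new IllegalArgumentException("Unexpected line: " + line);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "A0020SOME DESCRIPTION   MORE DESCRIPTION 922 2321",
            "A0021ANOTHER DESCRIPTIONMORE DESCRIPTION 23111442",
            "B0021ANOTHER DESCRIPTION   THIS TIME IN ANOTHER FORMAT");
        for (Record r : group(lines)) {
            System.out.println(r.id + " has " + r.subLines.size() + " sub-record(s)");
        }
    }
}
```

Splitting each line into its individual fixed-width fields is left out here, since the field widths for the two line types are not given in the question.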

3 Answers:

Answer 0: (score: 2)

Have you looked at JRecordBind?

http://jrecordbind.org/

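"JRecordBind supports hierarchical fixed-length files: some record types are 'sons' of other record types."

Answer 1: (score: 0)

Check out Preon. Although Preon is aimed at bitstream-compressed data, you could twist its arm and use it for the file format you describe. A benefit of using Preon is that it also generates human-readable documentation.

Answer 2: (score: 0)

With uniVocity-parsers you can read not only fixed-width input but also master-detail rows (where one row has sub-rows).

Here is an example: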

// Imports assume the uniVocity-parsers 2.x package layout; adjust for your version.
import com.univocity.parsers.common.ParsingContext;
import com.univocity.parsers.common.processor.*;
import com.univocity.parsers.fixed.*;
import java.io.FileReader;
import java.util.List;

// 1st, use a RowProcessor for the "detail" rows.
ObjectRowListProcessor detailProcessor = new ObjectRowListProcessor();

// 2nd, create a MasterDetailListProcessor to identify whether or not a row is the master row.
// The row placement argument indicates whether the master row occurs before or after a sequence of "detail" rows.
MasterDetailListProcessor masterRowProcessor = new MasterDetailListProcessor(RowPlacement.TOP, detailProcessor) {
    @Override
    protected boolean isMasterRecord(String[] row, ParsingContext context) {
        // Returns true if the parsed row is the master row.
        // In the question's sample, "A" rows are records and the "B" rows below them are sub-records.
        return row[0].startsWith("A");
    }
};

// Example field lengths; replace them with the widths of your own layout.
FixedWidthParserSettings parserSettings = new FixedWidthParserSettings(new FixedWidthFieldLengths(4, 5, 40, 40, 8));

// Set the RowProcessor to the masterRowProcessor.
parserSettings.setRowProcessor(masterRowProcessor);

FixedWidthParser parser = new FixedWidthParser(parserSettings);
parser.parse(new FileReader(yourFile));

// Here we get the MasterDetailRecord elements.
List<MasterDetailRecord> rows = masterRowProcessor.getRecords();
for (MasterDetailRecord masterRecord : rows) {
    // The master record has one master row and multiple detail rows.
    Object[] masterRow = masterRecord.getMasterRow();
    List<Object[]> detailRows = masterRecord.getDetailRows();
}

Disclosure: I am the author of this library. It is open source and free (Apache V2.0 license).