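I have a fixed-width flat file. To make matters worse, each line can be either a new record or a sub-record of the line above, identified by the first character of each line: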
A0020SOME DESCRIPTION MORE DESCRIPTION 922 2321 # Separate
A0021ANOTHER DESCRIPTIONMORE DESCRIPTION 23111442 # records
B0021ANOTHER DESCRIPTION THIS TIME IN ANOTHER FORMAT # sub-record of record "0021"
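I tried using Flatworm, which seems to be an excellent library for parsing fixed-width data. Unfortunately, its documentation states:
"Repeating segments are supported only for delimited files"
(ibid., "Repeating segments").
I would rather not write a custom parser for this. Is it (1) possible to do this in Flatworm, or (2) is there a library that provides such (multi-line, multi-sub-record) capabilities?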
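Answer 0 (score: 2)
Answer 1 (score: 0)
Check out Preon. Although Preon targets bitstream-compressed data, you may be able to twist its arm and use it for the file format you describe. A benefit of using Preon is that it also generates human-readable documentation.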
Answer 2 (score: 0)
With uniVocity-parsers you can read not only fixed-width input but also master-detail rows (where one row is followed by its sub-rows).
Here is an example:
    import java.io.FileReader;
    import java.util.List;
    import com.univocity.parsers.common.ParsingContext;
    import com.univocity.parsers.common.processor.*;
    import com.univocity.parsers.fixed.*;

    // 1st, use a RowProcessor for the "detail" rows.
    ObjectRowListProcessor detailProcessor = new ObjectRowListProcessor();

    // 2nd, create a MasterDetailListProcessor to identify whether or not a row is the master row.
    // The row placement argument indicates whether the master row occurs before or after a sequence of "detail" rows.
    MasterDetailListProcessor masterRowProcessor = new MasterDetailListProcessor(RowPlacement.TOP, detailProcessor) {
        @Override
        protected boolean isMasterRecord(String[] row, ParsingContext context) {
            // Returns true if the parsed row is the master row.
            // In the question's sample, "A" lines are the records and "B" lines are their sub-records.
            return row[0].startsWith("A");
        }
    };

    // Adjust these field lengths to match the actual column layout of your file.
    FixedWidthParserSettings parserSettings = new FixedWidthParserSettings(new FixedWidthFieldLengths(4, 5, 40, 40, 8));

    // Set the RowProcessor to the masterRowProcessor.
    parserSettings.setRowProcessor(masterRowProcessor);

    FixedWidthParser parser = new FixedWidthParser(parserSettings);
    parser.parse(new FileReader(yourFile));

    // Here we get the MasterDetailRecord elements.
    List<MasterDetailRecord> rows = masterRowProcessor.getRecords();
    for (MasterDetailRecord masterRecord : rows) {
        // The master record has one master row and multiple detail rows.
        Object[] masterRow = masterRecord.getMasterRow();
        List<Object[]> detailRows = masterRecord.getDetailRows();
    }
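To make the master-detail grouping itself concrete, here is a minimal plain-Java sketch that does not use the library at all. It only illustrates the idea: every "A" line opens a new record, and every "B" line is attached to the record above it. The class `MasterDetailSketch`, its `Group` type, and the `group` method are hypothetical names made up for this illustration, and the assumption that the record id sits in characters 1-4 is taken from the sample lines in the question ("A0021" and "B0021"), not from any format specification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of uniVocity-parsers or Flatworm.
final class MasterDetailSketch {

    // One "A" line plus the "B" lines that directly follow it.
    static final class Group {
        final String master;
        final List<String> details = new ArrayList<>();

        Group(String master) {
            this.master = master;
        }

        // Assumption: the record id occupies characters 1-4, as in "A0021".
        String id() {
            return master.substring(1, 5);
        }
    }

    // Walks the lines in order, opening a new group at every "A" line and
    // attaching each "B" line to the most recently opened group.
    static List<Group> group(List<String> lines) {
        List<Group> groups = new ArrayList<>();
        Group current = null;
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            char type = line.charAt(0);
            if (type == 'A') {
                current = new Group(line);
                groups.add(current);
            } else if (type == 'B') {
                if (current == null) {
                    throw new IllegalStateException("Sub-record before any record: " + line);
                }
                current.details.add(line);
            } else {
                throw new IllegalArgumentException("Unknown record type: " + line);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "A0020SOME DESCRIPTION MORE DESCRIPTION 922 2321",
            "A0021ANOTHER DESCRIPTIONMORE DESCRIPTION 23111442",
            "B0021ANOTHER DESCRIPTION THIS TIME IN ANOTHER FORMAT");
        for (Group g : group(lines)) {
            System.out.println(g.id() + " has " + g.details.size() + " sub-record(s)");
        }
    }
}
```

Run against the three sample lines from the question, this yields two groups: record 0020 with no sub-records and record 0021 with one. Slicing each line into typed fields is left out on purpose, since the real column widths are not given in the question.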
Disclosure: I am the author of this library. It is open-source and free (Apache V2.0 license).