I have a CSV file with non-normalized content. It looks like this:
John, 001
01/01/2015, hamburger
02/01/2015, pizza
03/01/2015, ice cream
Mary, 002
01/01/2015, hamburger
02/01/2015, pizza
John, 003
04/01/2015, chocolate
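What I am trying to do is write logic in Java to separate these rows. I want "John, 001" to act as a header, with all the rows that follow it, up to the Mary row, belonging to John.
Is this possible, or should I do it manually?
Edit:
As for the input, even though it is not normalized, one clear pattern is that the rows without a name always start with a date.
My output goal is a Java object that I can store in a database in the format below.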
Name, hamburger, pizza, ice cream, chocolate
John, 01/01/2015, 02/01/2015, 03/01/2015, NA
Mary, 01/01/2015, 02/01/2015, NA, NA
John, NA, NA, NA, 04/01/2015
Answer 0 (score: 2)
You can read the file into a List:
List<String> lines = Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8);
Then iterate over the list and split each line on the desired delimiter (",").
Now you can use if-else or switch blocks to check for specific entries.
    List<DataObject> objects = new ArrayList<>();
    DataObject dataObject = null;
    for (String s : lines) {
        String[] splitLine = s.split(",");
        // Trim the first field so stray spaces do not defeat the date regex
        String first = splitLine[0].trim();
        if (first.matches("(\\d{2}/){2}\\d{4}")) {
            // We found a data row (date, dish)
            if (dataObject != null && splitLine.length == 2) {
                String date = first;
                String dish = splitLine[1].trim();
                dataObject.add(date, dish);
            } else {
                // Handle error
            }
        } else if (splitLine.length == 2) {
            // A header row (name, id): store the previous object and start a new one
            if (dataObject != null) {
                objects.add(dataObject);
            }
            String name = first;
            String id = splitLine[1].trim();
            dataObject = new DataObject(name, id);
        } else {
            // Handle error
        }
    }
    // The last object is not followed by a header row, so add it here
    if (dataObject != null) {
        objects.add(dataObject);
    }
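Now you can sort them into your specific categories.
Edit: changed the loop and added a regex (probably not the optimal one) that matches the date strings and uses them to decide whether a row is added to the last data object.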
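Since the regex only checks the shape of the string, a stricter alternative is to let java.time attempt the parse. The following is a minimal sketch, not part of the original answer; the class name DateLineCheck and the method isDate are made up for illustration, and it assumes the dates are day/month/year, which the sample data does not settle.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

class DateLineCheck {
    // Assumption: day/month/year. Use "MM/dd/uuuu" if the file is month-first.
    // STRICT resolution rejects impossible dates such as 31/02/2015.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("dd/MM/uuuu").withResolverStyle(ResolverStyle.STRICT);

    /** Returns true if the field, ignoring surrounding spaces, is a valid date. */
    static boolean isDate(String field) {
        try {
            LocalDate.parse(field.trim(), FORMAT);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isDate("01/01/2015"));
        System.out.println(isDate("John"));
        System.out.println(isDate("31/02/2015"));
    }
}
```

In the loop, a call such as isDate(splitLine[0]) could take the place of the matches(...) test.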
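The DataObject class can contain a data structure that holds the dates/dishes. After parsing the CSV, you can iterate over the objects List and do whatever you need. I hope this answer helps :)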
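The answer leaves DataObject undefined, so here is one possible shape for it, as a sketch only. The constructor and the add(date, dish) method match how the loop uses the class; the getters, the dateFor helper, and the choice of a LinkedHashMap keyed by dish are assumptions. Keying by dish means a dish repeated under the same name keeps only its latest date.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

class DataObject {
    private final String name;
    private final String id;
    // Insertion-ordered map from dish to the date it was recorded (assumed layout)
    private final Map<String, String> dateByDish = new LinkedHashMap<>();

    DataObject(String name, String id) {
        this.name = name;
        this.id = id;
    }

    /** Records a dish with its date; a repeated dish overwrites the earlier date. */
    void add(String date, String dish) {
        dateByDish.put(dish, date);
    }

    String getName() {
        return name;
    }

    String getId() {
        return id;
    }

    /** Returns the date stored for the dish, or "NA" if this person never had it. */
    String dateFor(String dish) {
        return dateByDish.getOrDefault(dish, "NA");
    }

    /** Read-only view of all recorded dishes and dates. */
    Map<String, String> getDateByDish() {
        return Collections.unmodifiableMap(dateByDish);
    }
}
```

The "NA" default mirrors the placeholder in the question's target table, so a row for the database can be built by calling dateFor once per known dish.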
Answer 1 (score: 2)
If I have understood the specification correctly, the algorithm in pseudocode is:
Data structures:
    base  : list of struct < string name, hash < int meal index, date > >, one entry per name row
    meals : list of strings, one entry per distinct meal
Code:
    name = null
    iname = -1
    loop over input lines {
        if first field is a date {
            if name == null {
                throw Exception("incorrect structure")
            }
            meal = second field
            index = index of meal in meals
            if not found {
                index = len(meals)
                add meal at end of list meals
            }
            base[iname].hash[index] = date
        }
        else {
            name = first field
            iname += 1
            add a new struct { name, empty hash } at end of list base
        }
    }
    close input file
    open output file
    // headers
    print "names"
    for meal in meals {
        print ",", meal
    }
    print newline
    for (i = 0; i <= iname; i++) {
        print base[i].name
        // the hash is keyed by meal index, so look up by index, not by meal name
        for (index = 0; index < len(meals); index++) {
            if index in base[i].hash.keys {
                print ",", base[i].hash[index]
            }
            else {
                print ",NA"
            }
        }
        print newline
    }
    close output file
Just write this in proper Java, and come back here if you run into any problems.
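For reference, the pseudocode might translate to Java roughly as follows. This is a sketch under a few assumptions: the names MealPivot, Entry and pivot are invented here, lines are passed in as a List rather than read from a file, a date is recognized purely by its dd/dd/dddd shape, and the header and separators follow the question's target table ("Name" and ", ").

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MealPivot {
    /** One name row plus its dates, keyed by the meal's index in the meals list. */
    static final class Entry {
        final String name;
        final Map<Integer, String> dateByMealIndex = new LinkedHashMap<>();

        Entry(String name) {
            this.name = name;
        }
    }

    /** Turns the raw lines into a header line followed by one line per name row. */
    static List<String> pivot(List<String> lines) {
        List<Entry> base = new ArrayList<>();
        List<String> meals = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split(",", 2);
            String first = fields[0].trim();
            if (first.matches("\\d{2}/\\d{2}/\\d{4}")) {
                // Detail row: needs a preceding name row and a meal field
                if (base.isEmpty() || fields.length < 2) {
                    throw new IllegalArgumentException("incorrect structure: " + line);
                }
                String meal = fields[1].trim();
                int index = meals.indexOf(meal);
                if (index < 0) {
                    index = meals.size();
                    meals.add(meal);
                }
                base.get(base.size() - 1).dateByMealIndex.put(index, first);
            } else {
                // Name row: start a new entry (the id field is ignored, as in the pseudocode)
                base.add(new Entry(first));
            }
        }
        List<String> out = new ArrayList<>();
        StringBuilder header = new StringBuilder("Name");
        for (String meal : meals) {
            header.append(", ").append(meal);
        }
        out.add(header.toString());
        for (Entry entry : base) {
            StringBuilder row = new StringBuilder(entry.name);
            for (int i = 0; i < meals.size(); i++) {
                row.append(", ").append(entry.dateByMealIndex.getOrDefault(i, "NA"));
            }
            out.add(row.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "John, 001", "01/01/2015, hamburger", "02/01/2015, pizza",
                "03/01/2015, ice cream", "Mary, 002", "01/01/2015, hamburger",
                "02/01/2015, pizza", "John, 003", "04/01/2015, chocolate");
        pivot(input).forEach(System.out::println);
    }
}
```

Run on the sample input from the question, this prints the four-line table given there as the target output. Note that the two John blocks stay separate rows because each name row starts a new entry.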
Answer 2 (score: 0)
Use uniVocity-parsers to handle this for you. It comes with a master-detail row processor.
    import com.univocity.parsers.common.ParsingContext;
    import com.univocity.parsers.common.processor.*;
    import com.univocity.parsers.csv.*;
    import java.io.FileReader;
    import java.util.List;

    // 1st, create a RowProcessor to process all "detail" elements (dates/ingredients)
    ObjectRowListProcessor detailProcessor = new ObjectRowListProcessor();

    // 2nd, create a MasterDetailProcessor to identify whether or not a row is the master row
    // (first value of the row is a name, second is an integer).
    MasterDetailListProcessor masterRowProcessor = new MasterDetailListProcessor(RowPlacement.TOP, detailProcessor) {
        @Override
        protected boolean isMasterRecord(String[] row, ParsingContext context) {
            try {
                // Tries to convert the second value of the row to an Integer.
                Integer.parseInt(String.valueOf(row[1]));
                return true;
            } catch (NumberFormatException ex) {
                return false;
            }
        }
    };

    CsvParserSettings parserSettings = new CsvParserSettings();
    // Set the RowProcessor to the masterRowProcessor.
    parserSettings.setRowProcessor(masterRowProcessor);

    CsvParser parser = new CsvParser(parserSettings);
    parser.parse(new FileReader(yourFile));

    // Here we get the MasterDetailRecord elements.
    List<MasterDetailRecord> rows = masterRowProcessor.getRecords();

    // Each master record has one master row and multiple detail rows.
    MasterDetailRecord masterRecord = rows.get(0);
    Object[] masterRow = masterRecord.getMasterRow();
    List<Object[]> detailRows = masterRecord.getDetailRows();
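Once a master row and its detail rows are in hand, they still need to be combined into something usable. A small sketch of that step follows, written against plain arrays so it does not depend on the library; the class MasterDetailFlattener and its methods are hypothetical, and it assumes the master row is [name, id] and each detail row is [date, dish].

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MasterDetailFlattener {
    /** Builds a dish -> date map from detail rows assumed to be [date, dish]. */
    static Map<String, String> dateByDish(List<Object[]> detailRows) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Object[] row : detailRows) {
            if (row.length < 2) {
                continue; // skip malformed detail rows
            }
            result.put(String.valueOf(row[1]).trim(), String.valueOf(row[0]).trim());
        }
        return result;
    }

    /** Renders one record as "name -> {dish=date, ...}" for quick inspection. */
    static String describe(Object[] masterRow, List<Object[]> detailRows) {
        return String.valueOf(masterRow[0]).trim() + " -> " + dateByDish(detailRows);
    }
}
```

With the variables from the snippet above, describe(masterRow, detailRows) would summarize the first record; looping over rows does the same for every record.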
Disclosure: I am the author of this library. It is open source and free (Apache V2.0 license).