I'm looking at Tablesaw, a Java dataframe and visualization library:
// https://mvnrepository.com/artifact/tech.tablesaw/tablesaw-core
implementation group: 'tech.tablesaw', name: 'tablesaw-core', version: '0.33.5'
I will always have two input files, old and new, which are combined to create my publish output dataset.
The data-processing rules I must follow are that the publish dataset must contain:
1) all rows from new
2) rows that exist only in old
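The two rules amount to a key-based merge: keep every new row, then add each old row whose key does not appear among the new keys. Below is a minimal plain-Java sketch of that rule, independent of Tablesaw. The class name `PublishRule` and the row representation (a `String[]` whose element 0 is the unique key) are hypothetical. Collecting the new keys into a `HashSet` first means each old row is checked in constant expected time, so the whole pass is linear rather than a nested scan of new for every old row.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the publish rule; each row is a String[] with the unique key at index 0. */
final class PublishRule {

    static List<String[]> publish(List<String[]> oldRows, List<String[]> newRows) {
        // Gather every key that appears in new.
        Set<String> newKeys = new HashSet<>();
        for (String[] row : newRows) {
            newKeys.add(row[0]);
        }
        // Rule 1: all rows from new, in their original order.
        List<String[]> result = new ArrayList<>(newRows);
        // Rule 2: rows from old whose key does not exist in new.
        for (String[] row : oldRows) {
            if (!newKeys.contains(row[0])) {
                result.add(row);
            }
        }
        return result;
    }
}
```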
I can load the old and new tables as follows:
import tech.tablesaw.api.ColumnType;
import tech.tablesaw.api.Table;
import tech.tablesaw.io.csv.CsvReadOptions;

// old.txt: fields are backslash-separated; the first column is skipped
final ColumnType[] oldTypes = { ColumnType.SKIP, ColumnType.STRING, ColumnType.STRING, ColumnType.DOUBLE, ColumnType.DOUBLE, ColumnType.STRING, ColumnType.STRING,
ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING,
ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING,
ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING,
ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.DOUBLE, ColumnType.DOUBLE };
final CsvReadOptions oldOptions = CsvReadOptions.builder("data/original/old.txt").separator('\\').header(true).columnTypes(oldTypes).build();
final Table oldTable = Table.read().usingOptions(oldOptions);

// new.txt: same backslash separator, no skipped column
final ColumnType[] newTypes = { ColumnType.STRING, ColumnType.STRING, ColumnType.DOUBLE, ColumnType.DOUBLE, ColumnType.STRING, ColumnType.STRING,
ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING,
ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING,
ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING,
ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.STRING, ColumnType.DOUBLE, ColumnType.DOUBLE };
final CsvReadOptions newOptions = CsvReadOptions.builder("data/current/new.txt").separator('\\').header(true).columnTypes(newTypes).build();
// "new" is a reserved word in Java, so the table needs a different variable name
final Table newTable = Table.read().usingOptions(newOptions);
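Comparing the two type arrays, the old layout appears to be the new layout (38 columns) preceded by one extra column marked `SKIP` (39 entries in total). Assuming that holds, the sketch below builds both arrays from a single definition using only the `ColumnType` constants already used above and `java.util.Arrays`. The class and method names are hypothetical.

```java
import java.util.Arrays;
import tech.tablesaw.api.ColumnType;

/** Hypothetical helper that derives both column-type arrays from one layout. */
final class PublishColumnTypes {

    // new.txt layout as listed: 2 STRING, 2 DOUBLE, 32 STRING, 2 DOUBLE (38 columns).
    static ColumnType[] newTypes() {
        ColumnType[] types = new ColumnType[38];
        Arrays.fill(types, ColumnType.STRING);
        types[2] = ColumnType.DOUBLE;
        types[3] = ColumnType.DOUBLE;
        types[36] = ColumnType.DOUBLE;
        types[37] = ColumnType.DOUBLE;
        return types;
    }

    // old.txt layout: the same 38 columns preceded by one column that is skipped.
    static ColumnType[] oldTypes() {
        ColumnType[] newLayout = newTypes();
        ColumnType[] types = new ColumnType[newLayout.length + 1];
        types[0] = ColumnType.SKIP;
        System.arraycopy(newLayout, 0, types, 1, newLayout.length);
        return types;
    }
}
```

`columnTypes(PublishColumnTypes.oldTypes())` could then replace the literal array in the old builder, and `columnTypes(PublishColumnTypes.newTypes())` the one in the new builder, so a layout change only has to be made in one place.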
What I can't see is how to identify the rows that exist only in the old file.
The first column of each file contains a unique String key.
Is my only option to iterate over all the rows of the old file and test whether each key value is present in the new file?
Or is there a Tablesaw function that can produce the result set of rows I need?
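One way to avoid a hand-written row loop is to build a Tablesaw `Selection` of the old rows whose key is not in new, filter old with it, and append the result to a copy of new. The sketch below assumes that `StringColumn.isNotIn(Collection<String>)`, `Table.where(Selection)`, `Table.copy()` and `Table.append(...)` are available in 0.33.5 (worth confirming against the Javadoc for that version), that the key is column 0 of both loaded tables, and that both tables have the same columns in the same order and with the same types, which the type arrays above suggest once the skipped column is dropped. `PublishMerge` is a hypothetical name.

```java
import java.util.HashSet;
import java.util.Set;
import tech.tablesaw.api.StringColumn;
import tech.tablesaw.api.Table;
import tech.tablesaw.selection.Selection;

/** Hypothetical sketch: publish = all rows of new + rows of old whose key is absent from new. */
final class PublishMerge {

    static Table publish(Table oldTable, Table newTable) {
        // Collect the keys present in new (assumed to be column 0).
        StringColumn newKeyColumn = newTable.stringColumn(0);
        Set<String> newKeys = new HashSet<>();
        for (String key : newKeyColumn) {
            newKeys.add(key);
        }
        // Select the old rows whose key does not occur in new (an anti-join on the key).
        Selection onlyInOld = oldTable.stringColumn(0).isNotIn(newKeys);
        Table oldOnly = oldTable.where(onlyInOld);
        // Copy new so the input table is not mutated, then append the old-only rows.
        return newTable.copy().append(oldOnly);
    }
}
```

Calling `PublishMerge.publish(oldTable, newTable)` returns the new rows first, followed by the old-only rows in their original order. The membership test still happens, but inside the `Selection` rather than in user code, and the `HashSet` keeps it a single linear pass over old.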