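I have a 32 GB CSV file with nearly 150 million rows. I plan to use sstableloader to load the data into Cassandra on EC2, and I generate the SSTables with the Java code below. The problem is that only about 12k rows show up on the server, and the generated SSTable is only 28 MB. The process does not throw any errors. If I run the same code on another .csv with 10 rows, there is no problem and I get all 10 rows.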
if (args.length < 2) {
    System.out.println("Something is wrong with the parameters, here's the pattern: <CSV_URL> <Default_Output_Dir>");
    return;
}
CSV_URL = args[0];
DEFAULT_OUTPUT_DIR = args[1];

// Client mode: lets the Cassandra classes run outside a server process
Config.setClientMode(true);

// Create output directory that has keyspace and table name in the path
File outputDir = new File(DEFAULT_OUTPUT_DIR + File.separator + KEYSPACE + File.separator + TABLE);
if (!outputDir.exists() && !outputDir.mkdirs())
{
    throw new RuntimeException("Cannot create output directory: " + outputDir);
}

// Prepare SSTable writer
CQLSSTableWriter.Builder builder = CQLSSTableWriter.builder();
// set output directory
builder.inDirectory(outputDir)
       // set target schema
       .forTable(SCHEMA)
       // set CQL statement to put data
       .using(INSERT_STMT)
       // set partitioner if needed
       // default is Murmur3Partitioner so set if you use different one.
       .withPartitioner(new Murmur3Partitioner());
CQLSSTableWriter writer = builder.build();

try (
    BufferedReader reader = new BufferedReader(new FileReader(CSV_URL));
    CsvListReader csvReader = new CsvListReader(reader, CsvPreference.STANDARD_PREFERENCE)
)
{
    //csvReader.getHeader(true);

    // Write to SSTable while reading data
    List<String> line;
    while ((line = csvReader.read()) != null)
    {
        writer.addRow(
            Integer.parseInt(line.get(0)),
            ..
            new BigDecimal(line.get(22)),
            new BigDecimal(line.get(23))
        );
    }
}
catch (Exception e)
{
    e.printStackTrace();
}

try
{
    writer.close();
}
catch (IOException e)
{
    // Report close failures instead of silently ignoring them
    e.printStackTrace();
}
And here is the schema:
CREATE KEYSPACE IF NOT EXISTS ma WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
USE ma;
CREATE TABLE IF NOT EXISTS cassie (PKWID int,DX varchar,......, QS decimal,PRIMARY KEY (PKWID));
I am using Cassandra 2.2.x, with the Java driver used to create the SSTables.
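One thing that may be worth ruling out, given `PRIMARY KEY (PKWID)`: in Cassandra an insert with an already-existing primary key overwrites the earlier row, so rows that share a PKWID end up as a single row. The sketch below counts total rows against distinct values in the first CSV column. It assumes PKWID is the first field and is not quoted (matching `line.get(0)` above); the class name `DistinctKeyCheck` and the method `countKeys` are hypothetical helpers, not part of Cassandra or Super CSV, and it runs on an in-memory sample rather than the real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

class DistinctKeyCheck {

    /**
     * Returns a two-element array: {total rows, distinct first-column values}.
     * Assumption: the first column is an unquoted integer (the PKWID).
     */
    public static long[] countKeys(Reader source) throws IOException {
        Set<Integer> keys = new HashSet<>();
        long total = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                int comma = line.indexOf(',');
                String first = comma < 0 ? line : line.substring(0, comma);
                keys.add(Integer.parseInt(first.trim()));
                total++;
            }
        }
        return new long[] { total, keys.size() };
    }

    public static void main(String[] args) throws IOException {
        // Tiny in-memory sample: three rows, two of which share key 1
        String sample = "1,a\n2,b\n1,c\n";
        long[] result = countKeys(new StringReader(sample));
        System.out.println("rows=" + result[0] + ", distinct keys=" + result[1]);
    }
}
```

To run it against the real file, a `FileReader` for the CSV path could be passed in place of the `StringReader`. A `HashSet<Integer>` would need several gigabytes of heap if the 150 million keys were mostly unique, so this is only a sketch; a much smaller distinct count than row count would point at duplicate keys rather than at the writer.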