CQLSSTableWriter does not export the whole .csv to SSTable - Cassandra

Asked: 2015-09-22 03:50:26

Tags: java csv cassandra datastax datastax-java-driver

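I have a 32 GB CSV with nearly 150 million rows. I plan to use sstableloader to load the data into Cassandra on EC2, and I generate the SSTables with the Java code below. The problem is that on the server I end up with only about 12k rows, and the generated SSTable files total only 28 MB. The process does not throw any errors. Also, if I run it on another .csv, one with 10 rows, there is no problem and I get all 10 rows.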

if(args.length < 2){
        System.out.println("Something wrong with parameters, heres pattern: <CSV_URL> <Default_Output_Dir>");
        return;
}

CSV_URL = args[0];
DEFAULT_OUTPUT_DIR = args[1];

// magic!
Config.setClientMode(true);

// Create output directory that has keyspace and table name in the path
File outputDir = new File(DEFAULT_OUTPUT_DIR + File.separator + KEYSPACE + File.separator + TABLE);
if (!outputDir.exists() && !outputDir.mkdirs())
{
    throw new RuntimeException("Cannot create output directory: " + outputDir);
}

// Prepare SSTable writer
CQLSSTableWriter.Builder builder = CQLSSTableWriter.builder();
// set output directory
builder.inDirectory(outputDir)
       // set target schema
       .forTable(SCHEMA)
       // set CQL statement to put data
       .using(INSERT_STMT)
       // set partitioner if needed
       // default is Murmur3Partitioner so set if you use different one.
       .withPartitioner(new Murmur3Partitioner());
CQLSSTableWriter writer = builder.build();

try (
    BufferedReader reader = new BufferedReader(new FileReader(CSV_URL));
    CsvListReader csvReader = new CsvListReader(reader, CsvPreference.STANDARD_PREFERENCE)
){
    //csvReader.getHeader(true);

    // Write to SSTable while reading data
    List<String> line;
    while ((line = csvReader.read()) != null)
    {
        writer.addRow(
            Integer.parseInt(line.get(0)),
            ..
            new BigDecimal(line.get(22)),
            new BigDecimal(line.get(23))
        );
    }
}
catch (Exception e)
{
    e.printStackTrace();
}


try
{
    writer.close();
}
catch (IOException ignore) {}
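
Editor's note: in the loop above, a single exception ends the whole read loop, and the only trace is a `printStackTrace()` on stderr. One way to see how far the loader actually gets is to count rows and handle failures per row. The following is a minimal sketch using only the JDK. `CountingLoader`, `Stats`, and `load` are hypothetical names, and the row handler stands in for the `writer.addRow(...)` call; a real handler would need to wrap any checked exceptions that `addRow` declares.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.function.Consumer;

public class CountingLoader {

    /** Counters collected during one pass over the input (hypothetical helper type). */
    public static final class Stats {
        public long written;              // rows the handler accepted
        public long failed;               // rows the handler rejected
        public long firstFailedLine = -1; // 1-based line number of the first failure, or -1
    }

    /**
     * Feeds every line to rowHandler. A failing row is counted and reported,
     * but does not stop the loop.
     */
    public static Stats load(BufferedReader reader, Consumer<String> rowHandler) throws IOException {
        Stats stats = new Stats();
        String line;
        long lineNo = 0;
        while ((line = reader.readLine()) != null) {
            lineNo++;
            try {
                rowHandler.accept(line);
                stats.written++;
            } catch (RuntimeException e) {
                stats.failed++;
                if (stats.firstFailedLine < 0) {
                    stats.firstFailedLine = lineNo;
                    System.err.println("First failure at line " + lineNo + ": " + e);
                }
            }
        }
        return stats;
    }

    public static void main(String[] args) throws IOException {
        // Tiny in-memory input; the third line cannot be parsed as an int.
        BufferedReader in = new BufferedReader(new StringReader("1\n2\noops\n4\n"));
        Stats s = load(in, line -> Integer.parseInt(line.trim()));
        System.out.println("written=" + s.written + " failed=" + s.failed
                + " firstFailedLine=" + s.firstFailedLine);
    }
}
```

If the written count is far below the number of lines in the CSV, the loop is stopping or failing early. If it matches, the missing rows are being lost somewhere after the read loop.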

And here is the schema:

CREATE KEYSPACE IF NOT EXISTS ma WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
USE ma;
CREATE TABLE IF NOT EXISTS cassie (PKWID int,DX varchar,......, QS decimal,PRIMARY KEY (PKWID));

Using Cassandra 2.2.x, with the Java driver to create the SSTables.
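
Editor's note: because the schema declares `PRIMARY KEY (PKWID)`, rows that share a `PKWID` value overwrite one another in Cassandra, so it may be worth checking how many distinct keys the CSV really contains. This is a rough sketch, not a diagnosis of the problem above. `KeyAudit` and `audit` are hypothetical names, and the naive comma split assumes the key column contains no quoted commas. A `HashSet` of boxed integers would need several gigabytes for 150 million keys, so this is intended for a sample of the file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class KeyAudit {

    /**
     * Returns {rowCount, distinctKeyCount} for the given CSV file.
     * keyColumn is the 0-based index of the integer key column.
     */
    public static long[] audit(Path csv, int keyColumn) throws IOException {
        Set<Integer> keys = new HashSet<>();
        long rows = 0;
        try (BufferedReader reader = Files.newBufferedReader(csv, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                rows++;
                // Naive split: assumes no quoted fields containing commas.
                String[] fields = line.split(",", -1);
                keys.add(Integer.parseInt(fields[keyColumn].trim()));
            }
        }
        return new long[] { rows, keys.size() };
    }

    public static void main(String[] args) throws IOException {
        // Small temporary file for demonstration: key 1 appears twice.
        Path tmp = Files.createTempFile("key-audit", ".csv");
        Files.write(tmp, Arrays.asList("1,a", "2,b", "1,c"));
        long[] result = audit(tmp, 0);
        System.out.println("rows=" + result[0] + " distinctKeys=" + result[1]);
        Files.delete(tmp);
    }
}
```

A distinct-key count close to 12k would suggest that the rows are being collapsed by the primary key rather than dropped by the writer.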

0 answers:

No answers yet