Java & Cassandra - 2 CQLSSTableWriter instances

Time: 2015-02-10 16:26:43

Tags: java cassandra bulk bulk-load

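I am trying to find the most efficient way to load a large amount of data from a Java program into multiple tables within one Cassandra keyspace. Here are my keyspace and table declarations: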

CREATE KEYSPACE IF NOT EXISTS articles WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : '3'};

CREATE TABLE IF NOT EXISTS articles.bigrams (docid text, bigram text, primary key (docid, bigram));
CREATE TABLE IF NOT EXISTS articles.unigrams (docid text, unigram text, primary key (docid, unigram));

This is the part of the Java program that is giving me trouble. I am trying to create 2 CQLSSTableWriter instances and write to each of them:

package cassandrabulktest.cassandra;

import java.io.IOException;
import java.util.ArrayList;
import org.apache.cassandra.exceptions.InvalidRequestException;
import org.apache.cassandra.io.sstable.CQLSSTableWriter;



public class UnigramLoader {
    private static final String UNIGRAM_SCHEMA = "CREATE TABLE articles.unigrams (" +
                                                      "docid text, " +
                                                      "unigram text, " +
                                                      "PRIMARY KEY (unigram, docid))";

    private static CQLSSTableWriter unigram_writer = CQLSSTableWriter.builder()
                .inDirectory("/tables/articles/unigrams")
                .forTable(UNIGRAM_SCHEMA)
                .using("INSERT INTO articles.unigrams (docid, unigram) VALUES (?, ?)")
                .build();

    private static final String BIGRAM_SCHEMA = "CREATE TABLE articles.bigrams (" +
                                                      "docid text, " +
                                                      "bigram text, " +
                                                      "PRIMARY KEY (bigram, docid))";

    private static CQLSSTableWriter bigram_writer = CQLSSTableWriter.builder()
                .inDirectory("/tables/articles/bigrams")
                .forTable(BIGRAM_SCHEMA)
                .using("INSERT INTO articles.bigrams (docid, bigram) VALUES (?, ?)")
                .build();


    public static void load(String articleId, ArrayList<String> unigrams, ArrayList<String> bigrams) throws IOException, InvalidRequestException {        
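        for (String unigram : unigrams) {
            // Values bind in the order of the INSERT column list: (docid, unigram).
            unigram_writer.addRow(articleId, unigram);
        }

        for (String bigram : bigrams) {
            // Values bind in the order of the INSERT column list: (docid, bigram).
            bigram_writer.addRow(articleId, bigram);
        }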
    }

    public static void closeWriter() throws IOException {
        unigram_writer.close();
        bigram_writer.close();
    }
}

If this worked, it would start creating SSTable files in the 2 directories. Instead, I get this error at runtime:

Exception in thread "Thread-1" java.lang.ExceptionInInitializerError
    at edu.georgetown.cassandrabulktest.runnables.UnigramRunnable.run(UnigramRunnable.java:69)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.RuntimeException: org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found 662e2edf-c864-34a4-bca6-f83b25af6f6a; expected 7247b490-b141-11e4-a8f9-8b65543eda40)
    at org.apache.cassandra.config.CFMetaData.reload(CFMetaData.java:1125)
    at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:337)
    at org.apache.cassandra.io.sstable.CQLSSTableWriter$Builder.forTable(CQLSSTableWriter.java:360)
    at edu.georgetown.cassandrabulktest.cassandra.UnigramLoader.<clinit>(UnigramLoader.java:29)
    ... 2 more
Caused by: org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found 662e2edf-c864-34a4-bca6-f83b25af6f6a; expected 7247b490-b141-11e4-a8f9-8b65543eda40)
    at org.apache.cassandra.config.CFMetaData.validateCompatility(CFMetaData.java:1208)
    at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:1140)
    at org.apache.cassandra.config.CFMetaData.reload(CFMetaData.java:1121)
    ... 5 more

Is there a way to make this work, or is there a different approach to what I am trying to do? Thanks in advance!

1 answer:

Answer 0 (score: 0)

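You may want to try building and using a single writer instance at a time, since there seem to be some race conditions when multiple writers are in use simultaneously.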
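
One way to read that suggestion is to keep only one writer open at any moment: build the writer for one table, write all of its rows, close it, and only then build the next. Below is a minimal sketch of that ordering. `RowWriter`, `LoggingWriter`, and `SequentialBulkLoad` are hypothetical names introduced here so that the example runs without Cassandra on the classpath; in a real program the supplier would return an adapter around `CQLSSTableWriter.builder()...build()`, and the adapter would have to handle the checked exceptions that the question's code declares (`IOException`, `InvalidRequestException`). Whether this ordering actually avoids the "Column family ID mismatch" error is an assumption and has not been verified against any Cassandra version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

/** Sketch: write one table at a time so that only one writer is ever open. */
final class SequentialBulkLoad {

    /** Opens a writer, adds one (docid, term) row per term, and closes it before returning. */
    static void writeTable(Supplier<RowWriter> factory, String docId, List<String> terms) {
        try (RowWriter writer = factory.get()) {
            for (String term : terms) {
                // Values follow the INSERT column list order: (docid, term).
                writer.addRow(docId, term);
            }
        }
    }

    /** The bigram writer is not created until the unigram writer has been closed. */
    static void load(Supplier<RowWriter> unigramFactory, Supplier<RowWriter> bigramFactory,
                     String docId, List<String> unigrams, List<String> bigrams) {
        writeTable(unigramFactory, docId, unigrams);
        writeTable(bigramFactory, docId, bigrams);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        load(() -> new LoggingWriter("unigrams", log),
             () -> new LoggingWriter("bigrams", log),
             "doc1", Arrays.asList("the", "cat"), Arrays.asList("the cat"));
        log.forEach(System.out::println);
    }
}

/** Hypothetical stand-in for CQLSSTableWriter (an assumption, not a Cassandra API). */
interface RowWriter extends AutoCloseable {
    void addRow(Object... values);

    @Override
    void close(); // narrowed: no checked exception in this sketch
}

/** In-memory writer that records open, row, and close events, for demonstration only. */
final class LoggingWriter implements RowWriter {
    private final String table;
    private final List<String> log;

    LoggingWriter(String table, List<String> log) {
        this.table = table;
        this.log = log;
        log.add("open " + table);
    }

    @Override
    public void addRow(Object... values) {
        log.add(table + " " + Arrays.toString(values));
    }

    @Override
    public void close() {
        log.add("close " + table);
    }
}
```

Passing a `Supplier` rather than a ready-made writer is the point of the design: construction is deferred until the previous writer is closed, whereas the static field initializers in the question build both writers as soon as the class is loaded.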