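I am trying to find the most efficient way to load a large amount of data from a Java program into multiple tables within one Cassandra keyspace. Here are my keyspace/table declarations: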
CREATE KEYSPACE IF NOT EXISTS articles WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : '3'};
CREATE TABLE IF NOT EXISTS articles.bigrams (docid text, bigram text, primary key (docid, bigram));
CREATE TABLE IF NOT EXISTS articles.unigrams (docid text, unigram text, primary key (docid, unigram));
This is the part of the Java program that is giving me trouble. I am trying to create two CQLSSTableWriter instances and write to each of them:
package cassandrabulktest.cassandra;
import java.io.IOException;
import java.util.ArrayList;
import org.apache.cassandra.exceptions.InvalidRequestException;
import org.apache.cassandra.io.sstable.CQLSSTableWriter;
public class UnigramLoader {
    // The schema given to forTable() should match the live table these SSTables will be
    // loaded into (articles.unigrams as declared above), including its PRIMARY KEY.
    private static final String UNIGRAM_SCHEMA = "CREATE TABLE articles.unigrams (" +
            "docid text, " +
            "unigram text, " +
            "PRIMARY KEY (docid, unigram))";

    // Writers are built during static initialization, so an exception thrown here
    // reaches the caller as an ExceptionInInitializerError.
    private static CQLSSTableWriter unigram_writer = CQLSSTableWriter.builder()
            .inDirectory("/tables/articles/unigrams")   // directory must already exist
            .forTable(UNIGRAM_SCHEMA)
            .using("INSERT INTO articles.unigrams (docid, unigram) VALUES (?, ?)")
            .build();

    private static final String BIGRAM_SCHEMA = "CREATE TABLE articles.bigrams (" +
            "docid text, " +
            "bigram text, " +
            "PRIMARY KEY (docid, bigram))";

    private static CQLSSTableWriter bigram_writer = CQLSSTableWriter.builder()
            .inDirectory("/tables/articles/bigrams")    // directory must already exist
            .forTable(BIGRAM_SCHEMA)
            .using("INSERT INTO articles.bigrams (docid, bigram) VALUES (?, ?)")
            .build();

    // addRow() binds values positionally to the INSERT's markers: (docid, unigram) / (docid, bigram).
    public static void load(String articleId, ArrayList<String> unigrams, ArrayList<String> bigrams) throws IOException, InvalidRequestException {
        for (String unigram : unigrams) {
            unigram_writer.addRow(articleId, unigram);
        }
        for (String bigram : bigrams) {
            bigram_writer.addRow(articleId, bigram);
        }
    }

    public static void closeWriter() throws IOException {
        unigram_writer.close();
        bigram_writer.close();
    }
}
If this worked, it would start creating SSTable files in the two directories. However, I get this error at runtime:
Exception in thread "Thread-1" java.lang.ExceptionInInitializerError
	at edu.georgetown.cassandrabulktest.runnables.UnigramRunnable.run(UnigramRunnable.java:69)
	at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.RuntimeException: org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found 662e2edf-c864-34a4-bca6-f83b25af6f6a; expected 7247b490-b141-11e4-a8f9-8b65543eda40)
	at org.apache.cassandra.config.CFMetaData.reload(CFMetaData.java:1125)
	at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:337)
	at org.apache.cassandra.io.sstable.CQLSSTableWriter$Builder.forTable(CQLSSTableWriter.java:360)
	at edu.georgetown.cassandrabulktest.cassandra.UnigramLoader.<clinit>(UnigramLoader.java:29)
	... 2 more
Caused by: org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found 662e2edf-c864-34a4-bca6-f83b25af6f6a; expected 7247b490-b141-11e4-a8f9-8b65543eda40)
	at org.apache.cassandra.config.CFMetaData.validateCompatility(CFMetaData.java:1208)
	at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:1140)
	at org.apache.cassandra.config.CFMetaData.reload(CFMetaData.java:1121)
	... 5 more
Is there a way to make this work, or is there a different approach to accomplish what I'm trying to do? Thanks in advance!
Answer (score: 0)
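You may want to try building and using only a single writer instance at a time, since there seems to be a race condition when multiple writers are used simultaneously. Because each CQLSSTableWriter is bound to one table, this means writing one table completely (build its writer, add all of its rows, close it) before building the writer for the next table.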
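A minimal sketch of the one-writer-at-a-time idea, assuming the rows for each table can be collected (or re-read in a second pass) before that table's writer is built. `RowWriter`, `WriterFactory`, `TableJob` and `loadSequentially` are hypothetical names, not Cassandra API; in a real loader each factory would build a `CQLSSTableWriter` with the same builder calls as in the question, and the in-memory logging writer here only stands in for it so the ordering is visible. This is not confirmed to avoid the `Column family ID mismatch` error on every Cassandra version.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SequentialTableLoader {

    /** Hypothetical stand-in for one table's writer (in a real loader: a CQLSSTableWriter). */
    public interface RowWriter extends AutoCloseable {
        void addRow(Object... values) throws IOException;

        @Override
        void close() throws IOException;
    }

    /** Hypothetical factory, so a table's writer is only built when that table's turn comes. */
    public interface WriterFactory {
        RowWriter open() throws IOException;
    }

    /** One table's worth of work: how to build its writer, plus the rows to write into it. */
    public static final class TableJob {
        final WriterFactory factory;
        final List<Object[]> rows;

        public TableJob(WriterFactory factory, List<Object[]> rows) {
            this.factory = factory;
            this.rows = rows;
        }
    }

    /** Writes each table to completion and closes its writer before the next writer is built. */
    public static void loadSequentially(List<TableJob> jobs) throws IOException {
        for (TableJob job : jobs) {
            // try-with-resources closes this writer before the loop moves on to the next table
            try (RowWriter writer = job.factory.open()) {
                for (Object[] row : job.rows) {
                    writer.addRow(row); // values in the order of the INSERT's bind markers
                }
            }
        }
    }

    /** In-memory writer that only records events, used here to make the ordering visible. */
    private static RowWriter loggingWriter(String table, List<String> log) {
        log.add("open " + table);
        return new RowWriter() {
            @Override
            public void addRow(Object... values) {
                log.add(table + " " + Arrays.toString(values));
            }

            @Override
            public void close() {
                log.add("close " + table);
            }
        };
    }

    /** Loads two small example tables and returns the recorded open/row/close events. */
    public static List<String> demo() {
        List<String> log = new ArrayList<>();
        List<Object[]> unigramRows = Arrays.asList(
                new Object[] {"doc1", "cassandra"}, new Object[] {"doc1", "bulk"});
        List<Object[]> bigramRows =
                Collections.singletonList(new Object[] {"doc1", "cassandra bulk"});
        try {
            loadSequentially(Arrays.asList(
                    new TableJob(() -> loggingWriter("unigrams", log), unigramRows),
                    new TableJob(() -> loggingWriter("bigrams", log), bigramRows)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Running `main` prints the open/row/close events in order, with `close unigrams` appearing before `open bigrams`. The trade-off is memory or I/O: the bigram rows have to be held or re-read until the unigram table is finished, so for very large inputs a second pass over the source data is probably preferable to buffering everything.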