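I have 5,000,000 INSERT queries in a file. I want to read them from the file and write them to Cassandra using the Java driver and its executeAsync method, in a loop like the following code: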
public static void main(String[] args) {
    FileReader fr = null;
    try {
        fr = new FileReader("the-file-name.txt");
        BufferedReader br = new BufferedReader(fr);
        String sCurrentLine;
        long time1 = System.currentTimeMillis();
        while ((sCurrentLine = br.readLine()) != null) {
            session.executeAsync(sCurrentLine);
        }
        System.out.println(System.currentTimeMillis() - time1);
        fr.close();
        br.close();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
My table definition is:
CREATE TABLE test.climate (
city text,
date text,
time text,
temprature int,
PRIMARY KEY ((city, date), time)
) WITH CLUSTERING ORDER BY (time ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
But after running the program, the table contains only 2,569,725 rows:
cqlsh:test> select count(*) from climate ;
count
---------
2569725
I ran the test more than 10 times, and each time the result of select count(*) was between 2,400,000 and 2,600,000.
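Answer 0 (score: 1)
You are issuing async inserts faster than they can be executed, so eventually they exceed the queue size and fail. You could increase the queue size, but that only shifts the backpressure onto memory instead of onto your producer, and you would still be likely to hit a wall. Try limiting the number of in-flight queries, for example: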
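import com.google.common.util.concurrent.MoreExecutors;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.concurrent.Semaphore;

public static void main2(String[] args) {
    int permits = 256; // maximum number of in-flight queries
    Semaphore l = new Semaphore(permits);
    // try-with-resources closes the reader even if an exception is thrown
    try (BufferedReader br = new BufferedReader(new FileReader("the-file-name.txt"))) {
        String sCurrentLine;
        long time1 = System.currentTimeMillis();
        while ((sCurrentLine = br.readLine()) != null) {
            l.acquire(); // blocks while `permits` queries are already in flight
            session.executeAsync(sCurrentLine)
                    .addListener(() -> l.release(), MoreExecutors.directExecutor());
        }
        l.acquire(permits); // wait until every outstanding query has completed
        System.out.println(System.currentTimeMillis() - time1);
    } catch (IOException e) { // also covers FileNotFoundException
        e.printStackTrace();
    } catch (InterruptedException e) { // thrown by Semaphore.acquire
        Thread.currentThread().interrupt();
        e.printStackTrace();
    }
}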
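It will probably run just as fast; you only need to find the right size for the semaphore. Also note the blocking at the end until all permits have been returned (acquiring the maximum): without it, you could shut down the JVM before requests that are still sitting in the queue have been sent.
Disclaimer: I have not tested the code above.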
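To make the throttling idea easier to experiment with, here is a minimal, self-contained sketch that needs no Cassandra cluster. It is not the driver's API: `ThrottledSubmitter`, `asyncCall`, and the counters are hypothetical names, and a `CompletableFuture` stands in for the future returned by `executeAsync`. Besides bounding the number of in-flight calls with a semaphore, it counts failures, which a fire-and-forget loop would otherwise never notice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical helper: bounds the number of in-flight async calls with a semaphore.
class ThrottledSubmitter {
    final AtomicInteger succeeded = new AtomicInteger();
    final AtomicInteger failed = new AtomicInteger();
    final AtomicInteger maxInFlight = new AtomicInteger();
    private final AtomicInteger inFlight = new AtomicInteger();

    // asyncCall is a stand-in for something like session.executeAsync(statement).
    void run(Iterable<String> statements, int permits,
             Function<String, CompletableFuture<Void>> asyncCall) throws InterruptedException {
        Semaphore semaphore = new Semaphore(permits);
        for (String statement : statements) {
            semaphore.acquire(); // blocks the producer once `permits` calls are in flight
            maxInFlight.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
            asyncCall.apply(statement).whenComplete((result, error) -> {
                if (error == null) {
                    succeeded.incrementAndGet();
                } else {
                    failed.incrementAndGet(); // a real program might log or retry here
                }
                inFlight.decrementAndGet();
                semaphore.release(); // release on success and on failure
            });
        }
        semaphore.acquire(permits); // wait for every outstanding call before returning
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<String> statements = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            statements.add("stmt-" + i);
        }
        ThrottledSubmitter submitter = new ThrottledSubmitter();
        // Simulated backend: statements whose name ends in 7 fail.
        submitter.run(statements, 32, s -> CompletableFuture.runAsync(() -> {
            if (s.endsWith("7")) {
                throw new IllegalStateException("simulated write failure");
            }
        }, pool));
        pool.shutdown();
        System.out.println("succeeded=" + submitter.succeeded.get()
                + " failed=" + submitter.failed.get()
                + " maxInFlight=" + submitter.maxInFlight.get());
    }
}
```

With the real driver, the same pattern would apply to the future returned by `executeAsync`, with a callback that inspects the outcome instead of only releasing the permit. Comparing the success count with the number of lines in the file is a cheap way to confirm that nothing was dropped.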