I set up a Cassandra cluster on AWS. What I expected was that adding more nodes would (as advertised) increase I/O throughput (reads/writes per second). Instead, I see exactly the opposite: performance drops as new nodes are added.
Do you know of any typical issues that prevent it from scaling?
Here are some details:
I am loading a text file (15 MB) into a column family. Each line is one record; there are 150,000 records in total. With one node the write takes about 90 seconds, but with two nodes it takes 120 seconds. I can see the data propagate to both nodes, yet the throughput does not increase.
The source code is below:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class WordGenCAS {
    static final String KEYSPACE = "text_ks";
    static final String COLUMN_FAMILY = "text_table";
    static final String COLUMN_NAME = "text_col";

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("Usage: WordGenCAS <input file> <host1,host2,...>");
            System.exit(-1);
        }

        String[] contactPts = args[1].split(",");

        Cluster cluster = Cluster.builder()
                .addContactPoints(contactPts)
                .build();
        Session session = cluster.connect(KEYSPACE);

        InputStream fis = new FileInputStream(args[0]);
        InputStreamReader in = new InputStreamReader(fis, "UTF-8");
        BufferedReader br = new BufferedReader(in);

        String line;
        int lineCount = 0;
        while ((line = br.readLine()) != null) {
            // Strip single quotes because values are inlined into the CQL string
            line = line.replaceAll("'", " ");
            line = line.trim();
            if (line.isEmpty())
                continue;
            System.out.println("[" + line + "]");

            // One synchronous INSERT per line
            String cqlStatement2 = String.format("insert into %s (id, %s) values (%d, '%s');",
                    COLUMN_FAMILY,
                    COLUMN_NAME,
                    lineCount,
                    line);
            session.execute(cqlStatement2);
            lineCount++;
        }
        System.out.println("Total lines written: " + lineCount);
    }
}
The database schema is as follows:
CREATE KEYSPACE text_ks WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };

USE text_ks;

CREATE TABLE text_table (
    id int,
    text_col text,
    primary key (id)
) WITH COMPACT STORAGE;
Thanks!
Answer 0 (score: 4)
Even though this is an old post, I think it is worth posting a solution to these (common) problems.
As you have already discovered, loading data with a serial process is slow. What you have been advised is the right thing to do.
However, issuing a large number of queries without applying some kind of backpressure is likely to cause trouble: you will lose data because the server (and, to some extent, the driver) gets overloaded.
The solution below loads the data with asynchronous calls and tries to apply some backpressure on the client to avoid data loss.
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Future;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

public class WordGenCAS {
    static final String KEYSPACE = "text_ks";
    static final String COLUMN_FAMILY = "text_table";
    static final String COLUMN_NAME = "text_col";

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("Usage: WordGenCAS <input file> <host1,host2,...>");
            System.exit(-1);
        }

        String[] contactPts = args[1].split(",");

        Cluster cluster = Cluster.builder()
                .addContactPoints(contactPts)
                .build();
        Session session = cluster.connect(KEYSPACE);

        InputStream fis = new FileInputStream(args[0]);
        InputStreamReader in = new InputStreamReader(fis, "UTF-8");
        BufferedReader br = new BufferedReader(in);

        String line;
        int lineCount = 0;

        // This is the futures list of our queries
        List<Future<ResultSet>> futures = new ArrayList<>();

        // Loop
        while ((line = br.readLine()) != null) {
            line = line.replaceAll("'", " ");
            line = line.trim();
            if (line.isEmpty())
                continue;
            System.out.println("[" + line + "]");

            String cqlStatement2 = String.format("insert into %s (id, %s) values (%d, '%s');",
                    COLUMN_FAMILY,
                    COLUMN_NAME,
                    lineCount,
                    line);
            lineCount++;

            // Add the "future" returned by the async method to the list
            futures.add(session.executeAsync(cqlStatement2));

            // Apply some backpressure if we issued more than X queries.
            // Change X to a value suitable for your cluster.
            while (futures.size() > 1000) {
                Future<ResultSet> future = futures.remove(0);
                try {
                    future.get();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }

        System.out.println("Total lines written: " + lineCount);
        System.out.println("Waiting for writes to complete...");

        // Wait until all writes are done.
        while (futures.size() > 0) {
            Future<ResultSet> future = futures.remove(0);
            try {
                future.get();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        System.out.println("Done!");
    }
}
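A further refinement (not part of the answer above, just a sketch assuming the DataStax Java driver 2.x or later) is to prepare the INSERT once and bind the values per row. This avoids re-parsing the CQL text for every record and lets the driver handle quoting, so the single-quote replacement is no longer needed. The class name WordGenCASPrepared and the backpressure threshold of 1000 are illustrative, not taken from the original post.

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Future;

import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

// Hypothetical variant of the loader above using a prepared statement.
public class WordGenCASPrepared {
    public static void main(String[] args) throws Exception {
        String[] contactPts = args[1].split(",");
        Cluster cluster = Cluster.builder().addContactPoints(contactPts).build();
        Session session = cluster.connect("text_ks");

        // Prepare the INSERT once; the driver reuses the parsed statement.
        PreparedStatement prepared = session.prepare(
                "INSERT INTO text_table (id, text_col) VALUES (?, ?)");

        BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(args[0]), "UTF-8"));

        List<Future<ResultSet>> futures = new ArrayList<>();
        String line;
        int id = 0;
        while ((line = br.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty())
                continue;

            // Bind values instead of formatting them into the CQL string;
            // the driver handles quoting, so single quotes need no escaping.
            BoundStatement bound = prepared.bind(id++, line);
            futures.add(session.executeAsync(bound));

            // Same client-side backpressure idea as above; tune the threshold.
            while (futures.size() > 1000) {
                futures.remove(0).get();
            }
        }

        // Drain the remaining in-flight writes.
        while (!futures.isEmpty()) {
            futures.remove(0).get();
        }

        System.out.println("Total lines written: " + id);
        cluster.close();
    }
}

The design choice is the same as in the answer above: bound the number of in-flight asynchronous writes so the client does not overwhelm the cluster, while still keeping many requests in the pipeline.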