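I'm in the process of switching a Java app from com.netflix.astyanax:astyanax-core:1.56.44 to com.datastax.cassandra:cassandra-driver-core:3.1.0.
Right off the bat, with a simple test that inserts a row with a randomly generated key and then reads that row 1000 times, I'm seeing terrible performance compared to the code using Astyanax. This is against a single-node Cassandra instance running locally. The table I'm testing against is simple: just a blob primary key column named uuid and an int column named date.
Here's the basic code for the DataStax driver: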
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import com.datastax.driver.core.*;
import com.datastax.driver.core.policies.*;

class DataStaxCassandra
{
    final Session session;
    final PreparedStatement preparedIDWriteCmd;
    final PreparedStatement preparedIDReadCmd;

    // Constructor (no return type), so the final fields above are assigned exactly once.
    DataStaxCassandra()
    {
        final PoolingOptions poolingOptions = new PoolingOptions()
            .setConnectionsPerHost(HostDistance.LOCAL, 1, 2)
            .setConnectionsPerHost(HostDistance.REMOTE, 1, 1)
            .setMaxRequestsPerConnection(HostDistance.LOCAL, 128)
            .setMaxRequestsPerConnection(HostDistance.REMOTE, 128)
            .setPoolTimeoutMillis(0); // Don't ever wait for a connection to one host.

        final QueryOptions queryOptions = new QueryOptions()
            .setConsistencyLevel(ConsistencyLevel.LOCAL_ONE)
            .setPrepareOnAllHosts(true)
            .setReprepareOnUp(true);

        final LoadBalancingPolicy dcAwareRRPolicy = DCAwareRoundRobinPolicy.builder()
            .withLocalDc("my_laptop")
            .withUsedHostsPerRemoteDc(0)
            .build();
        final LoadBalancingPolicy loadBalancingPolicy = new TokenAwarePolicy(dcAwareRRPolicy);

        final SocketOptions socketOptions = new SocketOptions()
            .setConnectTimeoutMillis(1000)
            .setReadTimeoutMillis(1000);

        final RetryPolicy retryPolicy = new LoggingRetryPolicy(DefaultRetryPolicy.INSTANCE);

        Cluster.Builder clusterBuilder = Cluster.builder()
            .withClusterName("test cluster")
            .withPort(9042)
            .addContactPoints("127.0.0.1")
            .withPoolingOptions(poolingOptions)
            .withQueryOptions(queryOptions)
            .withLoadBalancingPolicy(loadBalancingPolicy)
            .withSocketOptions(socketOptions)
            .withRetryPolicy(retryPolicy);

        // I've tried both V3 and V2, with lower connections/host and higher reqs/connection settings
        // with V3, and it doesn't noticeably affect the test performance. Leaving it at V2 because the
        // Astyanax version is using V2.
        clusterBuilder.withProtocolVersion(ProtocolVersion.V2);

        final Cluster cluster = clusterBuilder.build();
        session = cluster.connect();

        preparedIDWriteCmd = session.prepare(
            "INSERT INTO \"mykeyspace\".\"mytable\" (\"uuid\", \"date\") VALUES (?, ?) USING TTL 38880000");
        preparedIDReadCmd = session.prepare(
            "SELECT \"date\" from \"mykeyspace\".\"mytable\" WHERE \"uuid\"=?");
    }

    public List<Row> execute(final Statement statement, final int timeout)
        throws InterruptedException, ExecutionException, TimeoutException
    {
        final ResultSetFuture future = session.executeAsync(statement);
        try
        {
            final ResultSet readRows = future.get(timeout, TimeUnit.MILLISECONDS);
            final List<Row> resultRows = new ArrayList<>();
            // How far we can go without triggering the blocking fetch:
            int remainingInPage = readRows.getAvailableWithoutFetching();
            for (final Row row : readRows)
            {
                resultRows.add(row);
                if (--remainingInPage == 0) break;
            }
            return resultRows;
        }
        catch (final TimeoutException e)
        {
            future.cancel(true);
            throw e;
        }
    }

    private void insertRow(final byte[] id, final int date)
        throws InterruptedException, ExecutionException, TimeoutException
    {
        final ByteBuffer idKey = ByteBuffer.wrap(id);
        final BoundStatement writeCmd = preparedIDWriteCmd.bind(idKey, date);
        writeCmd.setRoutingKey(idKey);
        execute(writeCmd, 1000);
    }

    public int readRow(final byte[] id)
        throws InterruptedException, ExecutionException, TimeoutException
    {
        final ByteBuffer idKey = ByteBuffer.wrap(id);
        final BoundStatement readCmd = preparedIDReadCmd.bind(idKey);
        readCmd.setRoutingKey(idKey);
        final List<Row> idRows = execute(readCmd, 1000);
        if (idRows.isEmpty()) return 0;
        final Row idRow = idRows.get(0);
        return idRow.getInt("date");
    }
}

void perfTest()
{
    final DataStaxCassandra ds = new DataStaxCassandra();
    final int perfTestCount = 10000;
    final long startTime = System.nanoTime();
    for (int i = 0; i < perfTestCount; ++i)
    {
        final String id = UUIDUtils.generateRandomUUIDString();
        final byte[] idBytes = Utils.hexStringToByteArray(id);
        final int date = (int)(System.currentTimeMillis() / 1000);
        try
        {
            ds.insertRow(idBytes, date);
            final int dateRead = ds.readRow(idBytes);
            assert(dateRead == date) : "Inserted ID with date " + date + " but date read is " + dateRead;
        }
        catch (final InterruptedException | ExecutionException | TimeoutException e)
        {
            System.err.println("ERROR reading ID (test " + (i + 1) + ") - " + e.toString());
        }
    }
    System.out.println(
        perfTestCount + " insert+reads took " +
        TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startTime) + " ms");
}
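
The UUIDUtils and Utils helpers called in perfTest are not shown in this post. For anyone trying to reproduce the test, here is a minimal sketch of what they might look like. Both classes are hypothetical stand-ins, and the sketch assumes the ID is the 32 hex digits of a random UUID with the dashes removed, which may not match the real helpers.

```java
import java.util.UUID;

// Hypothetical stand-in for the UUIDUtils helper used by perfTest; not the real implementation.
final class UUIDUtils
{
    private UUIDUtils() {}

    // Assumption: the ID is a random UUID rendered as 32 hex digits, without dashes.
    static String generateRandomUUIDString()
    {
        return UUID.randomUUID().toString().replace("-", "");
    }
}

// Hypothetical stand-in for the Utils helper used by perfTest; not the real implementation.
final class Utils
{
    private Utils() {}

    // Decodes an even-length hex string, two hex digits per output byte.
    static byte[] hexStringToByteArray(final String hex)
    {
        if (hex.length() % 2 != 0)
            throw new IllegalArgumentException("Hex string must have an even length: " + hex);
        final byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; ++i)
        {
            final int hi = Character.digit(hex.charAt(2 * i), 16);
            final int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0)
                throw new IllegalArgumentException("Not a hex string: " + hex);
            bytes[i] = (byte) ((hi << 4) | lo);
        }
        return bytes;
    }
}
```

Under that assumption each generated ID decodes to a 16-byte array, which is the value bound to the blob uuid column.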
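Am I doing something wrong that would cause the poor performance? Given the old version of Astyanax I'm using, I was hoping the switch would give a decent speed improvement.
I've tried not wrapping the load balancing policy in TokenAwarePolicy and getting rid of the setRoutingKey lines, just because I know those things definitely shouldn't help while I'm using a single node as I am now.
My local Cassandra version is 2.1.15 (which supports native protocol V3), but the machines in our production environments are running Cassandra 2.0.12.156 (which only supports V2).
Keep in mind that this is targeted at environments with a bunch of nodes and several data centers, which is why the settings are the way they are (with the actual values coming from a config file), even though I know I could skip things like DCAwareRoundRobinPolicy for this test.
Any help would be greatly appreciated! I can also post the code that uses Astyanax; I just wanted to first make sure there's nothing obviously wrong with my new code. Thanks!
Tests of 10,000 writes+reads take about 30 seconds using the DataStax driver, while with Astyanax they take 15-20 seconds.
I upped the test count to 100,000 to see whether there was some startup overhead in the DataStax driver that just consumed ~10 seconds, after which the two might perform more similarly. But even with 100,000 read/writes:
AstyanaxCassandra 100,000 insert+reads took 156593 ms
DataStaxCassandra 100,000 insert+reads took 294340 ms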
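
Dividing those totals by the iteration count gives the average cost of one insert+read iteration, which makes the two runs easier to compare. A quick sketch of that arithmetic follows; the class and method names are made up for illustration, and the only inputs are the two totals reported above.

```java
import java.util.Locale;

// Illustrative only: the class and method names here are hypothetical.
final class PerOpLatency
{
    private PerOpLatency() {}

    // Average wall-clock milliseconds per insert+read iteration.
    static double perOpMillis(final long totalMillis, final int iterations)
    {
        return (double) totalMillis / iterations;
    }

    public static void main(final String[] args)
    {
        final int iterations = 100_000;
        final long astyanaxTotalMs = 156_593; // total reported for AstyanaxCassandra
        final long dataStaxTotalMs = 294_340; // total reported for DataStaxCassandra

        final double astyanax = perOpMillis(astyanaxTotalMs, iterations);
        final double dataStax = perOpMillis(dataStaxTotalMs, iterations);

        System.out.println(String.format(Locale.ROOT,
            "Astyanax: %.2f ms/op, DataStax: %.2f ms/op, ratio: %.2fx",
            astyanax, dataStax, dataStax / astyanax));
    }
}
```

That works out to roughly 1.57 ms per iteration for Astyanax versus 2.94 ms for the DataStax driver, about 1.88x slower. The 10,000-iteration runs show a similar per-iteration cost, so the gap appears to scale with the number of requests instead of being a fixed startup cost.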