DSE Cassandra - Why is Astyanax faster than the DataStax Java driver

Time: 2016-10-10 19:33:07

Tags: java cassandra datastax-java-driver astyanax

I am switching a Java application from com.netflix.astyanax:astyanax-core:1.56.44 to com.datastax.cassandra:cassandra-driver-core:3.1.0.

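Right away, in a simple test that inserts a row with a randomly generated key and then reads that row back, 1000 times, I am seeing terrible performance compared with the code that uses Astyanax. This is against a single-node Cassandra instance running locally. The table I'm testing against is simple: just a blob primary-key column named "uuid" and an int column named "date".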
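For reference, the table described above could be declared roughly as follows. This is a sketch reconstructed from the description and from the keyspace, table, and column names in the prepared statements further down; the exact DDL is not given in the question, and the IF NOT EXISTS clause and the class name TestTableSchema are assumptions added for illustration. The keyspace is assumed to exist already.

```java
public final class TestTableSchema {
    // CQL reconstructed from the question's description: a blob primary key
    // named "uuid" and an int column named "date". Not the author's actual DDL.
    static final String CREATE_TABLE =
        "CREATE TABLE IF NOT EXISTS \"mykeyspace\".\"mytable\" ("
        + "\"uuid\" blob PRIMARY KEY, "
        + "\"date\" int)";

    public static void main(String[] args) {
        // Print the statement so it can be pasted into cqlsh.
        System.out.println(CREATE_TABLE);
    }
}
```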

Here is the basic code for the DataStax driver:

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ProtocolVersion;
import com.datastax.driver.core.QueryOptions;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SocketOptions;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.DefaultRetryPolicy;
import com.datastax.driver.core.policies.LoadBalancingPolicy;
import com.datastax.driver.core.policies.LoggingRetryPolicy;
import com.datastax.driver.core.policies.RetryPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

class DataStaxCassandra
{
    final Session session;
    final PreparedStatement preparedIDWriteCmd;
    final PreparedStatement preparedIDReadCmd;

    // Constructor (no return type), so the final fields above can be assigned here.
    DataStaxCassandra()
    {
        final PoolingOptions poolingOptions = new PoolingOptions()
            .setConnectionsPerHost(HostDistance.LOCAL, 1, 2)
            .setConnectionsPerHost(HostDistance.REMOTE, 1, 1)
            .setMaxRequestsPerConnection(HostDistance.LOCAL, 128)
            .setMaxRequestsPerConnection(HostDistance.REMOTE, 128)
            .setPoolTimeoutMillis(0); // Don't ever wait for a connection to one host.

        final QueryOptions queryOptions = new QueryOptions()
            .setConsistencyLevel(ConsistencyLevel.LOCAL_ONE)
            .setPrepareOnAllHosts(true)
            .setReprepareOnUp(true);

        final LoadBalancingPolicy dcAwareRRPolicy = DCAwareRoundRobinPolicy.builder()
            .withLocalDc("my_laptop")
            .withUsedHostsPerRemoteDc(0)
            .build();

        final LoadBalancingPolicy loadBalancingPolicy = new TokenAwarePolicy(dcAwareRRPolicy);

        final SocketOptions socketOptions = new SocketOptions()
            .setConnectTimeoutMillis(1000)
            .setReadTimeoutMillis(1000);

        final RetryPolicy retryPolicy = new LoggingRetryPolicy(DefaultRetryPolicy.INSTANCE);

        Cluster.Builder clusterBuilder = Cluster.builder()
            .withClusterName("test cluster")
            .withPort(9042)
            .addContactPoints("127.0.0.1")
            .withPoolingOptions(poolingOptions)
            .withQueryOptions(queryOptions)
            .withLoadBalancingPolicy(loadBalancingPolicy)
            .withSocketOptions(socketOptions)
            .withRetryPolicy(retryPolicy);

        // I've tried both V3 and V2, with lower connections/host and higher reqs/connection settings
        // with V3, and it doesn't noticeably affect the test performance. Leaving it at V2 because the
        // Astyanax version is using V2.
        clusterBuilder.withProtocolVersion(ProtocolVersion.V2);

        final Cluster cluster = clusterBuilder.build();
        session = cluster.connect();

        preparedIDWriteCmd = session.prepare(
            "INSERT INTO \"mykeyspace\".\"mytable\" (\"uuid\", \"date\") VALUES (?, ?) USING TTL 38880000");

        preparedIDReadCmd = session.prepare(
            "SELECT \"date\" from \"mykeyspace\".\"mytable\" WHERE \"uuid\"=?");
    }

    public List<Row> execute(final Statement statement, final int timeout)
    throws InterruptedException, ExecutionException, TimeoutException
    {
        final ResultSetFuture future = session.executeAsync(statement);

        try
        {
            final ResultSet readRows = future.get(timeout, TimeUnit.MILLISECONDS);
            final List<Row> resultRows = new ArrayList<>();

            // How far we can go without triggering the blocking fetch:
            int remainingInPage = readRows.getAvailableWithoutFetching();
            for (final Row row : readRows)
            {
                resultRows.add(row);
                if (--remainingInPage == 0) break;
            }
            return resultRows;
        }
        catch (final TimeoutException e)
        {
            future.cancel(true);
            throw e;
        }
    }

    private void insertRow(final byte[] id, final int date)
    throws InterruptedException, ExecutionException, TimeoutException
    {
        final ByteBuffer idKey = ByteBuffer.wrap(id);
        final BoundStatement writeCmd = preparedIDWriteCmd.bind(idKey, date);
        writeCmd.setRoutingKey(idKey);
        execute(writeCmd, 1000);
    }

    public int readRow(final byte[] id)
    throws InterruptedException, ExecutionException, TimeoutException
    {
        final ByteBuffer idKey = ByteBuffer.wrap(id);
        final BoundStatement readCmd = preparedIDReadCmd.bind(idKey);
        readCmd.setRoutingKey(idKey);
        final List<Row> idRows = execute(readCmd, 1000);

        if (idRows.isEmpty()) return 0;

        final Row idRow = idRows.get(0);
        return idRow.getInt("date");
    }
}

void perfTest()
{
    final DataStaxCassandra ds = new DataStaxCassandra();
    final int perfTestCount = 10000;

    final long startTime = System.nanoTime();
    for (int i = 0; i < perfTestCount; ++i)
    {
        final String id = UUIDUtils.generateRandomUUIDString();
        final byte[] idBytes = Utils.hexStringToByteArray(id);
        final int date = (int)(System.currentTimeMillis() / 1000);

        try
        {
            ds.insertRow(idBytes, date);
            final int dateRead = ds.readRow(idBytes);
            assert(dateRead == date) : "Inserted ID with date " +date +" but date read is " +dateRead;
        }
        catch (final InterruptedException | ExecutionException | TimeoutException e)
        {
            System.err.println("ERROR reading ID (test " +(i+1) +") - " +e.toString());
        }
    }
    System.out.println(
        perfTestCount +" insert+reads took " +
        TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startTime) +" ms");
}
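
The loop above times every iteration together, including the very first requests. One way to separate any startup cost from steady-state cost is to run some untimed warm-up iterations before measuring. The following is a minimal, driver-independent sketch of that idea; the class name WarmupTimer, the method meanNanosPerCall, and the placeholder workload are hypothetical and not part of either driver. In a real test the Runnable would wrap one insertRow + readRow pair.

```java
public final class WarmupTimer {
    /**
     * Runs op `warmup` times without timing, then `measured` times with timing,
     * and returns the mean wall-clock nanoseconds per timed call.
     */
    static double meanNanosPerCall(Runnable op, int warmup, int measured) {
        if (measured <= 0) {
            throw new IllegalArgumentException("measured must be positive");
        }
        for (int i = 0; i < warmup; i++) {
            op.run(); // untimed: lets connections, caches and the JIT settle
        }
        final long start = System.nanoTime();
        for (int i = 0; i < measured; i++) {
            op.run();
        }
        return (System.nanoTime() - start) / (double) measured;
    }

    public static void main(String[] args) {
        // Placeholder workload standing in for one insert+read round trip.
        final long[] sink = {0};
        final Runnable fakeRequest = () -> sink[0] += System.nanoTime() % 7;

        final double meanNs = meanNanosPerCall(fakeRequest, 1_000, 10_000);
        System.out.println("mean ns per call: " + meanNs + " (sink=" + sink[0] + ")");
    }
}
```

If the mean per-call time after warm-up is still clearly different between the two clients, the gap is unlikely to be a one-time startup cost.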

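Am I doing something wrong that would produce this poor performance? Given that I was on an old version of Astyanax, I expected the switch to improve speed.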

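I have tried not wrapping the load balancing policy in TokenAwarePolicy and getting rid of the "setRoutingKey" lines, simply because I know those things definitely shouldn't help when I'm using a single node as I am now.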

My local Cassandra version is 2.1.15 (which supports native protocol V3), but the machines in our production environment run Cassandra 2.0.12.156 (which only supports V2).

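Keep in mind that this is meant for an environment with a bunch of nodes and multiple data centers, which is why I set things up the way I did (with the actual values coming from a configuration file), even though I know that for this test I could skip things like DCAwareRoundRobinPolicy.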

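Any help would be greatly appreciated! I can also post the code that uses Astyanax; I first wanted to make sure there is nothing obviously wrong with my new code. Thanks!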

The 10,000 write+read test takes about 30 seconds with the DataStax driver, versus 15-20 seconds with Astyanax.

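I raised the test count to 100,000 to see whether the DataStax driver has some startup overhead that only costs about 10 seconds, after which the two might perform more similarly. But even with 100,000 reads/writes: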

AstyanaxCassandra 100,000 insert+reads took 156593 ms

DataStaxCassandra 100,000 insert+reads took 294340 ms
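
To put the two totals in per-operation terms, the arithmetic can be sketched as below. It uses only the numbers reported above: roughly 1.57 ms per insert+read pair for Astyanax versus roughly 2.94 ms for the DataStax driver, a ratio of about 1.88. The per-request figure assumes each iteration performs exactly two sequential blocking round trips (one insert, one read); that holds for the DataStax code shown, but it is an assumption for the Astyanax code, which is not shown. The class name ReportedTimings is hypothetical.

```java
public final class ReportedTimings {
    static final int PAIRS = 100_000;        // insert+read iterations per run
    static final long ASTYANAX_MS = 156_593; // total reported for Astyanax
    static final long DATASTAX_MS = 294_340; // total reported for the DataStax driver

    /** Mean milliseconds per insert+read pair. */
    static double msPerPair(long totalMs) {
        return totalMs / (double) PAIRS;
    }

    /** Mean milliseconds per request, assuming two sequential requests per pair. */
    static double msPerRequest(long totalMs) {
        return totalMs / (2.0 * PAIRS);
    }

    /** How many times longer the DataStax run took than the Astyanax run. */
    static double slowdown() {
        return DATASTAX_MS / (double) ASTYANAX_MS;
    }

    public static void main(String[] args) {
        System.out.println(String.format("Astyanax: %.2f ms/pair, %.2f ms/request",
            msPerPair(ASTYANAX_MS), msPerRequest(ASTYANAX_MS)));
        System.out.println(String.format("DataStax: %.2f ms/pair, %.2f ms/request",
            msPerPair(DATASTAX_MS), msPerRequest(DATASTAX_MS)));
        System.out.println(String.format("DataStax / Astyanax: %.2fx", slowdown()));
    }
}
```

Because the test keeps only one request in flight at a time, these figures reflect per-request latency rather than the throughput either client could reach with concurrent requests.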

0 answers