Question

I have a table:

CREATE TABLE my_table (
    user_id text,
    ad_id text,
    date timestamp,
    PRIMARY KEY (user_id, ad_id)
);

The user_id and ad_id values I use are no longer than 15 characters.

I query the table like this:

Set<String> users = ... // filled somewhere
Session session = ... // built somewhere
BoundStatement boundQuery = ... // built somewhere
// (using query: "SELECT * FROM my_table WHERE user_id=?")

List<Row> rowAds = 
      users.stream()
          .map(user -> session.executeAsync(boundQuery.bind(user)))
          .map(ResultSetFuture::getUninterruptibly)
          .map(ResultSet::all)
          .flatMap(List::stream)
          .collect(toList());

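The users set has about 3000 elements, and each user has approximately 300 ads.

This code runs in 50 threads on the same machine (each thread with different users), all sharing the same Session object.

The algorithm takes between 2 and 3 seconds to complete.

The Cassandra cluster has 3 nodes with a replication factor of 2. Each node has 6 cores and 12 GB of RAM.

The Cassandra nodes are at 60% of their CPU capacity and 33% of RAM (66% of RAM including page cache). The querying machine is at 50% of its CPU capacity and 50% of RAM.

How can I reduce the read time to less than 1 second?

Thanks!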

Update

After some answers (thank you very much), I realized that I was not executing the queries in parallel, so I changed the code to:

List<Row> rowAds = 
     users.stream()
       .map(user ->  session.executeAsync(boundQuery.bind(user)))
       .collect(toList())
       .stream()
       .map(ResultSetFuture::getUninterruptibly)
       .map(ResultSet::all)
       .flatMap(List::stream)
       .collect(toList());

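So now the queries really are executed in parallel, which gives me times of approximately 300 milliseconds, a great improvement! But my question remains: can it be faster? Again, thanks!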
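Why the extra collect(toList()) matters can be shown with a small sketch. A sequential Java stream is lazy and pushes each element through every map() stage before it touches the next element, so the first version submits one query, waits for it, and only then submits the next. The sketch below is not driver code: CompletableFuture stands in for ResultSetFuture, and the submit and await helpers are hypothetical names that only record the order of events.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Stand-in demo: CompletableFuture plays the role of the driver's ResultSetFuture.
class StreamLazinessDemo {

    // Hypothetical stand-in for session.executeAsync(...): records the submission.
    static CompletableFuture<String> submit(String user, List<String> log) {
        log.add("submit " + user);
        return CompletableFuture.supplyAsync(() -> user);
    }

    // Hypothetical stand-in for getUninterruptibly(): blocks, then records the wait.
    static String await(CompletableFuture<String> future, List<String> log) {
        String user = future.join(); // blocks until this one "query" is done
        log.add("wait " + user);
        return user;
    }

    // First version: each element passes through both map() stages in turn.
    static List<String> chained(List<String> users) {
        List<String> log = new ArrayList<>();
        users.stream()
             .map(u -> submit(u, log))
             .map(f -> await(f, log))
             .collect(Collectors.toList());
        return log;
    }

    // Updated version: the intermediate collect forces every submission first.
    static List<String> collectedFirst(List<String> users) {
        List<String> log = new ArrayList<>();
        users.stream()
             .map(u -> submit(u, log))
             .collect(Collectors.toList())
             .stream()
             .map(f -> await(f, log))
             .collect(Collectors.toList());
        return log;
    }

    public static void main(String[] args) {
        List<String> users = Arrays.asList("u1", "u2", "u3");
        System.out.println(chained(users));
        // [submit u1, wait u1, submit u2, wait u2, submit u3, wait u3]
        System.out.println(collectedFirst(users));
        // [submit u1, submit u2, submit u3, wait u1, wait u2, wait u3]
    }
}
```

In the second log all requests are in flight before the first blocking wait, which is consistent with the drop from seconds to roughly 300 milliseconds.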

Answer 1

users.stream()
          .map(user -> session.executeAsync(boundQuery.bind(user)))
          .map(ResultSetFuture::getUninterruptibly)
          .map(ResultSet::all)
          .flatMap(List::stream)
          .collect(toList());

A remark. In the second map() you are calling ResultSetFuture::getUninterruptibly. It is a blocking call, so you don't benefit much from the asynchronous execution...

Instead, try to transform the list of Futures returned by the driver (hint: ResultSetFuture implements Guava's ListenableFuture interface) into a Future of List.

See: http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/util/concurrent/Futures.html#successfulAsList(java.lang.Iterable)
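To illustrate the "list of futures into a future of list" idea without pulling in the driver or Guava, here is a minimal sketch using only the JDK's CompletableFuture as a stand-in. The helper names allAsList and queryAll are made up for this example, and the fake query just returns a string. Note one difference in behavior: this version fails if any input future fails (closer to Guava's Futures.allAsList), whereas Futures.successfulAsList substitutes null for failed or cancelled futures.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// JDK-only stand-in; with the driver you would combine ResultSetFutures via Guava instead.
class FutureOfListDemo {

    // Turns a list of futures into a single future holding the list of results.
    static <T> CompletableFuture<List<T>> allAsList(List<CompletableFuture<T>> futures) {
        return CompletableFuture
                .allOf(futures.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> futures.stream()
                        .map(CompletableFuture::join) // no blocking: all are complete here
                        .collect(Collectors.toList()));
    }

    // Hypothetical stand-in for running one async query per user.
    static CompletableFuture<List<String>> queryAll(List<String> users) {
        List<CompletableFuture<String>> futures = users.stream()
                .map(u -> CompletableFuture.supplyAsync(() -> "ads for " + u))
                .collect(Collectors.toList());
        return allAsList(futures);
    }

    public static void main(String[] args) {
        // A single wait at the very end, instead of one blocking call per query.
        List<String> results = queryAll(Arrays.asList("u1", "u2")).join();
        System.out.println(results); // [ads for u1, ads for u2]
    }
}
```

The caller blocks once on the combined future (or attaches a callback to it), and the results come back in the same order as the input list.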
