Question

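I'm going through the data-reading recipes and examples in the Astyanax documentation. Some of them (namely, querying all rows with a callback) include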

setRepeatLastToken(false)

Can someone explain what this is for? When should I use it? It appears to default to true.

Link to the javadoc: http://netflix.github.io/astyanax/javadoc/com/netflix/astyanax/query/AllRowsQuery.html#setRepeatLastToken(boolean)

The source code of com.netflix.astyanax.query.AllRowsQuery contains the following comment:

 * There are a few important implementation details that need to be considered.
 * This implementation assumes the random partitioner is used. Consequently the
 * KeyRange query is done using tokens and not row keys. This is done because
 * when using the random partitioner tokens are sorted while keys are not.
 * However, because multiple keys could potentially map to the same token each
 * incremental query to Cassandra will repeat the last token from the previous
 * response. This will ensure that no keys are skipped. This does however have
 * to very important implications. First, the last and potentially more (if they
 * have the same token) row keys from the previous response will repeat. Second,
 * if a range of repeating tokens is larger than the block size then the code
 * will enter an infinite loop. This can be mitigated by selecting a block size
 * that is large enough so that the likelyhood of this happening is very low.
 * Also, if your application can tolerate the potential for skipped row keys
 * then call setRepeatLastToken(false) to turn off this features.

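I understand that the query is done over token ranges rather than key ranges. But why would rows be skipped if the last token is not repeated?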

Answer 1

The source code comment pretty much explains what setRepeatLastToken(boolean) does. But here are the details:

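According to this post, Cassandra uses MD5 or MurmurHash (depending on the Cassandra version) to generate tokens from keys. Both algorithms are fast but can produce collisions (the same token value for different keys). So multiple rows may end up stored under the same token, typically once the data set is large enough.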

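Cassandra places data on nodes according to token. With the random partitioner, data is retrieved in token order (not key order). This makes sense because records are read sequentially from the same node, which generates less traffic than retrieving records from random nodes across the cluster.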
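To make "token order, not key order" concrete, here is a minimal sketch. It assumes the token is the absolute value of the key's MD5 digest read as a big integer, which is my understanding of how the random partitioner derives tokens; the class and method names are made up for illustration and are not part of Astyanax or Cassandra.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not Astyanax or Cassandra code.
class Md5TokenOrderSketch {
    // Assumption: token = |MD5(key)| interpreted as a signed big-endian integer.
    static BigInteger token(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a mandatory algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Returns a copy of the keys sorted by token, the order a token-range scan would follow.
    static List<String> inTokenOrder(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.comparing(Md5TokenOrderSketch::token));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("user1", "user2", "user3", "user4");
        // Print each key next to its token; the key order will generally look shuffled.
        for (String k : inTokenOrder(keys)) {
            System.out.println(token(k).toString(16) + "  " + k);
        }
    }
}
```

The point is only that sorting by token gives a stable order that has nothing to do with the lexical order of the keys.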

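When Astyanax reads a page from Cassandra, the page (block) boundary may fall in the middle of a group of rows that share the same token. When the request for the next page comes, Astyanax needs to know whether to start from the next token (possibly missing the remaining rows that share the last token but did not fit in the previous page) or to repeat the last token to make sure all rows under that token are read (at the cost of repeating one or possibly more rows from the previous page).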
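The trade-off can be reproduced with a small in-memory model. This is only a sketch of the paging idea, not Astyanax's implementation: the Row class, the page and scan helpers, and the sample tokens are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory model of token-ordered paging; not Astyanax code.
class TokenPagingSketch {
    // A row is a (token, key) pair; several keys may share one token.
    static final class Row {
        final long token;
        final String key;
        Row(long token, String key) { this.token = token; this.key = key; }
    }

    // Sample data sorted by token; keys "c" and "d" collide on token 30.
    static List<Row> sample() {
        return Arrays.asList(new Row(10, "a"), new Row(20, "b"), new Row(30, "c"),
                             new Row(30, "d"), new Row(40, "e"), new Row(50, "f"));
    }

    // Returns up to `limit` rows with token >= start (inclusive) or token > start (exclusive).
    static List<Row> page(List<Row> sorted, long start, boolean inclusive, int limit) {
        List<Row> out = new ArrayList<>();
        for (Row r : sorted) {
            boolean inRange = inclusive ? r.token >= start : r.token > start;
            if (inRange && out.size() < limit) out.add(r);
        }
        return out;
    }

    // Pages through all rows and returns every key the caller would see, in order.
    static List<String> scan(List<Row> sorted, int limit, boolean repeatLastToken) {
        List<String> seen = new ArrayList<>();
        long start = Long.MIN_VALUE;
        boolean inclusive = true; // the first page starts at the very beginning
        while (true) {
            List<Row> p = page(sorted, start, inclusive, limit);
            for (Row r : p) seen.add(r.key);
            if (p.size() < limit) return seen; // short page: end of data
            long last = p.get(p.size() - 1).token;
            if (repeatLastToken && inclusive && last == start) {
                // A single token filled the whole page; a real loop would never advance.
                throw new IllegalStateException("no progress: one token fills the whole page");
            }
            start = last;
            inclusive = repeatLastToken; // repeat the last token, or move past it
        }
    }

    public static void main(String[] args) {
        System.out.println(scan(sample(), 3, false)); // prints [a, b, c, e, f]
        System.out.println(scan(sample(), 3, true));  // prints [a, b, c, c, d, e, e, f]
    }
}
```

With the last token not repeated, key d is silently skipped because it shares token 30 with c, which ended the first page. With the last token repeated, every key is seen, but c and e come back twice, so the caller has to tolerate or filter duplicates.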

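The code comment also warns that if the page size is so small that only rows with the same token fit in it, the code may enter an infinite loop when setRepeatLastToken is set to true.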
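As a rough way to reason about a "large enough" block size, the sketch below compares the block size against the longest run of identical tokens. The helper names are hypothetical, and the rule it encodes (the block must hold a full run plus at least one more row) is a conservative reading under a simplified model, not something taken from the Astyanax code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for reasoning about block size; not part of Astyanax.
class BlockSizeCheckSketch {
    // Length of the longest run of equal tokens in a token-sorted list.
    static int longestTokenRun(List<Long> sortedTokens) {
        int best = 0;
        int run = 0;
        Long prev = null;
        for (Long t : sortedTokens) {
            run = t.equals(prev) ? run + 1 : 1;
            best = Math.max(best, run);
            prev = t;
        }
        return best;
    }

    // Conservative rule: when the last token is repeated, a page advances only if it
    // can hold one full run of equal tokens plus at least one more row.
    static boolean makesProgress(List<Long> sortedTokens, int blockSize) {
        return blockSize > longestTokenRun(sortedTokens);
    }

    public static void main(String[] args) {
        List<Long> tokens = Arrays.asList(10L, 20L, 30L, 30L, 40L, 50L);
        System.out.println(longestTokenRun(tokens));   // prints 2
        System.out.println(makesProgress(tokens, 3));  // prints true
        System.out.println(makesProgress(tokens, 2));  // prints false
    }
}
```

In practice you cannot know the longest run up front, which is why the source comment only speaks of making the likelihood very low by picking a generous block size.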

I hope this helps others who may be wondering about this feature.
