Question

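I currently have a graph stored in DSE Graph that contains 100K nodes. These nodes have the label "customer" and a property named "age" that accepts integer values. I indexed this property with the following command: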

schema.vertexLabel("customer").index("custByAge").secondary().by("age").add()

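I would like to use this index to answer queries that look for customers within a given age range (for example, an "age" between 10 and 20). However, when I query for customers by age interval, the index I created does not appear to be used.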

When I submit the following query, a list of vertices is returned in roughly 40 ms, which leads me to believe the index is being used:

g.V().has('customer','age',15)

But when I submit the following query, it times out after 30 seconds (as I specified in my configuration):

g.V().has('customer','age',inside(10,20))
Interruption of result iteration
Display stack trace? [yN]

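This leads me to believe the index is not being used for this query. Does that seem right? If the index is not being used, does anyone have suggestions for speeding up this query?

Edit: As suggested in an answer below, I ran .profile() on each of the queries above. The results are as follows (only the relevant information is shown):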

gremlin> g.V().has('customer','age',21).profile()
==>Traversal Metrics
...
  index-query                    14.333ms

gremlin> g.V().has('customer','age',inside(21,23)).profile()
==>Traversal Metrics
...
   index-query                    115.055ms
   index-query                    132.144ms
   index-query                    132.842ms
   >TOTAL                       53042.171ms

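This leaves me with a few questions:

Does the fact that .profile() reports index-query mean that my second query is using the index?
Why does the second query have 3 index queries while the first has only 1?
For the second query, all the index queries together total roughly 400 ms. Why does the whole query take ~50000 ms? The .profile() output shows nothing else that takes time besides these index queries, so where do the extra ~50000 ms come from?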

Answer 1

Are you using DataStax Studio? If so, you can use the .profile() feature to see how the index is involved.

Example .profile() usage: g.V().in().has('name', 'Julia Child').count().profile()

Answer 2

You want to use a search index in this case; it will be much faster.

For example, in KillrVideo:

schema.vertexLabel("movie").index("search").search().by("year").add()

g.V().hasLabel('movie').has('year', gt(2000)).has('year', lte(2017)).profile()

Then, in the Studio profile() output, we can see:

SELECT "community_id", "member_id" FROM "killrvideo"."movie_p" WHERE 
"solr_query" = '{"q":"*:*", "fq":["year:{2000 TO *}","year:{* TO 
2017]"]}' LIMIT ?; with params (java.lang.Integer) 50000
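
Applied to the graph in the question, the same approach might look like the sketch below. It assumes DSE Search is enabled on the cluster, reuses the index name "search" from the KillrVideo example, and writes the range with gt/lt in the style of the movie query rather than with inside(). It has not been run against the asker's data, so treat it as a starting point only.

```groovy
// Assumption: DSE Search is enabled on the nodes hosting this graph.
// Create a search index on the integer "age" property of "customer" vertices.
schema.vertexLabel("customer").index("search").search().by("age").add()

// Range query for customers older than 10 and younger than 20,
// written in the same style as the movie example above.
g.V().hasLabel('customer').has('age', gt(10)).has('age', lt(20)).profile()
```

If the search index is picked up, the profile() output would be expected to contain a single solr_query statement similar to the one shown for the movie example.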

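By default, the profiler does not show traces for all operations, so the list of index queries you see may be truncated. Adjust "max_profile_events" as described in this documentation: https://docs.datastax.com/en/dse/5.1/dse-dev/datastax_enterprise/graph/reference/schema/refSchemaConfig.html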

DSE Graph index on integer intervals
