Question

I want to filter rows of the following table in Cassandra.

CREATE TABLE mids_test_db.defect_data (
    wafer_id text,
    defect_id text,
    document_id text,
    fields list<double>,
    PRIMARY KEY (wafer_id, defect_id)
) 
...
CREATE INDEX defect_data_fields_idx ON mids_test_db.defect_data (values(fields));

I first tried something like fields[0] > 0.5, but it failed.

cqlsh:mids_test_db> select fields from  defect_data where  wafer_id =  'MIDS_1_20170101_023000_30000_1548100671' and fields[0] > 0.5;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Indexes on list entries (fields[index] = value) are not currently supported."

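After searching for a while, I have the impression that this kind of query cannot easily be done in Cassandra. The data model is essentially a collection of field values. I mostly want to query the fields data using the defect data above, which is very important for my business.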

What approach should I consider? Application-side filtering? Any hints or suggestions would be appreciated.

Answer 1

This is not possible directly in Cassandra, but there are several options:

If your Cassandra is DataStax Enterprise, you can use DSE Search;
You can add another table to perform the lookup:

CREATE TABLE mids_test_db.defect_data_lookup (
    wafer_id text,
    defect_id text,
    field double,
    PRIMARY KEY (wafer_id, field, defect_id)
);

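After this, you should be able to do a range scan inside the partition to fetch at least the defect_id values, and then fetch all the field values with a second query.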
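The two-step read described above could look roughly like the following Python sketch. It assumes `session` is a DataStax Python driver `Session` (or anything exposing `execute(query, params)` whose rows support attribute access); the function name `find_fields_above` is hypothetical and only for illustration.

```python
def find_fields_above(session, wafer_id, threshold):
    """Return {defect_id: fields} for defects with at least one value > threshold.

    `session` is assumed to behave like a cassandra-driver Session.
    """
    # Step 1: range scan on the clustering column `field` inside one partition.
    lookup_rows = session.execute(
        "SELECT defect_id FROM mids_test_db.defect_data_lookup "
        "WHERE wafer_id = %s AND field > %s",
        (wafer_id, threshold),
    )
    # A defect can match on several values, so de-duplicate the ids.
    defect_ids = sorted({row.defect_id for row in lookup_rows})

    # Step 2: fetch the full list for each matching defect from the base table.
    result = {}
    for defect_id in defect_ids:
        rows = session.execute(
            "SELECT fields FROM mids_test_db.defect_data "
            "WHERE wafer_id = %s AND defect_id = %s",
            (wafer_id, defect_id),
        )
        for row in rows:
            result[defect_id] = list(row.fields or [])
    return result
```

Note that the lookup table as defined does not record the position of a value in the list, so this matches defects where any element exceeds the threshold. A condition on fields[0] specifically would need either an extra position column in the lookup table (an assumption, not part of the schema above) or a final check on the returned list in the application.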

Depending on your Cassandra version, you may be able to use a materialized view to maintain that lookup table for you.
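If a materialized view is not an option, the application can write to both tables itself. The sketch below is one possible way to do that, not a definitive implementation: `save_defect` is a hypothetical helper, and `session` is again assumed to behave like a DataStax Python driver `Session`.

```python
def save_defect(session, wafer_id, defect_id, document_id, fields):
    """Write one defect to the base table and one lookup row per distinct value.

    `session` is assumed to expose execute(query, params) like cassandra-driver.
    """
    # Base table: keep the full list so it can be read back in one row.
    session.execute(
        "INSERT INTO mids_test_db.defect_data "
        "(wafer_id, defect_id, document_id, fields) VALUES (%s, %s, %s, %s)",
        (wafer_id, defect_id, document_id, list(fields)),
    )
    # Lookup table: (wafer_id, field, defect_id) is the primary key, so
    # repeated values would overwrite each other; write each value once.
    for value in sorted(set(fields)):
        session.execute(
            "INSERT INTO mids_test_db.defect_data_lookup "
            "(wafer_id, field, defect_id) VALUES (%s, %s, %s)",
            (wafer_id, value, defect_id),
        )
```

These writes are not atomic, and if the fields of an existing defect change, the lookup rows for the old values have to be deleted by the application as well.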
