I am trying to compare two Cassandra secondary index implementations: the Stratio Lucene Index and SASIIndex.
When I query data through a secondary index created with SASI, LIKE queries do not return all the matching rows.
Steps to reproduce:
CREATE TABLE movies_test (
  id UUID,
  title TEXT,
  year INT,
  runtime INT,
  PRIMARY KEY ((id))
);
INSERT INTO movies_test (id, title, year, runtime) values (uuid(), 'Life and Death in Vientiane', 2017, 16);
INSERT INTO movies_test (id, title, year, runtime) values (uuid(), 'Life and Death at Preah Vihear', 2014, 51);
INSERT INTO movies_test (id, title, year, runtime) values (uuid(), 'Life and Death of Manuel de Falla', 1991, 84);
INSERT INTO movies_test (id, title, year, runtime) values (uuid(), 'Life and Death at the Ambassador Hotel', 1994, 51);
INSERT INTO movies_test (id, title, year, runtime) values (uuid(), 'Life and Death in Flanders', 1963, 82);
INSERT INTO movies_test (id, title, year, runtime) values (uuid(), 'Life and Death of the Hollywood Kid', 1994, 114);
INSERT INTO movies_test (id, title, year, runtime) values (uuid(), 'Life and Death in Iraq', 2009, 58);
INSERT INTO movies_test (id, title, year, runtime) values (uuid(), 'Life and Debt', 2001, 80);
INSERT INTO movies_test (id, title, year, runtime) values (uuid(), 'Life and Death of the Army of Northern Virginia', 2008, 50);
INSERT INTO movies_test (id, title, year, runtime) values (uuid(), 'Life and Death on the A List', 1996, 50);
INSERT INTO movies_test (id, title, year, runtime) values (uuid(), 'Life and Death', 1980, 83);
INSERT INTO movies_test (id, title, year, runtime) values (uuid(), 'Life and Death', 1977, 91);
INSERT INTO movies_test (id, title, year, runtime) values (uuid(), 'Life and Death', 1936, 90);
CREATE CUSTOM INDEX title ON movies_test (title) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'PREFIX'};
SELECT title, year, runtime FROM movies_test WHERE title LIKE 'Life and De%';
 title                                           | year | runtime
-------------------------------------------------+------+---------
 Life and Death in Vientiane                     | 2017 |      16
 Life and Death at Preah Vihear                  | 2014 |      51
 Life and Death of Manuel de Falla               | 1991 |      84
 Life and Death at the Ambassador Hotel          | 1994 |      51
 Life and Death in Flanders                      | 1963 |      82
 Life and Death of the Hollywood Kid             | 1994 |     114
 Life and Death in Iraq                          | 2009 |      58
 Life and Debt                                   | 2001 |      80
 Life and Death of the Army of Northern Virginia | 2008 |      50
 Life and Death on the A List                    | 1996 |      50
The query above does not return the following three movies, even though their titles also start with 'Life and De':
 title          | year | runtime
----------------+------+---------
 Life and Death | 1980 |      83
 Life and Death | 1977 |      91
 Life and Death | 1936 |      90
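To make the discrepancy concrete, the expected result of a prefix match can be checked client-side. The Python sketch below is only an illustration and does not connect to Cassandra: it applies `str.startswith` to the 13 titles from the INSERT statements and diffs the result against the 10 titles SASI returned. It shows that all 13 titles start with 'Life and De' and that the only rows missing from the SASI result are the three that share the identical title value 'Life and Death'.

```python
from collections import Counter

# Titles from the INSERT statements (one entry per row, duplicates included).
inserted = [
    "Life and Death in Vientiane",
    "Life and Death at Preah Vihear",
    "Life and Death of Manuel de Falla",
    "Life and Death at the Ambassador Hotel",
    "Life and Death in Flanders",
    "Life and Death of the Hollywood Kid",
    "Life and Death in Iraq",
    "Life and Debt",
    "Life and Death of the Army of Northern Virginia",
    "Life and Death on the A List",
    "Life and Death",
    "Life and Death",
    "Life and Death",
]

# Titles returned by the SASI query WHERE title LIKE 'Life and De%'.
sasi_returned = [
    "Life and Death in Vientiane",
    "Life and Death at Preah Vihear",
    "Life and Death of Manuel de Falla",
    "Life and Death at the Ambassador Hotel",
    "Life and Death in Flanders",
    "Life and Death of the Hollywood Kid",
    "Life and Death in Iraq",
    "Life and Debt",
    "Life and Death of the Army of Northern Virginia",
    "Life and Death on the A List",
]

PREFIX = "Life and De"

# Rows a correct prefix (LIKE 'Life and De%') query should return.
expected = [t for t in inserted if t.startswith(PREFIX)]

# Multiset difference: rows that should match but were not returned.
missing = Counter(expected) - Counter(sasi_returned)

print(f"expected {len(expected)} rows, SASI returned {len(sasi_returned)}")
print("missing:", dict(missing))
# prints:
# expected 13 rows, SASI returned 10
# missing: {'Life and Death': 3}
```

A `Counter` is used instead of a set so that the three duplicate 'Life and Death' rows are counted individually rather than collapsed into one.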
If I run the equivalent prefix query using the Stratio Lucene Index, all 13 rows are returned:
CREATE CUSTOM INDEX movies_test_index ON movies_test ()
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
  'refresh_seconds': '1',
  'schema': '{
    fields: {
      title: {type: "string"}
    }
  }'
};
SELECT title, year, runtime FROM movies_test WHERE expr(movies_test_index, '{
  query: {type: "prefix", field: "title", value: "Life and De"}
}');
 title                                           | year | runtime
-------------------------------------------------+------+---------
 Life and Death at the Ambassador Hotel          | 1994 |      51
 Life and Death at Preah Vihear                  | 2014 |      51
 Life and Death in Vientiane                     | 2017 |      16
 Life and Death                                  | 1980 |      83
 Life and Death on the A List                    | 1996 |      50
 Life and Death                                  | 1936 |      90
 Life and Death in Flanders                      | 1963 |      82
 Life and Death of Manuel de Falla               | 1991 |      84
 Life and Death of the Hollywood Kid             | 1994 |     114
 Life and Death of the Army of Northern Virginia | 2008 |      50
 Life and Death                                  | 1977 |      91
 Life and Death in Iraq                          | 2009 |      58
 Life and Debt                                   | 2001 |      80
I am not sure whether I am making a mistake when creating the SASI index or when querying it.
Please help.
It would also be great if anyone could suggest other secondary index options for Apache Cassandra (not the DataStax distribution).
I am currently testing with apache-cassandra-3.11.2.
Thanks