Question

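I am writing a Java client that tests reading data from a table of one million rows. I am filtering the data on a key in a map column. The code creates the schema and inserts the data correctly, but it fails to read the data. My code is: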

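import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

import java.util.HashMap;
import java.util.Map;

public class MillionMapTest {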
    private Cluster cluster;
    private Session session;

    public void connect(String node) {
        cluster = Cluster.builder().addContactPoint(node).build();
        session = cluster.connect();
    }

    public void createSchema() {
        session.execute("CREATE KEYSPACE xx WITH replication " +
                "= {'class':'SimpleStrategy', 'replication_factor':3};");
        session.execute(
                "CREATE TABLE xx.events (" +
                        "log_time_local timeuuid," +
                        "username text," +
                        "log_type text," +
                        "log_time timestamp," +
                        "device_category text," +
                        "log text," +
                        "priority INT," +
                        "client_ip text," +
                        "backend_app text," +
                        "location_details map<text, text>," +
                        "device_details map<text, text>," +
                        "extra_info Blob," +
                        "PRIMARY KEY (log_time_local, username, log_type)" +
                ");");
        session.execute("CREATE INDEX devicekeys ON xx.events(KEYS(device_details));");
    }

    public void loadData() {
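        // CQL requires an explicit column list in INSERT statements.
        PreparedStatement statement = session.prepare(
                "INSERT INTO xx.events (log_time_local, username, log_type, log_time, " +
                        "device_category, log, priority, client_ip, backend_app, " +
                        "location_details, device_details, extra_info) " +
                        "VALUES (now(), ?, ?, toTimestamp(now()), ?, ?, ?, ?, ?, ?, ?, ?);");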
        BoundStatement boundStatement = new BoundStatement(statement);
        for (int i=0; i<1000000; i++) {
            Map<String, String> tags = new HashMap<>();
            tags.put("os", "ios");
            tags.put("category", "tab");
            tags.put("dev_num", "12ABF847CA");
            if (i % 100 == 0) tags.put("category", "mobile");
            session.execute(boundStatement.bind("name_"+i,"type_"+i, "cat_"+i, "log_"+i, i, "ip_"+i, "app_"+i, null, tags, null));
        }
    }

    public void querySchema() {
        ResultSet results = session.execute("SELECT * FROM xx.events WHERE device_details['category'] = 'mobile' ALLOW FILTERING;");    
    }

    public static void main(String[] args) {
        MillionMapTest client = new MillionMapTest();
        client.connect("localhost");
        client.createSchema();
        client.loadData();
        client.querySchema();
        // main is static, so the instance fields must be reached through client.
        client.session.close();
        client.cluster.close();
    }
}

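The error is com.datastax.driver.core.exceptions.ReadFailureException: Cassandra failure during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded, 1 failed).

The query runs fine when I run it in cqlsh, and this code works with a small amount of data. But it does not work with one million rows. What causes this error, and how can I fix it?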

Answer 1

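Your problem appears to be the secondary index. Secondary indexes are not the most performant feature in C*, and they come with their own caveats. There is some good documentation on the problems with secondary indexes in C*, such as this link. You also have the secondary index on a map data type, which is going to be slow. The fact that you get a ReadFailureException rather than a ReadTimeout may be related to the index not being up to date at the time you query it (I am not very sure about this, but refer to this issue, which describes one case in which a ReadFailureException can be thrown).

I think you should consider restructuring your schema or denormalizing the table, so that you can do a key lookup instead of relying on the secondary index.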
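
As a rough illustration of the denormalization suggested above, here is a minimal sketch that only builds the CQL strings for a separate lookup table keyed by the map value being filtered on. The table name `xx.events_by_details_category`, the `details_category` and `day` columns, and the `EventsByCategoryCql` class are all hypothetical and not part of the original schema. The `day` bucket is an assumption added to keep partitions from growing without bound, and the application would have to write each event to both tables.

```java
// Sketch only: builds CQL text for a hypothetical denormalized lookup table.
// Nothing here talks to a cluster; the strings would be passed to session.execute/prepare.
final class EventsByCategoryCql {
    // Hypothetical table name, not part of the original schema.
    static final String TABLE = "xx.events_by_details_category";

    // The value of device_details['category'] becomes part of the partition key,
    // so reads by category are key lookups instead of secondary-index scans.
    static String createTable() {
        return "CREATE TABLE IF NOT EXISTS " + TABLE + " ("
                + "details_category text, "
                + "day date, "              // assumed time bucket to bound partition size
                + "log_time_local timeuuid, "
                + "username text, "
                + "log_type text, "
                + "log text, "
                + "PRIMARY KEY ((details_category, day), log_time_local, username, log_type)"
                + ");";
    }

    // Insert with an explicit column list and one bind marker per column.
    static String insert() {
        return "INSERT INTO " + TABLE
                + " (details_category, day, log_time_local, username, log_type, log)"
                + " VALUES (?, ?, ?, ?, ?, ?);";
    }

    // The full partition key is restricted, so no index and no ALLOW FILTERING is needed.
    static String selectByCategory() {
        return "SELECT * FROM " + TABLE + " WHERE details_category = ? AND day = ?;";
    }

    public static void main(String[] args) {
        System.out.println(createTable());
        System.out.println(insert());
        System.out.println(selectByCategory());
    }
}
```

With a table like this, the read for 'mobile' events on a given day touches a single partition. Whether this layout suits the real workload depends on which queries are needed, so treat it as a starting point rather than a recommended schema.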

Apache Cassandra: reading data produces a ReadFailureException