My dataset has more than one million rows. I integrated Elasticsearch with MySQL using Logstash. When I issue the following GET request in Postman,
http://localhost:9200/persondetails/Document/_search?q=*
I get the following:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "persondetails",
        "_type": "Document",
        "_id": "%{idDocument}",
        "_score": 1,
        "_source": {
          "iddocument": 514697,
          "@timestamp": "2017-08-31T05:18:46.916Z",
          "author": "vaibhav",
          "expiry_date": null,
          "@version": "1",
          "description": "ly that",
          "creation_date": null,
          "type": 1
        }
      },
      {
        "_index": "persondetails",
        "_type": "Document_count",
        "_id": "AV4o0J3OJ5ftvuhV7i0H",
        "_score": 1,
        "_source": {
          "query": {
            "term": {
              "author": "rishav"
            }
          }
        }
      }
    ]
  }
}
This is wrong, because my table has more than 1 million rows, yet it reports a total of only 2. I can't find the mistake here.
When I issue http://localhost:9200/_cat/indices?v, it shows this:
health status index         uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   persondetails 4FiGngZcQfS0Xvu6IeHIfg 5   1   2          1054         125.4kb    125.4kb
This is my logstash.conf file:
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://127.0.0.1:3306/persondetails"
    jdbc_user => "root"
    jdbc_password => ""
    schedule => "* * * * *"
    jdbc_validate_connection => true
    jdbc_driver_library => "/usr/local/Cellar/logstash/5.5.2/mysql-connector-java-3.1.14/mysql-connector-java-3.1.14-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    statement => "SELECT * FROM Document"
    type => "persondetails"
  }
}
output {
  elasticsearch {
    #protocol=>http
    index => "persondetails"
    document_type => "Document"
    document_id => "%{idDocument}"
    hosts => ["http://localhost:9200"]
  }
  stdout { codec => rubydebug }
}
Answer 0 (score: 1)
Judging from your results, there is a problem with your Logstash configuration that causes your documents to overwrite each other: no per-row document_id is being generated, so the index ends up holding a single document whose ID is the literal string "%{idDocument}".
Look at the following _source snippet from the results of the search query you provided:
"_source": {
"iddocument": 514697,
"@timestamp": "2017-08-31T05:18:46.916Z",
"author": "vaibhav",
"expiry_date": null,
"@version": "1",
"description": "ly that",
"creation_date": null,
"type": 1
}
Judging also by the small size of the index, it doesn't look like there are any more documents there. You should check whether your jdbc input actually supplies an "idDocument" field: note that the indexed document contains "iddocument" in all lowercase instead, so the "%{idDocument}" reference in your output never resolves.
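As a minimal sketch of a fix, assuming the jdbc input's default lowercase_column_names => true is what turned idDocument into iddocument, referencing the lowercased field name in the output should give each row its own ID:

output {
  elasticsearch {
    index => "persondetails"
    document_type => "Document"
    # reference the lowercased column name that actually exists on the event
    document_id => "%{iddocument}"
    hosts => ["http://localhost:9200"]
  }
  stdout { codec => rubydebug }
}

Alternatively, set lowercase_column_names => false in the jdbc input to keep the original idDocument casing. Either way, delete the index and re-run the import; docs.count should then climb toward your row count instead of every row replacing the single "%{idDocument}" document, which is also why _cat/indices shows 1054 deleted docs.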