My dataset has more than one million rows. I integrated Elasticsearch with MySQL using Logstash. When I issue the following GET request in Postman,
http://localhost:9200/persondetails/Document/_search?q=*
I get the following:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "persondetails",
        "_type": "Document",
        "_id": "%{idDocument}",
        "_score": 1,
        "_source": {
          "iddocument": 514697,
          "@timestamp": "2017-08-31T05:18:46.916Z",
          "author": "vaibhav",
          "expiry_date": null,
          "@version": "1",
          "description": "ly that",
          "creation_date": null,
          "type": 1
        }
      },
      {
        "_index": "persondetails",
        "_type": "Document_count",
        "_id": "AV4o0J3OJ5ftvuhV7i0H",
        "_score": 1,
        "_source": {
          "query": {
            "term": {
              "author": "rishav"
            }
          }
        }
      }
    ]
  }
}
This is wrong, because my table has more than 1 million rows, yet it reports a total of only 2. I can't find the mistake here.
When I issue http://localhost:9200/_cat/indices?v, it shows this:
health status index         uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   persondetails 4FiGngZcQfS0Xvu6IeHIfg 5   1   2          1054         125.4kb    125.4kb
This is my logstash.conf file:
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://127.0.0.1:3306/persondetails"
    jdbc_user => "root"
    jdbc_password => ""
    schedule => "* * * * *"
    jdbc_validate_connection => true
    jdbc_driver_library => "/usr/local/Cellar/logstash/5.5.2/mysql-connector-java-3.1.14/mysql-connector-java-3.1.14-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    statement => "SELECT * FROM Document"
    type => "persondetails"
  }
}
output {
  elasticsearch {
    #protocol=>http
    index => "persondetails"
    document_type => "Document"
    document_id => "%{idDocument}"
    hosts => ["http://localhost:9200"]
  }
  stdout { codec => rubydebug }
}
Answer 0 (score: 1)
Judging from your results, there is a problem with your Logstash configuration that causes your documents to overwrite each other: no per-row document_id is being generated, so the index ends up holding a single document whose ID is the literal string "%{idDocument}".
Look at the following _source snippet from the results of the search query you provided:
"_source": {
"iddocument": 514697,
"@timestamp": "2017-08-31T05:18:46.916Z",
"author": "vaibhav",
"expiry_date": null,
"@version": "1",
"description": "ly that",
"creation_date": null,
"type": 1
}
Judging also by the small size of the index, it doesn't look like there are any more documents there. You should check whether your jdbc input actually supplies an "idDocument" field: note that the indexed document contains "iddocument" in all lowercase instead, so the "%{idDocument}" reference in your output never resolves.
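As a minimal sketch of a fix, assuming the jdbc input's default lowercase_column_names => true is what turned idDocument into iddocument, referencing the lowercased field name in the output should give each row its own ID:

output {
  elasticsearch {
    index => "persondetails"
    document_type => "Document"
    # reference the lowercased column name that actually exists on the event
    document_id => "%{iddocument}"
    hosts => ["http://localhost:9200"]
  }
  stdout { codec => rubydebug }
}

Alternatively, set lowercase_column_names => false in the jdbc input to keep the original idDocument casing. Either way, delete the index and re-run the import; docs.count should then climb toward your row count instead of every row replacing the single "%{idDocument}" document, which is also why _cat/indices shows 1054 deleted docs.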