Elasticsearch: removing duplicate records from indexed documents

Time: 2015-03-05 12:24:58

Tags: mysql elasticsearch elasticsearch-jdbc-river

This is my JDBC river command, which fetches all records from the database.

localhost:9200/_river/my_update_river/_meta
{
  "type" : "jdbc",
  "jdbc" : {
    "url" : "jdbc:mysql://localhost:3306/admin",
    "user" : "root",
    "password" : "",
    "poll" : "6s",
    "index" : "updateauto",
    "type" : "users",
    "schedule" : "0/10 * * ? * *",
    "strategy" : "simple",
    "sql" : "select * from users"
  }
}

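When I run this command, I have two problems:

  1. Duplicate records.
  2. When I add a new record to the database, the indexed documents are not updated.

I search with this query:

    { "query": { "filtered": { "filter": { "term": { "Name": "testing" } } } } }

This is my result: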

    {
      "took" : 4,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 37551,
        "max_score" : 1.0,
        "hits" : [ {
          "_index" : "updateauto",
          "_type" : "users",
          "_id" : "AUvjnNHmMKBTPrby96Jg",
          "_score" : 1.0,
          "_source":{"ID":23,"Name":"Abudul  Rafay","Email":"a","Password":"afasd"}
        }, {
          "_index" : "updateauto",
          "_type" : "users",
          "_id" : "AUvjnNHnMKBTPrby96Jk",
          "_score" : 1.0,
          "_source":{"ID":25,"Name":"r rafay ","Email":"r rafay","Password":"r rafay"}
        }, {
          "_index" : "updateauto",
          "_type" : "users",
          "_id" : "AUvjngk0MKBTPrby96Ka",
          "_score" : 1.0,
          "_source":{"ID":23,"Name":"Abudul Rafay","Email":"a","Password":"afasd"}
        }, {
          "_index" : "updateauto",
          "_type" : "users",
          "_id" : "AUvjngk0MKBTPrby96Kf",
          "_score" : 1.0,
          "_source":{"ID":24,"Name":"rafay","Email":"hello","Password":"fasfas"}
        }, {
          "_index" : "updateauto",
          "_type" : "users",
          "_id" : "AUvjnjA0MKBTPrby96Kh",
          "_score" : 1.0,
          "_source":{"ID":23,"Name":"Abudul Rafay","Email":"a","Password":"afasd"}
        }, {
          "_index" : "updateauto",
          "_type" : "users",
          "_id" : "AUvjnjA0MKBTPrby96Km",
          "_score" : 1.0,
          "_source":{"ID":24,"Name":"rafay","Email":"hello","Password":"fasfas"}
        }, {
          "_index" : "updateauto",
          "_type" : "users",
          "_id" : "AUvjnZP0MKBTPrby96KD",
          "_score" : 1.0,
          "_source":{"ID":24,"Name":"rafay","Email":"hello","Password":"fasfas"}
        }, {
          "_index" : "updateauto",
          "_type" : "users",
          "_id" : "AUvjnPe-MKBTPrby96Jq",
          "_score" : 1.0,
          "_source":{"ID":25,"Name":"r rafay ","Email":"r rafay","Password":"r rafay"}
        }, {
          "_index" : "updateauto",
          "_type" : "users",
          "_id" : "AUvjnR7NMKBTPrby96Ju",
          "_score" : 1.0,
          "_source":{"ID":26,"Name":"New User","Email":"New","Password":"new"}
        }, {
          "_index" : "updateauto",
          "_type" : "users",
          "_id" : "AUvjnbuLMKBTPrby96KO",
          "_score" : 1.0,
          "_source":{"ID":26,"Name":"New User","Email":"New","Password":"new"}
        } ]
      }
    }
    

I want results without duplicate records, and I want the index to update automatically.

1 answer:

Answer 0 (score: 1)

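I did not fully understand your second problem, but for the duplicates, here is what you need to do:

You need to specify the document ID in the river definition, like this: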

"sql" : "select *, ID as _id from users"

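This way, the river overwrites the existing document for each user, keyed by its ID, instead of creating a new document on every run.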
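
To make the effect concrete, here is a toy Python model of the behaviour described above. It is only a sketch, not the river's actual code: `ToyIndex`, `poll`, and the `auto-N` ID format are hypothetical names invented for illustration, and the index is just an in-memory dictionary. It assumes that a document indexed without an explicit ID gets a freshly generated one, while a document indexed with an explicit ID replaces whatever was stored under that ID.

```python
import itertools


class ToyIndex:
    """In-memory stand-in for an index: maps document _id -> _source."""

    def __init__(self):
        self.docs = {}
        self._auto = itertools.count(1)

    def index(self, source, doc_id=None):
        # No explicit ID: generate a fresh one, so a new document is always created.
        if doc_id is None:
            doc_id = f"auto-{next(self._auto)}"
        # Explicit ID: the same key is reused, so the previous version is replaced.
        self.docs[str(doc_id)] = source


def poll(index, rows, id_column=None):
    """Simulate one scheduled run that indexes every row of the table."""
    for row in rows:
        index.index(row, doc_id=row[id_column] if id_column else None)


rows = [
    {"ID": 23, "Name": "Abudul Rafay"},
    {"ID": 24, "Name": "rafay"},
]

without_id = ToyIndex()  # like "select * from users"
with_id = ToyIndex()     # like "select *, ID as _id from users"
for _ in range(3):       # three scheduled runs
    poll(without_id, rows)
    poll(with_id, rows, id_column="ID")

print(len(without_id.docs))  # 6: every run added both rows again
print(len(with_id.docs))     # 2: one document per ID

# A row added to the table later shows up as exactly one new document.
rows.append({"ID": 26, "Name": "New User"})
poll(with_id, rows, id_column="ID")
print(len(with_id.docs))     # 3
```

Note that documents already indexed under auto-generated IDs are not affected by this change, so the existing duplicates would remain until they are deleted or the index is rebuilt.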