Elasticsearch returns old SQL results

Date: 2015-06-07 15:30:33

Tags: sql-server elasticsearch

I have built an index that pulls data from a table in SQL Server:

{
    "type":"jdbc",
    "jdbc": 
    {
        "driver":"com.microsoft.sqlserver.jdbc.SQLServerDriver",
        "url":"jdbc:sqlserver://[my_ip];databaseName=mega",
        "user":"sa","password":"******",
        "sql":"SELECT [OrderID],[CustomerName],[UserFullName],[Status]  FROM [Orders_Table]",
        "poll":"5s",
        "index": "mega",
        "type": "orders_search",
        "schedule" : "0 0-59 0-23 ? * *"
    }
}

The problem is that my queries return irrelevant results.

For example, 5220668 is the key of a row that exists only once in the database:

{

    "from" : 0, "size" : 5,
    "query": { 
        "multi_match": {
           "query": "5220668", 
           "fields": [ "_all" ]
        }
    } 
}

Result: the results are wrong. I expected a single hit, just as in the database. Instead, the query returns the row's entire status history:

{
   "took": 12,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 4,
      "max_score": null,
      "hits": [
         {
            "_index": "mega",
            "_type": "handledorders_search",
            "_id": "AU3OlBkh6JN7xIrOkzjm",
            "_score": null,
            "_source": {
               "Status": "NEW",
               "Date": "2015-06-07T03:00:12.110Z",
               "UserFullName": "my name",
               "CustomerName": "cust name",
               "OrderID": 5220668
            },
            "sort": [
               1433646012110
            ]
         },
         {
            "_index": "mega",
            "_type": "handledorders_search",
            "_id": "AU3Ok0E-6JN7xIrOkvpF",
            "_score": null,
            "_source": {
               "Status": "NEW",
               "Date": "2015-06-07T03:00:12.110Z",
               "UserFullName": "my name",
               "CustomerName": "cust name",
               "OrderID": 5220668
            },
            "sort": [
               1433646012110
            ]
         },
         {
            "_index": "mega",
            "_type": "handledorders_search",
            "_id": "AU3Ole0-6JN7xIrOk7Yo",
            "_score": null,
            "_source": {
               "Status": "FIX",
               "Date": "2015-06-07T03:00:12.110Z",
               "UserFullName": "my name",
               "CustomerName": "cust name",
               "OrderID": 5220668
            },
            "sort": [
               1433646012110
            ]
         },
         {
            "_index": "mega",
            "_type": "handledorders_search",
            "_id": "AU3OlQL86JN7xIrOk3eH",
            "_score": null,
            "_source": {
               "Status": "CLOSE",
               "Date": "2015-06-07T03:00:12.110Z",
               "UserFullName": "my name",
               "CustomerName": "cust name",
               "ExternalOrderID": 5220668
            },
            "sort": [
               1433646012110
            ]
         }
      ]
   }
}

1 answer:

Answer (score: 1)

I take it you are using the _river plugin or something similar, and relying on Elasticsearch to poll the MSSQL data.

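The tricky part is that when a row changes, Elasticsearch doesn't know whether it should update an existing document or create a new one. You know it is the same document, but ES doesn't. Because your SQL does not supply an ID, ES generates a new one (such as "AU3OlBkh6JN7xIrOkzjm" above) every time the row is indexed, so each poll adds another copy. You need to tell ES which documents are the same.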
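The mechanism can be illustrated with a small in-memory model. This is a toy sketch, not Elasticsearch: the class ToyIndex, its index method and the poll helper are hypothetical names, and random UUIDs stand in for the IDs ES generates on its own. It shows that re-indexing the same order on every poll yields one document per poll when no ID is given, but a single, updated document when OrderID is used as the ID.

```python
import uuid


class ToyIndex:
    """Hypothetical stand-in for an index: maps document IDs to sources."""

    def __init__(self):
        self.docs = {}

    def index(self, source, doc_id=None):
        # No explicit ID: generate a fresh random one, so the
        # document is always stored as a new entry.
        if doc_id is None:
            doc_id = uuid.uuid4().hex
        # Explicit ID: an existing entry with that ID is overwritten.
        self.docs[doc_id] = dict(source)
        return doc_id


def poll(index, rows, use_order_id):
    """Simulate one poll: index every row returned by the SQL query."""
    for row in rows:
        doc_id = str(row["OrderID"]) if use_order_id else None
        index.index(row, doc_id)


# The same order as seen by four successive polls while its status changes.
snapshots = [
    [{"OrderID": 5220668, "Status": "NEW"}],
    [{"OrderID": 5220668, "Status": "NEW"}],
    [{"OrderID": 5220668, "Status": "FIX"}],
    [{"OrderID": 5220668, "Status": "CLOSE"}],
]

auto_ids = ToyIndex()
explicit_ids = ToyIndex()
for rows in snapshots:
    poll(auto_ids, rows, use_order_id=False)
    poll(explicit_ids, rows, use_order_id=True)

print(len(auto_ids.docs))      # 4: one copy per poll
print(len(explicit_ids.docs))  # 1: only the latest version
print(explicit_ids.docs["5220668"]["Status"])  # CLOSE
```

The four copies in the toy model correspond to the four hits in the question; giving every row a stable ID collapses them into one.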

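There are two ways to do this. The first is to tell ES that a specific field is the unique identifier. You need to create a mapping with something like: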
{
    "orders_search" : {
        "_id" : {
            "path" : "OrderID"
        }
    }
}

This approach has been deprecated since 1.5.0:

https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-id-field.html

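The other option, which is also the simplest, is to map OrderID to _id in the SQL statement of the river definition, for example "sql":"SELECT [OrderID] AS [_id], [OrderID], [CustomerName], [UserFullName], [Status] FROM [Orders_Table]". OrderID is selected a second time so that it also remains a regular, searchable field.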

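More information: http://blog.pluralsight.com/elasticsearch-and-sql-server

"The select statement with an alias is the way of telling SQL Server to return the primary key field 'ID' as '_id'. This is the default key convention Elasticsearch uses for all documents. It is important to keep this naming when selecting data so that Elasticsearch knows to update documents instead of creating new ones on every poll."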
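The same rule applies if you load the rows yourself instead of through the river: send OrderID as the document ID. The Python sketch below builds a bulk API request body (newline-delimited JSON) from SQL rows. The function name build_bulk_body and the sample row are hypothetical; the index and type names are taken from the question's config, and the action line includes _type as the ES 1.x bulk format does. Actually sending the body (a POST to /_bulk) is left out.

```python
import json


def build_bulk_body(rows, index="mega", doc_type="orders_search", id_field="OrderID"):
    """Build a bulk request body (NDJSON) from SQL rows.

    Each row becomes an "index" action whose _id comes from id_field,
    so re-sending the same row overwrites the existing document
    instead of creating a new one.
    """
    lines = []
    for row in rows:
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(row[id_field])}}
        lines.append(json.dumps(action))  # action/metadata line
        lines.append(json.dumps(row))     # document source line
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical row as it might come back from the SELECT in the question.
rows = [
    {"OrderID": 5220668, "CustomerName": "cust name",
     "UserFullName": "my name", "Status": "CLOSE"},
]
print(build_bulk_body(rows), end="")
```

Because the _id is derived from the row itself, sending the same order on every poll keeps a single document up to date.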