Question

I have a timestamp column in DocDb that I want to query in an Azure Data Factory copy pipeline, which copies DocDb to Azure Data Lake.

I want to run

select * from c
where c._ts > '@{pipeline().parameters.windowStart}'

but I get

Errors":["An invalid query has been specified with filters against path(s) that are not range-indexed.

In the DocDb indexing policy, I have

"includedPaths": [
    {
        "path": "/*",
        "indexes": [
            {
                "kind": "Range",
                "dataType": "Number",
                "precision": -1
            },
            {
                "kind": "Hash",
                "dataType": "String",
                "precision": 3
            }
        ]
    }
  ]

I thought this would allow range queries on _ts, which is an int64.

Where am I going wrong?

Thanks.

Answer 1

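I reproduced your issue using your SQL and your indexing policy.

From what I observed, the filter value is being treated as a String rather than an Int. Your policy only defines a Hash index for strings, and a Hash index does not support range comparisons, hence the error. Remove the single quotes (') from the SQL and try again; that worked for me.

SQL: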

select * from c
where c._ts > @{pipeline().parameters.windowStart}
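
To illustrate why the quotes matter, here is a minimal sketch that mimics the string interpolation Data Factory applies to `@{...}` before the query is sent. The `renderQuery` helper and the sample value `1514764800` are illustrative assumptions, not part of Data Factory, and the sketch assumes windowStart already holds epoch seconds.

```javascript
// Hypothetical stand-in for Data Factory's @{...} string interpolation.
// It substitutes the parameter's text into the query and nothing more.
function renderQuery(template, windowStart) {
    return template.replace("@{pipeline().parameters.windowStart}", String(windowStart));
}

const quoted = renderQuery(
    "select * from c where c._ts > '@{pipeline().parameters.windowStart}'", 1514764800);
const unquoted = renderQuery(
    "select * from c where c._ts > @{pipeline().parameters.windowStart}", 1514764800);

console.log(quoted);   // the value arrives as a string literal
console.log(unquoted); // the value arrives as a numeric literal
```

With the quotes, Cosmos DB sees a string literal and needs a range index on strings; without them it sees a number, which the Range/Number index in the policy covers.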


Answer 2

Thanks, @Jay.

I ended up using a UDF in Cosmos DB:

// Converts a date-time string to Unix epoch seconds, the unit used by _ts.
function dateTime2Epoch(dateTimeString){
    return Math.trunc(new Date(dateTimeString).getTime()/1000);
}

Then, in Azure Data Factory, I query with:

select * from c 
where c._ts >= udf.dateTime2Epoch('@{pipeline().parameters.windowStart}')
  and c._ts < udf.dateTime2Epoch('@{pipeline().parameters.windowEnd}')
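
The UDF is plain JavaScript, so it can be sanity-checked outside Cosmos DB, for example in Node.js. The sample window values below are assumptions chosen for illustration. They carry an explicit `Z` because `Date` may interpret a date-time string without a time zone as local time.

```javascript
// Same body as the UDF above, runnable in Node.js.
function dateTime2Epoch(dateTimeString) {
    return Math.trunc(new Date(dateTimeString).getTime() / 1000);
}

// Illustrative window bounds in ISO 8601 with an explicit UTC marker.
const windowStart = "2018-01-01T00:00:00Z";
const windowEnd = "2018-01-01T01:00:00Z";

console.log(dateTime2Epoch(windowStart)); // 1514764800
console.log(dateTime2Epoch(windowEnd));   // 1514768400
```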

However, the query seems to be very slow. I will update this once I find out more.
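
One possible cause of the slowness, offered as an assumption rather than a confirmed diagnosis, is that the UDF call in the filter has to be evaluated by the query engine instead of being resolved to a constant up front. A way to sidestep that would be to convert the window bounds to epoch seconds before the query text is built, so the filter compares `c._ts` against plain numeric literals. The sketch below shows the idea; `toEpochSeconds` and `buildWindowQuery` are hypothetical helpers, and where this conversion would run in a real pipeline is left open.

```javascript
// Convert an ISO 8601 timestamp to whole epoch seconds.
function toEpochSeconds(dateTimeString) {
    const ms = Date.parse(dateTimeString);
    if (Number.isNaN(ms)) {
        throw new Error("Unparseable timestamp: " + dateTimeString);
    }
    return Math.trunc(ms / 1000);
}

// Hypothetical helper: builds the filter with numeric literals, no UDF call.
function buildWindowQuery(windowStart, windowEnd) {
    const start = toEpochSeconds(windowStart);
    const end = toEpochSeconds(windowEnd);
    return `select * from c where c._ts >= ${start} and c._ts < ${end}`;
}

console.log(buildWindowQuery("2018-01-01T00:00:00Z", "2018-01-01T01:00:00Z"));
```

Data Factory's expression language also documents `ticks`, `sub`, and `div` functions, which might allow the same arithmetic inside the pipeline expression itself; that variant is not tested here.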

Update: in the end I just copied everything.

How to convert pipeline().parameters.windowStart to epoch in an Azure Data Factory copy pipeline query
