BigQuery / Shopify order data query

Date: 2018-11-15 22:12:59

Tags: google-bigquery shopify standard-sql

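My Shopify order import creates a new entry in BigQuery for an order whenever that order has changed since the last import, so you can see how an order's attributes change over time rather than only its state at the last import. As a result, the table contains multiple entries for the same order, where the only unique parts are the _sdc_batched_at and _sdc_sequence values. I sometimes see as many as 30 entries for the same order.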

Table schema...

order:
  order_number: Int
  fulfillments: Array
  _sdc_batched_at: DateTime
  _sdc_sequence: Int

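What I've done...

I've created a partitioned table which essentially boils down to the subset of entries for a given date range that have fulfillments > 0.

Initial query to reduce the dataset...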

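-- CTE: restrict to a date range and to orders with at least one fulfillment
with orders as (
    select order_number, fulfillments, _sdc_batched_at, _sdc_sequence
    from `project.shopify.orders`
    where created_at between '2018-11-08' and '2018-11-15'
    and ARRAY_LENGTH(fulfillments) > 0
)
-- a final select is needed for the statement to run on its own
select * from orders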

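The problem... Because fulfillments is an array, I run into errors when I try to use DISTINCT or GROUP BY. How can I write a query that returns only the latest entry for each order, based on the _sdc_batched_at value?

Sample data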

    [
    {
        "order_number": "5545",
        "fulfillments": [
        {
            "tracking_url": null,
            "id": "617029074993",
            "tracking_company": "ups",
            "tracking_number": "Z1234567890"
        }
        ],
        "_sdc_batched_at": "2018-11-10 02:46:21.270 UTC",
        "_sdc_sequence": "1541817507934"
    },
    {
        "order_number": "5545",
        "fulfillments": [
        {
            "tracking_url": null,
            "id": "617029074993",
            "tracking_company": "ups",
            "tracking_number": "Z1234567890"
        }
        ],
        "_sdc_batched_at": "2018-11-10 03:16:16.606 UTC",
        "_sdc_sequence": "1541819139795"
    },
    {
        "order_number": "5545",
        "fulfillments": [
        {
            "tracking_url": null,
            "id": "617029074993",
            "tracking_company": "ups",
            "tracking_number": "Z1234567890"
        }
        ],
        "_sdc_batched_at": "2018-11-10 03:46:12.704 UTC",
        "_sdc_sequence": "1541821046476"
    },
    {
        "order_number": "5545",
        "fulfillments": [
        {
            "tracking_url": null,
            "id": "617029074993",
            "tracking_company": "ups",
            "tracking_number": "Z1234567890"
        }
        ],
        "_sdc_batched_at": "2018-11-10 04:16:07.952 UTC",
        "_sdc_sequence": "1541822755508"
    },
    {
        "order_number": "2212",
        "fulfillments": [
            {
                "tracking_url": null,
                "id": "617029074993",
                "tracking_company": "ups",
                "tracking_number": "Z1234567890"
            }
        ],
        "_sdc_batched_at": "2018-11-10 03:46:12.704 UTC",
        "_sdc_sequence": "1541821046476"
    },
    {
        "order_number": "2212",
        "fulfillments": [
            {
                "tracking_url": null,
                "id": "617029074993",
                "tracking_company": "ups",
                "tracking_number": "Z1234567890"
            }
        ],
        "_sdc_batched_at": "2018-11-10 04:16:07.952 UTC",
        "_sdc_sequence": "1541822755508"
    }
    ]

Expected result

Return only the latest entry for each order, by _sdc_batched_at value

{
    "order_number": "5545",
    "fulfillments": [
    {
        "tracking_url": null,
        "id": "617029074993",
        "tracking_company": "ups",
        "tracking_number": "Z1234567890"
    }
    ],
    "_sdc_batched_at": "2018-11-10 04:16:07.952 UTC",
    "_sdc_sequence": "1541822755508"
},
{
    "order_number": "2212",
    "fulfillments": [
        {
            "tracking_url": null,
            "id": "617029074993",
            "tracking_company": "ups",
            "tracking_number": "Z1234567890"
        }
    ],
    "_sdc_batched_at": "2018-11-10 04:16:07.952 UTC",
    "_sdc_sequence": "1541822755508"
}

1 answer:

Answer 0: (score: 1)

Below is for BigQuery Standard SQL

SELECT AS VALUE ARRAY_AGG(t ORDER BY _sdc_batched_at DESC LIMIT 1)[OFFSET(0)] 
FROM `project.shopify.orders` t
GROUP BY order_number   

Obviously, you can add whatever conditions you need in a WHERE clause
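As a rough sketch of how that might look, the query below combines the ARRAY_AGG approach with the date-range and fulfillment filters from the question. Aggregating the whole row `t` as a STRUCT and keeping only the first element avoids grouping on the `fulfillments` array itself, since BigQuery does not allow GROUP BY or DISTINCT on ARRAY columns. The `_sdc_sequence DESC` tie-breaker is an assumption added here for rows that share the same `_sdc_batched_at`; it is not part of the original answer.

```sql
-- Sketch: keep only the most recent row per order_number.
-- WHERE conditions are copied from the question's CTE; adjust as needed.
SELECT AS VALUE
  ARRAY_AGG(
    t
    -- _sdc_sequence is an assumed tie-breaker, not in the original answer
    ORDER BY _sdc_batched_at DESC, _sdc_sequence DESC
    LIMIT 1
  )[OFFSET(0)]
FROM `project.shopify.orders` t
WHERE created_at BETWEEN '2018-11-08' AND '2018-11-15'
  AND ARRAY_LENGTH(fulfillments) > 0
GROUP BY order_number
```

With the sample data from the question, this should return one row each for orders 5545 and 2212, provided those rows fall inside the `created_at` range.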