How do I extract values from a string in BigQuery and assign position numbers?

Time: 2017-04-21 05:55:57

Tags: google-bigquery

My data currently looks like this:
[Image: Initial data]

My desired output looks like this:
[Image: New data format]

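What I want to achieve:

  • Extract the values from the comma-separated string in OrderDescription.
  • Present them in a normalized table and assign each an OrderDescriptionPosition, which is the order in which the value appears in the comma-separated string in the original data.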

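I did this by combining the split function with row_number. It seems to work for a single customer, but when run across multiple customers it appears to return unreliably "shuffled" row_numbers, i.e. the wrong order.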

select
   CustomerID,
   OrderID,
   OrderDescriptionItem,            
   row_number() over(partition by CustomerID, OrderID) as OrderDescriptionPosition
from
(
    select
       CustomerID,
       OrderID,          
       split(OrderDescription, ',') as OrderDescriptionItem
    from
       InitialTable
) as e
,unnest(OrderDescriptionItem)   as OrderDescriptionItem

Does anyone have a more robust solution? Any suggestions using UDFs and JavaScript are welcome.

2 answers:

Answer 0: (score: 1)

You can use WITH OFFSET together with UNNEST to get the position. Here is an example:

#standardSQL
WITH Input AS (
  SELECT 1 AS CustomerID, 1001 AS OrderID, '12,14,16,22,28' AS OrderDescription UNION ALL
  SELECT 2 AS CustomerID, 1002 AS OrderID, '1,5' AS OrderDescription UNION ALL
  SELECT 3 AS CustomerID, 1003 AS OrderID, '44,55,66' AS OrderDescription
)
SELECT
  CustomerID,
  OrderID,
  OrderDescription,
  off + 1 AS OrderDescriptionPosition
FROM Input
CROSS JOIN UNNEST(SPLIT(OrderDescription)) AS OrderDescription
  WITH OFFSET off;

+------------+---------+------------------+--------------------------+
| CustomerID | OrderID | OrderDescription | OrderDescriptionPosition |
+------------+---------+------------------+--------------------------+
| 1          | 1001    | 12               | 1                        |
| 1          | 1001    | 14               | 2                        |
| 1          | 1001    | 16               | 3                        |
| 1          | 1001    | 22               | 4                        |
| 1          | 1001    | 28               | 5                        |
| 2          | 1002    | 1                | 1                        |
| 2          | 1002    | 5                | 2                        |
| 3          | 1003    | 44               | 1                        |
| 3          | 1003    | 55               | 2                        |
| 3          | 1003    | 66               | 3                        |
+------------+---------+------------------+--------------------------+
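
Since the question also asks about JavaScript, here is a minimal sketch of the same split-with-position logic in plain JavaScript. The function name `splitWithPosition` and the field names `item` and `position` are hypothetical, chosen only for illustration. The function body could plausibly be used as the body of a BigQuery JavaScript UDF (an assumption: one declared with `LANGUAGE js` and an array-of-struct return type), although WITH OFFSET makes a UDF unnecessary for this case.

```javascript
// Hypothetical helper: split a comma-separated string and pair each item
// with its 1-based position, similar in spirit to UNNEST(...) WITH OFFSET
// followed by "off + 1".
function splitWithPosition(csv) {
  // Assumption for this sketch: missing or empty input yields no items.
  if (csv === null || csv === undefined || csv === '') {
    return [];
  }
  return csv.split(',').map((item, index) => ({
    item: item,
    position: index + 1, // the array index is 0-based, like the offset
  }));
}

// Example using the first row of the sample input above.
const rows = splitWithPosition('12,14,16,22,28');
console.log(rows);
```

Because the position comes from the array index rather than from a window function, it does not depend on how rows are distributed or ordered during query execution.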

Answer 1: (score: 0)

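If your example is representative of your real use case (in the sense that OrderDescription is an ordered list of values), you can use a version of your own query: just add ORDER BY inside OVER(), as below.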

  
#standardSQL
WITH InitialTable AS (
  SELECT 1 AS CustomerID, 1001 AS OrderID, '12,14,16,22,28' AS OrderDescription UNION ALL
  SELECT 2, 1002, '1,5' UNION ALL
  SELECT 3, 1003, '44,55,66'
)
SELECT
  CustomerID,
  OrderID,
  OrderDescription,
  ROW_NUMBER() OVER(PARTITION BY CustomerID, OrderID ORDER BY OrderDescription) AS OrderDescriptionPosition
FROM InitialTable, UNNEST(SPLIT(OrderDescription)) AS OrderDescription
-- ORDER BY CustomerID, OrderID, OrderDescriptionPosition
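
One caveat with ordering by the split value: the items are strings, so they compare lexicographically, and the resulting positions follow the sorted order rather than the original order in the string. The JavaScript sketch below illustrates the difference on a made-up input (`'9,10,2'`, not taken from the question). In the query itself, an analogous adjustment for numeric items would be to order by a cast such as `CAST(OrderDescription AS INT64)`, assuming every item is a valid integer.

```javascript
// Made-up input for illustration; the items are NOT in sorted order.
const items = '9,10,2'.split(',');

// Default sort compares strings character by character.
const lexicographic = [...items].sort();
// → ['10', '2', '9']

// Numeric sort converts each item before comparing.
const numeric = [...items].sort((a, b) => Number(a) - Number(b));
// → ['2', '9', '10']

console.log({ original: items, lexicographic, numeric });
```

Neither sorted order matches the original order `['9', '10', '2']`, so a value-based ORDER BY only reproduces the original positions when the list is already sorted.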