我目前正在处理BigQuery中的数据,然后导出到Excel中以执行最终的Pivot表,并且希望能够在BigQuery中使用PIVOT选项创建相同的内容。
我在大查询中设置的数据类似于
Transaction_Month || ConsumerId || CUST_createdMonth
01/01/2015 || 1 || 01/01/2015
01/01/2015 || 1 || 01/01/2015
01/02/2015 || 1 || 01/01/2015
01/01/2015 || 2 || 01/01/2015
01/02/2015 || 3 || 01/02/2015
01/02/2015 || 4 || 01/02/2015
01/02/2015 || 5 || 01/02/2015
01/03/2015 || 5 || 01/02/2015
01/03/2015 || 6 || 01/03/2015
01/04/2015 || 6 || 01/03/2015
01/06/2015 || 6 || 01/03/2015
01/03/2015 || 7 || 01/03/2015
01/04/2015 || 8 || 01/04/2015
01/05/2015 || 8 || 01/04/2015
01/04/2015 || 9 || 01/04/2015
它本质上是一个附有客户信息的订单表。
当我将这些数据放入excel时,我将其添加到数据透视表中,我将CUST_createdMonth添加为Row,将Transaction_Month添加为列,并且值是ConsumerID的不同计数
这种枢轴在BigQuery中是否可行?
答案 0 :(得分:4)
在BigQuery中没有很好的方法可以做到这一点,但你可以按照以下想法进行操作
第1步
运行以下查询
SELECT 'SELECT CUST_createdMonth, ' +
GROUP_CONCAT_UNQUOTED(
'EXACT_COUNT_DISTINCT(IF(Transaction_Month = "' + Transaction_Month + '", ConsumerId, NULL)) as [m_' + REPLACE(Transaction_Month, '/', '_') + ']'
)
+ ' FROM yourTable GROUP BY CUST_createdMonth ORDER BY CUST_createdMonth'
FROM (
SELECT Transaction_Month
FROM yourTable
GROUP BY Transaction_Month
ORDER BY Transaction_Month
)
结果 - 你会得到如下的字符串(为了便于阅读,它的格式如下)
SELECT
CUST_createdMonth,
EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/01/2015", ConsumerId, NULL)) AS [m_01_01_2015],
EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/02/2015", ConsumerId, NULL)) AS [m_01_02_2015],
EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/03/2015", ConsumerId, NULL)) AS [m_01_03_2015],
EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/04/2015", ConsumerId, NULL)) AS [m_01_04_2015],
EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/05/2015", ConsumerId, NULL)) AS [m_01_05_2015],
EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/06/2015", ConsumerId, NULL)) AS [m_01_06_2015]
FROM yourTable
GROUP BY
CUST_createdMonth
ORDER BY
CUST_createdMonth
第2步
只需运行上面的撰写查询
结果将低于
CUST_createdMonth m_01_01_2015 m_01_02_2015 m_01_03_2015 m_01_04_2015 m_01_05_2015 m_01_06_2015
01/01/2015 2 1 0 0 0 0
01/02/2015 0 3 1 0 0 0
01/03/2015 0 0 2 1 0 1
01/04/2015 0 0 0 2 1 0
请注意
如果您有多个月的时间来进行过多的手动操作,那么步骤1会很有帮助 在这种情况下 - 步骤1可帮助您生成查询
您可以在我的其他帖子中看到有关旋转的更多信息。
How to scale Pivoting in BigQuery?
请注意 - 每张桌子有10K列的限制 - 所以你受限于10K组织
您还可以在下面看到简化示例(如果上面的一个太复杂/冗长):
How to transpose rows to columns with large amount of the data in BigQuery/SQL?
How to create dummy variable columns for thousands of categories in Google BigQuery?
Pivot Repeated fields in BigQuery
答案 1 :(得分:1)
实际上,Mikhail还有另一种方法可以将EAV类型模式的行转换为列,方法是使用日志记录表并查询最后一个CREATE TABLE条目以确定最新的表模式。
CREATE TEMP FUNCTION jsonSchemaStringToArray(jsonSchema String)
RETURNS ARRAY<STRING> AS ((
SELECT
SPLIT(
REGEXP_REPLACE(REPLACE(LTRIM(jsonSchema,'{ '),'"fields": [',''), r'{[^{]+"name": "([^\"]+)"[^}]+}[, ]*', '\\1,')
,',')
));
WITH valid_schema_columns AS (
WITH array_output aS (SELECT
jsonSchemaStringToArray(jsonSchema) AS column_names
FROM (
SELECT
protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.schemaJson AS jsonSchema
, ROW_NUMBER() OVER (ORDER BY metadata.timestamp DESC) AS record_count
FROM `realself-main.bigquery_logging.cloudaudit_googleapis_com_data_access_20170101`
WHERE
protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.destinationTable.tableId = '<table_name>'
AND
protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.destinationTable.datasetId = '<schema_name>'
AND
protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.createDisposition = 'CREATE_IF_NEEDED'
) AS t
WHERE
t.record_count = 1 -- grab the latest entry
)
-- this is actually what UNNESTS the array into standard rows
SELECT
valid_column_name
FROM array_output
LEFT JOIN UNNEST(column_names) AS valid_column_name
)