SELECT
STRFTIME_UTC_USEC(TimeStamp,"%Y-%m-%d %H:%M:%S") AS TimeStamp,
Value.provided,
__key__.app AS ProjectID,
REGEXP_EXTRACT(__key__.path, r'"hostname"[, ]*"(.*?)"') AS hostname,
REGEXP_EXTRACT(__key__.path, r'"machine"[, ]*"(.*?)"') AS machine,
REGEXP_EXTRACT(__key__.path, r'"variable"[, ]*"(.*?)"') AS variable,
IF(value.provided = 'integer', CAST(value.integer AS STRING),
CAST(value.boolean AS STRING)) AS value
FROM
[spark-test-project-152415:spark_machine_learning.spark_12272016]
ORDER BY
TimeStamp
LIMIT 100000
The query above extracts the dataset shown in the attached screenshot. I need to split the variable column into multiple columns, each holding that variable's values. I think this has to be done with a subquery. How can I get started?
Expected output:
Using a PIVOT query
SELECT
*
FROM (SELECT
#Timestamp,
STRFTIME_UTC_USEC(TimeStamp,"%Y-%m-%d %H:%M:%S") AS [TimeStamp],
Value.provided,
__key__.app AS ProjectID,
REGEXP_EXTRACT(__key__.path, r'"hostname"[, ]*"(.*?)"') AS [hostname],
REGEXP_EXTRACT(__key__.path, r'"machine"[, ]*"(.*?)"') AS [machine],
REGEXP_EXTRACT(__key__.path, r'"variable"[, ]*"(.*?)"') AS [variable],
IF(value.provided = 'integer', CAST(value.integer AS STRING), CAST(value.boolean AS STRING)) AS [value]
FROM
[spark-test-project-152415:spark_machine_learning.spark_12272016]
ORDER BY
TimeStamp ) AS SourceTable PIVOT ([value] FOR [variable] IN ([Counter_Strokes_No_Reset],
[Press_State_Code],
[Press_Operator_1],
[Press_Stop_Time_Limit],
[Counter_Good_Parts_No_Reset],
[Press_Error_Reason_Code],
[Counter_Scrap_No_Reset],
[Production_Tool_Number],
[Press_Stop_Time_Actual],
[Production_Good_Parts_Preset],
[Press_Shaft_Speed],
[Production_Part_Number],
[Press_Total_Tonnage],
[Production_Job_Number]) ) AS PivotTable
Answer 0: (score: 1)
"How do I get started?"
Try the query below; it may give you an idea.
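The answer's actual query did not survive extraction, so as a minimal sketch of one common approach: a pivot can be emulated with conditional aggregation, one `MAX(CASE ...)` expression per variable name. The snippet below demonstrates the pattern against an in-memory SQLite table with hypothetical sample data (table and column names are illustrative, not from the original answer):

```python
# Sketch: pivot rows into columns via conditional aggregation.
# The table layout and sample values are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE readings (ts TEXT, machine TEXT, variable TEXT, value TEXT);
INSERT INTO readings VALUES
  ('2016-12-27 10:00:00', 'M1', 'Press_State_Code',         '3'),
  ('2016-12-27 10:00:00', 'M1', 'Press_Shaft_Speed',        '120'),
  ('2016-12-27 10:00:00', 'M1', 'Counter_Strokes_No_Reset', '4501');
""")

# One MAX(CASE ...) per variable name turns each variable into its own column;
# MAX ignores the NULLs produced by the non-matching rows.
rows = conn.execute("""
SELECT
  ts,
  machine,
  MAX(CASE WHEN variable = 'Press_State_Code'         THEN value END) AS Press_State_Code,
  MAX(CASE WHEN variable = 'Press_Shaft_Speed'        THEN value END) AS Press_Shaft_Speed,
  MAX(CASE WHEN variable = 'Counter_Strokes_No_Reset' THEN value END) AS Counter_Strokes_No_Reset
FROM readings
GROUP BY ts, machine
""").fetchall()

print(rows)
# → [('2016-12-27 10:00:00', 'M1', '3', '120', '4501')]
```

In BigQuery the same pattern would be written with `MAX(IF(variable = '...', value, NULL))` plus a `GROUP BY`, since the legacy SQL dialect used in the question has no `PIVOT` operator; the SQL Server-style `PIVOT` syntax in the expected-output query will not run there as written.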