How do I group rows by ID in a format suitable for CSV export?

Time: 2019-04-17 04:31:03

Tags: google-bigquery

I have the following table:

Row ID          AltID1      Latitude    Longitude   AltID2  
1   16055000700 292367877   47.724477   -116.826249 83815818845
2   16055000700 292367882   47.724906   -116.827074 83815819235 
3   16055000700 292409477   47.720201   -116.804307 83815834156 
...
396 16055000800 292413726   47.69276    -116.810874 83814559302
397 16055000800 292413725   47.692863   -116.811014 83814559312 
398 16055000800 292414050   47.693109   -116.811462 83814559728 

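That is, a table containing several groups of rows, where the rows in each group share the same ID. I need to figure out how to group by ID and collect the AltID1, Latitude, Longitude, and AltID2 values associated with each ID. The result should be exported as CSV and laid out so that it is easy to process downstream.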

The final result should look like this:

Line 1:
ID          Count   Data
16055000700 3       "[[292367877, 47.724477, -116.826249, 83815818845] ,[292367882, 47.724906, -116.827074, 83815819235], [292409477,47.720201,-116.804307,83815834156]]"
Line 2:
...

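The first column is the ID, the second is the number of rows in the original table associated with that ID, and the third is an array of arrays: one inner array per original row, each holding that row's AltID1, Latitude, Longitude, and AltID2 values (three inner arrays for the ID shown above).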
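To make the target layout concrete, here is a minimal local sketch in Python (not BigQuery) that builds the three columns from a few of the sample rows above. The function name `build_csv` and the in-memory `rows` list are assumptions made only for illustration. It mainly shows that a JSON array placed in a CSV field must be quoted, because it contains commas.

```python
import csv
import io
import json
from collections import defaultdict

# Sample rows from the table above: (ID, AltID1, Latitude, Longitude, AltID2).
rows = [
    (16055000700, 292367877, 47.724477, -116.826249, 83815818845),
    (16055000700, 292367882, 47.724906, -116.827074, 83815819235),
    (16055000700, 292409477, 47.720201, -116.804307, 83815834156),
    (16055000800, 292413726, 47.69276, -116.810874, 83814559302),
]


def build_csv(rows):
    """Group rows by ID and return CSV text with columns ID, Count, Data."""
    groups = defaultdict(list)
    for id_, *rest in rows:
        groups[id_].append(rest)

    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["ID", "Count", "Data"])
    for id_, items in groups.items():
        # json.dumps serialises the nested list as one string; the csv module
        # wraps it in double quotes automatically because it contains commas.
        writer.writerow([id_, len(items), json.dumps(items)])
    return buf.getvalue()


csv_text = build_csv(rows)
print(csv_text)
```

Because the third column is valid JSON, a consumer can split the line with any CSV reader and then decode the `Data` field with a JSON parser.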

I got some help and ended up with this code:

WITH
  data AS(
  SELECT
    *
  FROM
    UNNEST( ARRAY<STRUCT<id int64, altid1 int64, lat float64, lon float64, altid2 int64>> 
        [(16055000700,
        292367877,
        47.724477,
        -116.826249,
        83815818845), (16055000700,
        292367882,
        47.724906,
        -116.827074,
        83815819235), (16055000800,
        292414050,
        47.693109,
        -116.811462,
        83814559728)]
))
SELECT
  id,
  CONCAT('[', STRING_AGG(to_json_STRING(ARRAY<float64>[altid1,
        lat,
        lon,
        altid2])), ']')
FROM
  data d
GROUP BY
  id

If I have a table MyTable with this schema:

FieldName   Type    Mode    
ID          INTEGER NULLABLE    
altid1      INTEGER NULLABLE    
lat         FLOAT   NULLABLE    
lon         FLOAT   NULLABLE    
altid2      INTEGER NULLABLE    

how can I use a SELECT statement to produce this part from the data in MyTable?

           [(16055000700,
            292367877,
            47.724477,
            -116.826249,
            83815818845), (16055000700,
            292367882,
            47.724906,
            -116.827074,
            83815819235), (16055000800,
            292414050,
            47.693109,
            -116.811462,
            83814559728)]

2 answers:

Answer 0: (score: 1)

You can use TO_JSON_STRING() to get close to the desired result, then aggregate those strings into one larger string:

WITH data AS (
  SELECT *
  FROM `bigquery-public-data.noaa_gsod.gsod2017`
  WHERE stn IN ('998258','995011','996080') AND mo="02" AND da<'03'
)

SELECT stn, FORMAT('[%s]', STRING_AGG(values)) values
FROM (
  SELECT stn, TO_JSON_STRING([min,max,temp]) values
  FROM `data`
)
GROUP BY 1
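The two-step idea in the query above (serialise each row's array, then join the strings) can be sanity-checked locally. The sketch below is a Python emulation, not BigQuery code, and the `readings` values are made-up stand-ins for one station's `[min, max, temp]` arrays. It checks that comma-joining the per-row JSON strings and wrapping them in brackets yields text that is itself valid JSON.

```python
import json

# Hypothetical stand-ins for the per-row [min, max, temp] arrays of one station.
readings = [[30.2, 41.0, 35.5], [28.4, 39.9, 33.1]]

# Step 1 (analogous to TO_JSON_STRING): serialise each row's array on its own.
# separators=(",", ":") drops the spaces json.dumps would add by default.
row_strings = [json.dumps(r, separators=(",", ":")) for r in readings]

# Step 2 (analogous to STRING_AGG plus FORMAT('[%s]', ...)): join and wrap.
combined = "[%s]" % ",".join(row_strings)

print(combined)

# The concatenated text parses back to the original nested list.
assert json.loads(combined) == readings
```

This works because each piece is a complete JSON array, so a comma-separated run of such pieces inside one pair of brackets is a valid array of arrays.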


Answer 1: (score: 1)

The following is for BigQuery Standard SQL:

    
#standardSQL
SELECT ID, COUNT(1) rows_count, 
  CONCAT('[', STRING_AGG(TO_JSON_STRING([AltID1, Latitude, Longitude, AltID2])), ']') data
FROM `project.dataset.table`
GROUP BY ID   

You can test the query above with the sample data from your question, as in the example below:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 16055000700 ID, 292367877 AltID1, 47.724477 Latitude, -116.826249 Longitude, 83815818845 AltID2 UNION ALL
  SELECT 16055000700, 292367882, 47.724906, -116.827074, 83815819235 UNION ALL 
  SELECT 16055000700, 292409477, 47.720201, -116.804307, 83815834156 UNION ALL 
  SELECT 16055000800, 292413726, 47.69276, -116.810874, 83814559302 UNION ALL
  SELECT 16055000800, 292413725, 47.692863, -116.811014, 83814559312 UNION ALL 
  SELECT 16055000800, 292414050, 47.693109, -116.811462, 83814559728 
)
SELECT ID, COUNT(1) rows_count, 
  CONCAT('[', STRING_AGG(TO_JSON_STRING([AltID1, Latitude, Longitude, AltID2])), ']') data
FROM `project.dataset.table`
GROUP BY ID   

which produces this result:

Row ID          rows_count  data     
1   16055000700 3           [[292367877,47.724477,-116.826249,83815818845],[292367882,47.724906,-116.827074,83815819235],[292409477,47.720201,-116.804307,83815834156]]  
2   16055000800 3           [[292413726,47.69276,-116.810874,83814559302],[292413725,47.692863,-116.811014,83814559312],[292414050,47.693109,-116.811462,83814559728]]
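Once this result is exported, reading it back is straightforward. The sketch below is a minimal Python example. It assumes the export writes a header row with the column names from the query above and double-quotes the `data` field, as in the desired output in the question. The `exported` string and the `parse_export` helper are illustrative only.

```python
import csv
import io
import json

# One exported line, assuming the data column is double-quoted in the CSV.
exported = (
    "ID,rows_count,data\n"
    '16055000700,3,"[[292367877,47.724477,-116.826249,83815818845],'
    "[292367882,47.724906,-116.827074,83815819235],"
    '[292409477,47.720201,-116.804307,83815834156]]"\n'
)


def parse_export(text):
    """Return {ID: list of [AltID1, Latitude, Longitude, AltID2]}."""
    result = {}
    for record in csv.DictReader(io.StringIO(text)):
        points = json.loads(record["data"])
        # Sanity check: the count column should match the parsed array length.
        if len(points) != int(record["rows_count"]):
            raise ValueError(f"count mismatch for ID {record['ID']}")
        result[int(record["ID"])] = points
    return result


parsed = parse_export(exported)
print(parsed[16055000700][0])
```

Keeping `rows_count` alongside the JSON gives the consumer a cheap integrity check on each line.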