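Summarize/merge/combine reversed pairs in BigQuery

Time: 2017-03-01 00:28:35

Tags: sql google-bigquery

I have a table like the one below, where the counts for a relationship between a pair of countries often appear with the pair in reversed order.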

country1    country2    count
 CHN         KOR         65
 TWN         KOR         32
 KOR         CHN         43

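I have both CHN - KOR and KOR - CHN. I have already established that these are distinct counts and that the two rows are just two ways of describing the same relationship, so I want to sum the counts for each pair. The final result should be: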

country1    country2    count
 CHN         KOR         108
 TWN         KOR          32

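I am using BigQuery. Does anyone know a way to consolidate reversed pairs in SQL? Note: these are not duplicates, so this is not a question about removing duplicates but about combining reversed pairs.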

2 Answers:

Answer 0: (score: 3)

Another option, showing off the power and coolness of BigQuery Standard SQL:

#standardSQL
WITH pairs AS (
  SELECT 
    (SELECT STRING_AGG(country ORDER BY country) 
      FROM UNNEST(ARRAY[country1, country2]) AS country
    ) AS countries,
    SUM(COUNT) AS COUNT
  FROM yourTable 
  GROUP BY countries
)
SELECT 
  REGEXP_EXTRACT(countries, r'(\w+),') AS country1,
  REGEXP_EXTRACT(countries, r',(\w+)') AS country2,
  COUNT
FROM pairs  

This version can be the better choice when you have more than two "wrongly ordered" fields.

You can test it quickly with the dummy data below, added as the first CTE of the query above:

#standardSQL
WITH yourTable AS (
SELECT 'CHN' AS country1, 'KOR' AS country2, 65 AS COUNT UNION ALL
SELECT 'TWN', 'KOR', 32 UNION ALL
SELECT 'KOR', 'CHN', 43 
)  

Below is a quick example for the case where more than two fields are shuffled:

#standardSQL
WITH yourTable AS (
SELECT 'CHN' AS country1, 'KOR' AS country2, 'US' as country3, 65 AS COUNT UNION ALL
SELECT 'TWN', 'KOR', 'GB', 32 UNION ALL
SELECT 'KOR', 'US', 'CHN', 43 
),
pairs AS (
  SELECT 
    (SELECT STRING_AGG(country ORDER BY country) 
      FROM UNNEST(ARRAY[country1, country2, country3]) AS country
    ) AS countries,
    SUM(COUNT) AS COUNT
  FROM yourTable 
  GROUP BY countries
)
SELECT 
  REGEXP_EXTRACT(countries, r'(\w+),\w+,\w+') AS country1,
  REGEXP_EXTRACT(countries, r'\w+,(\w+),\w+') AS country2,
  REGEXP_EXTRACT(countries, r'\w+,\w+,(\w+)') AS country3,
  COUNT
FROM pairs

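Of course, this can be optimized further, but the main focus here is the regrouping logic, which does not require multiple comparisons and the like.

Added

Thanks to @GordonLinoff for insisting on the option below! I think you are right: using ARRAY_AGG is more elegant here.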

#standardSQL
WITH yourTable AS (
SELECT 'CHN' AS country1, 'KOR' AS country2, 'US' AS country3, 65 AS count UNION ALL
SELECT 'TWN', 'KOR', 'GB', 32 UNION ALL
SELECT 'KOR', 'US', 'CHN', 43 
),
pairs AS (
  SELECT 
    (SELECT ARRAY_AGG(country ORDER BY country) 
      FROM UNNEST(ARRAY[country1, country2, country3]) AS country
    ) AS countries,
    count
  FROM yourTable 
)
SELECT 
  countries[OFFSET(0)] AS country1,
  countries[OFFSET(1)] AS country2,
  countries[OFFSET(2)] AS country3,
  SUM(count) AS count
FROM pairs
GROUP BY 1, 2, 3

Answer 1: (score: 1)

Here is one approach:

select country1, country2, sum(count)
from ((select country1, country2, count
       from t
       where country1 <= country2
      ) union all
      (select country2, country1, count
       from t
       where country1 > country2
      )
     ) cc
group by country1, country2;

This works in both the legacy and standard SQL interfaces. For standard SQL, BigQuery also supports least() and greatest() on strings.
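
The least()/greatest() variant mentioned above is not spelled out in the answer, so here is a minimal sketch of how it might look in standard SQL. It reuses the dummy table name yourTable and the sample rows from the other answer; those names are assumptions for illustration, not part of this answer.

```sql
#standardSQL
-- Sample rows copied from the question; yourTable is a placeholder name.
WITH yourTable AS (
  SELECT 'CHN' AS country1, 'KOR' AS country2, 65 AS count UNION ALL
  SELECT 'TWN', 'KOR', 32 UNION ALL
  SELECT 'KOR', 'CHN', 43
)
SELECT
  -- Put the alphabetically smaller code first so that
  -- CHN/KOR and KOR/CHN collapse into the same pair.
  LEAST(country1, country2) AS country1,
  GREATEST(country1, country2) AS country2,
  SUM(count) AS count
FROM yourTable
GROUP BY 1, 2
```

Because each pair is normalized alphabetically, the TWN/KOR row is reported as KOR/TWN. If the original orientation of a pair matters, the union all query above has the same effect on this data, since it also swaps rows where country1 > country2.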