Sum of diagonal elements in BigQuery

Time: 2018-12-05 21:06:19

Tags: google-bigquery

I have a confusion matrix in BigQuery and I want to find the sum of its diagonal elements. How can I do this with SQL?

Here is the data:

select 0 as predictedGroup , 60 as label0 , 20 as label1, 20 as label2
union all
select 1, 20 , 60 , 20
union all
select 2, 20 , 20 , 60

0, 1, and 2 are the Y labels in my test data. In general there can be N labels.

For the data above, I should get 180 as the output (60 + 60 + 60).
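For reference, the quantity being asked for is the trace of the matrix: for each row with `predictedGroup = i`, take the value in column `label<i>` and add them up. A minimal Python sketch of that arithmetic, with the matrix hard-coded from the sample data (the name `diagonal_sum` is illustrative only):

```python
# Confusion matrix from the sample data: row i is predictedGroup i,
# column j is the count in column label<j>.
confusion = [
    [60, 20, 20],
    [20, 60, 20],
    [20, 20, 60],
]


def diagonal_sum(matrix):
    """Return the sum of matrix[i][i] (the trace of a square matrix)."""
    return sum(matrix[i][i] for i in range(len(matrix)))


print(diagonal_sum(confusion))  # 180
```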

2 answers:

Answer 0 (score: 1)

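If you are bound to this schema/design (that is, you have one column per label rather than an array), the following should work for you: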

#standardSQL
SELECT SUM(CAST(REGEXP_EXTRACT(TO_JSON_STRING(t), 
    CONCAT(r'(?::\d*.*?){', CAST(predictedGroup + 1 AS STRING), r'}:(\d*)')) AS INT64)
  ) AS sum_of_diagonal
FROM `project.dataset.sampleTable` t  

You can test and play with it using the sample data from your question, as in the example below:

#standardSQL
WITH `project.dataset.sampleTable` AS (
  SELECT 0 AS predictedGroup , 60 AS label0 , 20 AS label1, 20 AS label2 UNION ALL
  SELECT 1, 20 , 60 , 20 UNION ALL
  SELECT 2, 20 , 20 , 60
)
SELECT SUM(CAST(REGEXP_EXTRACT(TO_JSON_STRING(t), 
    CONCAT(r'(?::\d*.*?){', CAST(predictedGroup + 1 AS STRING), r'}:(\d*)')) AS INT64)
  ) AS sum_of_diagonal
FROM `project.dataset.sampleTable` t  

with the result:

Row sum_of_diagonal  
1   180  
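The query serializes each row with `TO_JSON_STRING(t)` and builds a regular expression that skips past `predictedGroup + 1` colon-prefixed chunks, then captures the digits after the next colon, which is the value of column `label<predictedGroup>`. The Python sketch below emulates that logic outside BigQuery to show what gets extracted. It assumes that `json.dumps` with compact separators approximates `TO_JSON_STRING` output for this table and that Python's `re` behaves like BigQuery's RE2 for this pattern; the helper names are hypothetical. It also makes the trick's assumptions visible: it depends on the column order (`predictedGroup` first, then the label columns in index order) and on the values being non-negative integers, since `\d*` matches neither a minus sign nor a decimal point.

```python
import json
import re

# Sample rows; dict key order stands in for the table's column order.
rows = [
    {"predictedGroup": 0, "label0": 60, "label1": 20, "label2": 20},
    {"predictedGroup": 1, "label0": 20, "label1": 60, "label2": 20},
    {"predictedGroup": 2, "label0": 20, "label1": 20, "label2": 60},
]


def to_json_string(row):
    # Compact separators approximate BigQuery's TO_JSON_STRING output, e.g.
    # {"predictedGroup":0,"label0":60,"label1":20,"label2":20}
    return json.dumps(row, separators=(",", ":"))


def diagonal_value(row):
    # Same pattern the SQL builds with CONCAT: lazily skip predictedGroup + 1
    # chunks that each start at a colon, then capture the digits after the next colon.
    n = row["predictedGroup"] + 1
    pattern = r"(?::\d*.*?){" + str(n) + r"}:(\d*)"
    match = re.search(pattern, to_json_string(row))
    return int(match.group(1))


print([diagonal_value(r) for r in rows])  # [60, 60, 60]
print(sum(diagonal_value(r) for r in rows))  # 180
```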

Answer 1 (score: 0)

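In this case you probably don't want a separate column for each label value, but rather an array of values, since you mention there may be N labels.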

If you take that approach to representing the data, the query becomes trivial:

#standardSQL
WITH sampleTable AS (
  SELECT 0 AS predictedGroup, [60, 20, 20] AS labelVals UNION ALL
  SELECT 1, [20, 60, 20] UNION ALL
  SELECT 2, [20, 20, 60]
)
SELECT SUM(labelVals[SAFE_OFFSET(predictedGroup)]) AS sum_of_diagonal
FROM sampleTable
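`SAFE_OFFSET` is zero-based like `OFFSET`, but returns `NULL` instead of raising an error when the index is out of range, and `SUM` ignores `NULL` inputs (returning `NULL` only if every input is `NULL`). So a row whose `predictedGroup` has no matching array element is silently skipped rather than failing the query. A small Python emulation of those semantics, offered as an illustration only (the helper names `safe_offset` and `sum_of_diagonal` are hypothetical, not BigQuery APIs):

```python
# Rows of the array-based sampleTable: (predictedGroup, labelVals).
sample_table = [
    (0, [60, 20, 20]),
    (1, [20, 60, 20]),
    (2, [20, 20, 60]),
]


def safe_offset(arr, i):
    """Emulate arr[SAFE_OFFSET(i)]: zero-based index, None (NULL) if out of range."""
    return arr[i] if 0 <= i < len(arr) else None


def sum_of_diagonal(table):
    """Emulate SUM(...): NULLs are ignored; the result is NULL if every input is NULL."""
    picked = [safe_offset(label_vals, group) for group, label_vals in table]
    non_null = [v for v in picked if v is not None]
    return sum(non_null) if non_null else None


print(sum_of_diagonal(sample_table))  # 180
print(sum_of_diagonal([(5, [1, 2, 3])]))  # None
```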