I have a confusion matrix in BigQuery and I want to find the sum of its diagonal elements. How can I do this in SQL?
Here is the data:
select 0 as predictedGroup , 60 as label0 , 20 as label1, 20 as label2
union all
select 1, 20 , 60 , 20
union all
select 2, 20 , 20 , 60
0, 1 and 2 are the Y labels in my test data. In general there are N labels.
For the data above, I should get 180 as the output (60 + 60 + 60).
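Answer 0: (score: 1)
If you are bound to your existing schema/design, meaning you have one column per label rather than an array, the following should work for you: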
#standardSQL
SELECT SUM(CAST(REGEXP_EXTRACT(TO_JSON_STRING(t),
CONCAT(r'(?::\d*.*?){', CAST(predictedGroup + 1 AS STRING), r'}:(\d*)')) AS INT64)
) AS sum_of_diagonal
FROM `project.dataset.sampleTable` t
You can test it with the sample data from the question, as follows:
#standardSQL
WITH `project.dataset.sampleTable` AS (
SELECT 0 AS predictedGroup , 60 AS label0 , 20 AS label1, 20 AS label2 UNION ALL
SELECT 1, 20 , 60 , 20 UNION ALL
SELECT 2, 20 , 20 , 60
)
SELECT SUM(CAST(REGEXP_EXTRACT(TO_JSON_STRING(t),
CONCAT(r'(?::\d*.*?){', CAST(predictedGroup + 1 AS STRING), r'}:(\d*)')) AS INT64)
) AS sum_of_diagonal
FROM `project.dataset.sampleTable` t
The result is:
Row sum_of_diagonal
1 180
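To see why the pattern works, it can help to look at what each row contributes before summing. The sketch below reuses the same sample data and the same expression; the output column names row_json and diagonal_value are only illustrative and are not part of the original answer.

```sql
#standardSQL
WITH `project.dataset.sampleTable` AS (
  SELECT 0 AS predictedGroup, 60 AS label0, 20 AS label1, 20 AS label2 UNION ALL
  SELECT 1, 20, 60, 20 UNION ALL
  SELECT 2, 20, 20, 60
)
SELECT
  predictedGroup,
  -- The whole row serialized as JSON, with keys in column order
  TO_JSON_STRING(t) AS row_json,
  -- Skip (predictedGroup + 1) ":value" pairs, then capture the digits
  -- that follow the next colon (still a STRING at this point)
  REGEXP_EXTRACT(TO_JSON_STRING(t),
    CONCAT(r'(?::\d*.*?){', CAST(predictedGroup + 1 AS STRING), r'}:(\d*)')) AS diagonal_value
FROM `project.dataset.sampleTable` t
ORDER BY predictedGroup
```

Note that this approach assumes predictedGroup is the first column, the label columns follow in label order, and the values are non-negative integers. If the layout differs, the pattern would likely need adjusting.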
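Answer 1: (score: 0)
In this case, you probably don't want discrete columns for the label values but rather an array of values, since you mention there may be N labels.
If you take this approach to representing the data, it can be as trivial as this: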
WITH sampleTable as
(
select 0 as predictedGroup, [60, 20, 20] as labelVals
union all
select 1, [20 , 60 , 20]
union all
select 2, [20 , 20 , 60]
)
SELECT SUM(labelVals[SAFE_OFFSET(predictedGroup)]) FROM sampleTable
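If the table has to keep the column-per-label layout from the question, the same idea can be applied by building the array inline. This is only a sketch under that assumption: the label columns are listed by hand, so it does not generalize to N labels without editing the query.

```sql
#standardSQL
WITH sampleTable AS (
  SELECT 0 AS predictedGroup, 60 AS label0, 20 AS label1, 20 AS label2 UNION ALL
  SELECT 1, 20, 60, 20 UNION ALL
  SELECT 2, 20, 20, 60
)
SELECT
  -- Build an array from the label columns (listed manually, in label order),
  -- then pick the element whose zero-based position equals predictedGroup
  SUM([label0, label1, label2][SAFE_OFFSET(predictedGroup)]) AS sum_of_diagonal
FROM sampleTable
```

SAFE_OFFSET returns NULL when the index is out of range, and SUM ignores NULLs, so a row whose predictedGroup has no matching label column is skipped rather than raising an error.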