I want to join two tables, but average a column from the left table counting each left-table row only once
Documents:
+-----+------+-------+
| dId | name | score |
+-----+------+-------+
| A   | n1   | 100   |
| B   | n1   | 70    |
+-----+------+-------+
Entities:
+-------+---------+-----+
| ename | details | dId |
+-------+---------+-----+
| e1    | a       | A   |
| e2    | a       | A   |
| e3    | b       | A   |
| e4    | c       | B   |
+-------+---------+-----+
Expected output:
+------+---------+----------------+
| name | average | entities       |
+------+---------+----------------+
| n1   | 85      | e1, e2, e3, e4 |
+------+---------+----------------+
Because (100 + 70) / 2 = 85
Current output:
+------+---------+----------------+
| name | average | entities       |
+------+---------+----------------+
| n1   | 92.5    | e1, e2, e3, e4 |
+------+---------+----------------+
Because (100 + 100 + 100 + 70) / 4 = 92.5
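The arithmetic above can be reproduced outside the database. The following is only an illustrative sketch in plain Python, using the sample rows from the two tables; the variable names are made up for this example. It shows that the join repeats document A's score once per matching entity, which is what skews the average.

```python
# Sample rows copied from the question's tables.
documents = [("A", "n1", 100), ("B", "n1", 70)]            # (dId, name, score)
entities = [("e1", "a", "A"), ("e2", "a", "A"),
            ("e3", "b", "A"), ("e4", "c", "B")]            # (ename, details, dId)

# Inner join on dId: each document row is repeated once per matching entity.
joined = [
    (name, score, ename)
    for d_id, name, score in documents
    for ename, _details, e_did in entities
    if d_id == e_did
]

joined_scores = [score for _name, score, _ename in joined]
document_scores = [score for _d_id, _name, score in documents]

# Averaging after the join counts document A three times.
avg_after_join = sum(joined_scores) / len(joined_scores)
# Averaging the document rows directly counts each document once.
avg_per_document = sum(document_scores) / len(document_scores)

print(joined_scores, avg_after_join, avg_per_document)
# prints: [100, 100, 100, 70] 92.5 85.0
```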
Current query:
SELECT
  docT.name,
  AVG(docT.score),
  STRING_AGG(entityT.ename)
FROM
  document_sentiment docT
JOIN
  entity_sentiment entityT
ON
  docT.dId = entityT.dId
GROUP BY
  docT.name
How can I get the score shown in the expected output?
Answer 0 (score: 1)
Try the following code:
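SELECT name,
       AVG(score) AS average,
       STRING_AGG(ename) AS entities
FROM (
  -- One row per document: MAX() collapses the score that the join repeats.
  SELECT docT.name,
         docT.dId,
         MAX(docT.score) AS score,
         STRING_AGG(entityT.ename) AS ename
  FROM document_sentiment docT
  JOIN entity_sentiment entityT
    ON docT.dId = entityT.dId
  GROUP BY docT.name, docT.dId
) sub
-- Group by name only; also grouping by ename would return one row per document.
GROUP BY name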
Answer 1 (score: 1)
Try this (note that GROUP_CONCAT with SEPARATOR is MySQL syntax):
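SELECT T.name, T.av,
       GROUP_CONCAT(DISTINCT entityT.ename ORDER BY entityT.ename SEPARATOR ', ') AS entities
FROM (
  -- Average each name over the document rows only, before any join.
  SELECT docT.name, AVG(docT.score) AS av
  FROM document_sentiment docT
  GROUP BY docT.name
) T
-- Join back to the documents to recover every dId that belongs to the name.
JOIN document_sentiment docT ON docT.name = T.name
JOIN entity_sentiment entityT ON docT.dId = entityT.dId
GROUP BY T.name, T.av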
Answer 2 (score: 1)
The following is for BigQuery Standard SQL:
#standardSQL
SELECT
docT.name,
AVG(docT.score) average,
STRING_AGG(entityT.ename) entities
FROM `project.dataset.document_sentiment` docT
JOIN (
SELECT dId, STRING_AGG(ename) ename
FROM `project.dataset.entity_sentiment`
GROUP BY dId
) entityT
ON docT.dId = entityT.dId
GROUP BY docT.name
You can test and experiment with the query above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.document_sentiment` AS (
SELECT 'A' dId, 'n1' name, 100 score UNION ALL
SELECT 'B', 'n1', 70
), `project.dataset.entity_sentiment` AS (
SELECT 'e1' ename, 'a' details, 'A' dId UNION ALL
SELECT 'e2', 'a', 'A' UNION ALL
SELECT 'e3', 'b', 'A' UNION ALL
SELECT 'e4', 'c', 'B'
)
SELECT
docT.name,
AVG(docT.score) average,
STRING_AGG(entityT.ename) entities
FROM `project.dataset.document_sentiment` docT
JOIN (
SELECT dId, STRING_AGG(ename) ename
FROM `project.dataset.entity_sentiment`
GROUP BY dId
) entityT
ON docT.dId = entityT.dId
GROUP BY docT.name
with result:
Row  name  average  entities
1    n1    85.0     e1,e2,e3,e4
Answer 3 (score: 0)
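This is tricky. I think window functions are probably the simplest solution:
SELECT docT.name, docT.avg_score,
       STRING_AGG(entityT.ename)
FROM (SELECT docT.*,
             AVG(docT.score) OVER (PARTITION BY docT.name) AS avg_score
      FROM document_sentiment docT
     ) docT JOIN
     entity_sentiment entityT
     ON docT.dId = entityT.dId
GROUP BY docT.name, docT.avg_score;
Why is this tricky? If you aggregate by name, you lose dId and can no longer do the JOIN, so pre-aggregating the document table does not solve the problem. Fortunately, window functions do: they compute the average per name while keeping every document row and its dId.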
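To try the window-function approach without a BigQuery project, here is a minimal sketch using Python's built-in sqlite3 module. It rests on a few assumptions that are not part of the answer: SQLite 3.25 or newer (needed for window functions), GROUP_CONCAT standing in for STRING_AGG, and the column name spelled name as in the sample table. SQLite does not guarantee the order of concatenated values, so the entities are sorted in Python before printing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE document_sentiment (dId TEXT, name TEXT, score REAL);
    CREATE TABLE entity_sentiment (ename TEXT, details TEXT, dId TEXT);
    INSERT INTO document_sentiment VALUES ('A', 'n1', 100), ('B', 'n1', 70);
    INSERT INTO entity_sentiment VALUES
        ('e1', 'a', 'A'), ('e2', 'a', 'A'), ('e3', 'b', 'A'), ('e4', 'c', 'B');
""")

# The window average is computed on the document rows BEFORE the join,
# so the join's row duplication cannot affect it; dId is still available.
query = """
    SELECT docT.name, docT.avg_score,
           GROUP_CONCAT(entityT.ename, ', ') AS entities
    FROM (
        SELECT d.*, AVG(d.score) OVER (PARTITION BY d.name) AS avg_score
        FROM document_sentiment d
    ) docT
    JOIN entity_sentiment entityT ON docT.dId = entityT.dId
    GROUP BY docT.name, docT.avg_score
"""
rows = conn.execute(query).fetchall()
name, avg_score, entities = rows[0]
entity_list = sorted(entities.split(", "))
print(name, avg_score, entity_list)
```

With the sample data this yields a single row for n1 with an average of 85.0 and all four entities.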