I'm trying to join two tables:
songs
id | song | artist
---|------|-------
1 | foo | bar
2 | fuu | bor
3 | fyy | bir
score
id | score
---|------
1 | 2
2 | 4
3 | 8
2 | 6
3 | 2
Using this SQL command:
SELECT songs.id, songs.song, songs.artist, score.score FROM songs LEFT JOIN score ON score.id=songs.id ORDER BY songs.id, score DESC
I get the same song repeated once for each of its scores. I want the scores averaged instead.
result
id | song | artist | score
---|------|--------|-------
1 | foo | bar | 2
2 | fuu | bor | 4
2 | fuu | bor | 6
3 | fyy | bir | 8
3 | fyy | bir | 2
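The repetition comes from the join itself: a LEFT JOIN emits one output row per matching row in score, so songs 2 and 3 each appear twice. The minimal sketch below reproduces this by rebuilding the sample data in an in-memory SQLite database with Python's standard sqlite3 module. SQLite is an assumption here, since the question does not name its database, and the helper names SETUP and joined_rows are hypothetical.

```python
import sqlite3

# Sample data copied from the question; SQLite stands in for the asker's database.
SETUP = """
CREATE TABLE songs (id INTEGER, song TEXT, artist TEXT);
CREATE TABLE score (id INTEGER, score INTEGER);
INSERT INTO songs VALUES (1, 'foo', 'bar'), (2, 'fuu', 'bor'), (3, 'fyy', 'bir');
INSERT INTO score VALUES (1, 2), (2, 4), (3, 8), (2, 6), (3, 2);
"""

def joined_rows():
    """Run the question's LEFT JOIN and return every joined row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(
        """
        SELECT songs.id, songs.song, songs.artist, score.score
        FROM songs LEFT JOIN score ON score.id = songs.id
        ORDER BY songs.id, score.score DESC
        """
    ).fetchall()
    conn.close()
    return rows

# One output row per matching score row, so songs 2 and 3 appear twice.
for row in joined_rows():
    print(row)
```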
I tried using:
SELECT songs.id, songs.song, songs.artist, ROUND(AVG(score.score),1) AS 'score' FROM songs INNER JOIN score ON score.id=songs.id ORDER BY score DESC
But this averages all of the scores together, not the scores of each song separately:
result
id | song | artist | score
---|------|--------|-------
1 | foo | bar | 4.4
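Without a GROUP BY, an aggregate such as AVG treats the entire joined result as a single group, which is why only one row comes back: (2 + 4 + 8 + 6 + 2) / 5 = 4.4. The sketch below, again assuming SQLite through Python's sqlite3 with hypothetical helper names, keeps only the aggregate column, because which row would supply id, song and artist in the original query is not well defined (some databases reject that query outright).

```python
import sqlite3

# Sample data copied from the question.
SETUP = """
CREATE TABLE songs (id INTEGER, song TEXT, artist TEXT);
CREATE TABLE score (id INTEGER, score INTEGER);
INSERT INTO songs VALUES (1, 'foo', 'bar'), (2, 'fuu', 'bor'), (3, 'fyy', 'bir');
INSERT INTO score VALUES (1, 2), (2, 4), (3, 8), (2, 6), (3, 2);
"""

def overall_average():
    """Aggregate without GROUP BY: the whole join collapses into one group."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(
        """
        SELECT ROUND(AVG(score.score), 1)
        FROM songs INNER JOIN score ON score.id = songs.id
        """
    ).fetchall()
    conn.close()
    return rows

print(overall_average())  # one row: (2 + 4 + 8 + 6 + 2) / 5 = 4.4
```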
Answer 0 (score: 4)
You need to GROUP BY all of the fields you want to keep:
SELECT songs.id, songs.song, songs.artist,
AVG(score.score * 1.0) AS AvgScore
FROM songs
LEFT JOIN score
ON score.id=songs.id
GROUP BY songs.id, songs.song, songs.artist
ORDER BY AvgScore DESC
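Grouping by the song columns turns each song's joined rows into one group, and AVG is then computed per group. A minimal sketch, assuming SQLite via Python's sqlite3 (helper names are hypothetical): it orders by songs.id rather than by the average so the output is deterministic, since songs 2 and 3 tie at 5.0. The `* 1.0` is kept from the answer; it matters in databases such as SQL Server, where AVG over an integer column returns an integer, while SQLite's AVG always returns a floating-point value.

```python
import sqlite3

# Sample data copied from the question.
SETUP = """
CREATE TABLE songs (id INTEGER, song TEXT, artist TEXT);
CREATE TABLE score (id INTEGER, score INTEGER);
INSERT INTO songs VALUES (1, 'foo', 'bar'), (2, 'fuu', 'bor'), (3, 'fyy', 'bir');
INSERT INTO score VALUES (1, 2), (2, 4), (3, 8), (2, 6), (3, 2);
"""

def average_per_song():
    """One row per song, with the average of that song's scores."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(
        """
        SELECT songs.id, songs.song, songs.artist,
               AVG(score.score * 1.0) AS AvgScore
        FROM songs
        LEFT JOIN score ON score.id = songs.id
        GROUP BY songs.id, songs.song, songs.artist
        ORDER BY songs.id
        """
    ).fetchall()
    conn.close()
    return rows

for row in average_per_song():
    print(row)
```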
Alternatively, you can use a correlated subquery:
SELECT songs.id, songs.song, songs.artist,
(SELECT AVG(score.score) FROM score WHERE score.id = songs.id) AS AvgScore
FROM songs
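The correlated subquery computes the average once for each row of songs, so no GROUP BY is needed, and a song with no scores gets NULL because AVG over an empty set is NULL. The sketch below runs it in SQLite via Python's sqlite3; the fourth song (4, 'fee', 'ber') is a hypothetical row added only to show the NULL case, and the helper names are likewise hypothetical.

```python
import sqlite3

# Question's sample data plus one hypothetical song (id 4) that has no scores.
SETUP = """
CREATE TABLE songs (id INTEGER, song TEXT, artist TEXT);
CREATE TABLE score (id INTEGER, score INTEGER);
INSERT INTO songs VALUES (1, 'foo', 'bar'), (2, 'fuu', 'bor'), (3, 'fyy', 'bir'), (4, 'fee', 'ber');
INSERT INTO score VALUES (1, 2), (2, 4), (3, 8), (2, 6), (3, 2);
"""

def average_via_subquery():
    """Average each song's scores with a correlated subquery; no GROUP BY."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(
        """
        SELECT songs.id, songs.song, songs.artist,
               (SELECT AVG(score.score * 1.0)
                FROM score WHERE score.id = songs.id) AS AvgScore
        FROM songs
        ORDER BY songs.id
        """
    ).fetchall()
    conn.close()
    return rows

# The unscored song is kept, with None (SQL NULL) as its average.
for row in average_via_subquery():
    print(row)
```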
Answer 1 (score: 0)
Use GROUP BY songs.id:
SELECT songs.id, songs.song, songs.artist,
ROUND(AVG(score.score),1) AS 'score' FROM songs
INNER JOIN score ON score.id=songs.id
group by songs.id ORDER BY score DESC
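This answer groups by songs.id alone and uses an INNER JOIN. Grouping by the id alone works in SQLite, and as far as I know PostgreSQL and MySQL 5.7+ also accept it when id is the primary key, because song and artist are functionally dependent on it; other databases may require listing every selected column. The join type matters too: an INNER JOIN drops songs that have no scores, while a LEFT JOIN keeps them with a NULL average. The sketch below, assuming SQLite via Python's sqlite3, declares songs.id as PRIMARY KEY and adds a hypothetical unscored song (4, 'fee', 'ber') to show the difference.

```python
import sqlite3

# Question's sample data plus one hypothetical song (id 4) that has no scores.
SETUP = """
CREATE TABLE songs (id INTEGER PRIMARY KEY, song TEXT, artist TEXT);
CREATE TABLE score (id INTEGER, score INTEGER);
INSERT INTO songs VALUES (1, 'foo', 'bar'), (2, 'fuu', 'bor'), (3, 'fyy', 'bir'), (4, 'fee', 'ber');
INSERT INTO score VALUES (1, 2), (2, 4), (3, 8), (2, 6), (3, 2);
"""

def compare_joins():
    """Per-song averages grouped by songs.id, once per join type."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    results = {}
    for join in ("INNER JOIN", "LEFT JOIN"):
        results[join] = conn.execute(
            f"""
            SELECT songs.id, ROUND(AVG(score.score), 1) AS avg_score
            FROM songs {join} score ON score.id = songs.id
            GROUP BY songs.id
            ORDER BY songs.id
            """
        ).fetchall()
    conn.close()
    return results

for join, rows in compare_joins().items():
    print(join, rows)
```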
Answer 2 (score: 0)
Use this:
SELECT a.id, a.song, a.artist, AVG(b.score) AS score
FROM songs a
INNER JOIN score b ON a.id = b.id
GROUP BY a.id, a.artist, a.song
Answer 3 (score: 0)
You need df.coalesce(1).write.format("com.databricks.spark.csv").option("header", "true").save("example")
and to left-join the data (GROUP BY).
Try this:
LEFT JOIN