Question

I'm trying to join two tables:

songs
id | song | artist
---|------|-------
1  | foo  | bar
2  | fuu  | bor
3  | fyy  | bir

score
id | score
---|------
1  | 2
2  | 4
3  | 8
2  | 6
3  | 2

using this SQL command:

SELECT songs.id, songs.song, songs.artist, score.score FROM songs LEFT JOIN score ON score.id=songs.id ORDER BY songs.id, score DESC

What I get is the same song repeated once for each of its scores, but I want the scores averaged instead:

result
id | song | artist | score
---|------|--------|-------
1  | foo  | bar    | 2
2  | fuu  | bor    | 4
2  | fuu  | bor    | 6
3  | fyy  | bir    | 8
3  | fyy  | bir    | 2
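The repeated rows come from the join itself: a LEFT JOIN emits one output row for every matching row in `score`, so a song with two scores appears twice. A minimal sketch, assuming SQLite through Python's standard `sqlite3` module as a stand-in for the asker's database (`make_db` is a hypothetical helper that loads the sample tables), reproduces this:

```python
import sqlite3
from collections import Counter


def make_db():
    """Hypothetical helper: an in-memory SQLite copy of the question's sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE songs (id INTEGER PRIMARY KEY, song TEXT, artist TEXT);
        CREATE TABLE score (id INTEGER, score INTEGER);
        INSERT INTO songs VALUES (1, 'foo', 'bar'), (2, 'fuu', 'bor'), (3, 'fyy', 'bir');
        INSERT INTO score VALUES (1, 2), (2, 4), (3, 8), (2, 6), (3, 2);
    """)
    return conn


conn = make_db()

# The question's first query: a plain LEFT JOIN with no aggregation.
rows = conn.execute("""
    SELECT songs.id, songs.song, songs.artist, score.score
    FROM songs LEFT JOIN score ON score.id = songs.id
    ORDER BY songs.id, score.score DESC
""").fetchall()

# Count output rows per song id: one row per matching score row.
rows_per_song = Counter(row[0] for row in rows)
for row in rows:
    print(row)
print(rows_per_song)
```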

I tried using:

SELECT songs.id, songs.song, songs.artist, ROUND(AVG(score.score),1) AS 'score' FROM songs INNER JOIN score ON score.id=songs.id ORDER BY score DESC

But this averages all of the scores together instead of averaging each song's scores separately:

result
id | song | artist | score
---|------|--------|-------
1  | foo  | bar    | 4.4
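Without a GROUP BY, AVG treats every joined row as a single group, so exactly one row comes back and its value is the mean of all five scores: (2 + 4 + 8 + 6 + 2) / 5 = 22 / 5 = 4.4. MySQL (when ONLY_FULL_GROUP_BY is disabled) and SQLite accept the bare `songs.id`, `songs.song` and `songs.artist` columns and fill them from some row of that group. A sketch assuming SQLite via Python's `sqlite3`, with a hypothetical `make_db` helper and the alias renamed to `avg_score`:

```python
import sqlite3


def make_db():
    """Hypothetical helper: an in-memory SQLite copy of the question's sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE songs (id INTEGER PRIMARY KEY, song TEXT, artist TEXT);
        CREATE TABLE score (id INTEGER, score INTEGER);
        INSERT INTO songs VALUES (1, 'foo', 'bar'), (2, 'fuu', 'bor'), (3, 'fyy', 'bir');
        INSERT INTO score VALUES (1, 2), (2, 4), (3, 8), (2, 6), (3, 2);
    """)
    return conn


conn = make_db()

# The question's second attempt: an aggregate with no GROUP BY,
# so all five joined rows collapse into one group.
rows = conn.execute("""
    SELECT songs.id, songs.song, songs.artist,
        ROUND(AVG(score.score), 1) AS avg_score
    FROM songs INNER JOIN score ON score.id = songs.id
""").fetchall()

print(len(rows), rows[0][3])     # a single row holding the overall average
print(sum([2, 4, 8, 6, 2]) / 5)  # the same average computed by hand
```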

Answer 1

You need to GROUP BY all of the fields you want to keep:

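SELECT songs.id, songs.song, songs.artist,
    -- multiplying by 1.0 keeps the average from being truncated to an integer (e.g. in SQL Server)
    AVG(score.score * 1.0) AS AvgScore
FROM songs
    LEFT JOIN score
        ON score.id = songs.id
GROUP BY songs.id, songs.song, songs.artist
-- order by the aggregate; score.score itself is not in the GROUP BY
ORDER BY AvgScore DESC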
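With the GROUP BY in place, each distinct (id, song, artist) combination becomes one group and AVG runs once per group, giving 2.0 for song 1 and 5.0 for songs 2 and 3. A sketch of this query, assuming SQLite via Python's `sqlite3` and a hypothetical `make_db` helper for the sample data:

```python
import sqlite3


def make_db():
    """Hypothetical helper: an in-memory SQLite copy of the question's sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE songs (id INTEGER PRIMARY KEY, song TEXT, artist TEXT);
        CREATE TABLE score (id INTEGER, score INTEGER);
        INSERT INTO songs VALUES (1, 'foo', 'bar'), (2, 'fuu', 'bor'), (3, 'fyy', 'bir');
        INSERT INTO score VALUES (1, 2), (2, 4), (3, 8), (2, 6), (3, 2);
    """)
    return conn


conn = make_db()

# One output row per song: AVG is evaluated separately inside each group.
rows = conn.execute("""
    SELECT songs.id, songs.song, songs.artist,
        AVG(score.score * 1.0) AS AvgScore
    FROM songs
        LEFT JOIN score ON score.id = songs.id
    GROUP BY songs.id, songs.song, songs.artist
    ORDER BY AvgScore DESC
""").fetchall()

# Map song id -> average score (songs 2 and 3 tie, so compare as a dict).
averages = {song_id: avg for song_id, _song, _artist, avg in rows}
print(averages)
```

Songs 2 and 3 tie at 5.0, so their relative order is not guaranteed; only the lowest average (song 1) is certain to come last.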

Alternatively, you can use a correlated subquery:

SELECT songs.id, songs.song, songs.artist,
    -- the subquery runs once per song, averaging only that song's scores
    (SELECT AVG(score.score) FROM score WHERE score.id = songs.id) AS AvgScore
FROM songs
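The correlated subquery computes the same per-song averages as the GROUP BY version, because it filters `score` down to one song's rows before averaging. A sketch that checks the two forms agree, assuming SQLite via Python's `sqlite3` and a hypothetical `make_db` helper (both queries get an `ORDER BY songs.id` here only so the lists can be compared):

```python
import sqlite3


def make_db():
    """Hypothetical helper: an in-memory SQLite copy of the question's sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE songs (id INTEGER PRIMARY KEY, song TEXT, artist TEXT);
        CREATE TABLE score (id INTEGER, score INTEGER);
        INSERT INTO songs VALUES (1, 'foo', 'bar'), (2, 'fuu', 'bor'), (3, 'fyy', 'bir');
        INSERT INTO score VALUES (1, 2), (2, 4), (3, 8), (2, 6), (3, 2);
    """)
    return conn


conn = make_db()

# Correlated subquery: the inner SELECT is evaluated for each outer songs row.
subquery_rows = conn.execute("""
    SELECT songs.id, songs.song, songs.artist,
        (SELECT AVG(score.score) FROM score WHERE score.id = songs.id) AS AvgScore
    FROM songs
    ORDER BY songs.id
""").fetchall()

# Join + GROUP BY version of the same question, for comparison.
grouped_rows = conn.execute("""
    SELECT songs.id, songs.song, songs.artist, AVG(score.score) AS AvgScore
    FROM songs LEFT JOIN score ON score.id = songs.id
    GROUP BY songs.id, songs.song, songs.artist
    ORDER BY songs.id
""").fetchall()

print(subquery_rows)
print(subquery_rows == grouped_rows)
```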

Answer 2

Use GROUP BY songs.id:

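SELECT songs.id, songs.song, songs.artist,
    ROUND(AVG(score.score), 1) AS 'score'
FROM songs
INNER JOIN score ON score.id = songs.id
-- MySQL accepts grouping by the key alone when songs.id is the primary key;
-- some databases (e.g. SQL Server) require every non-aggregated column in GROUP BY
GROUP BY songs.id
ORDER BY score DESC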
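Grouping by `songs.id` alone works here because `song` and `artist` are determined by the id, so each group still has exactly one title and artist. A sketch of this query, assuming SQLite via Python's `sqlite3` (which accepts the bare columns), a hypothetical `make_db` helper, and the alias renamed to `avg_score` to keep it distinct from the `score` table and column:

```python
import sqlite3


def make_db():
    """Hypothetical helper: an in-memory SQLite copy of the question's sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE songs (id INTEGER PRIMARY KEY, song TEXT, artist TEXT);
        CREATE TABLE score (id INTEGER, score INTEGER);
        INSERT INTO songs VALUES (1, 'foo', 'bar'), (2, 'fuu', 'bor'), (3, 'fyy', 'bir');
        INSERT INTO score VALUES (1, 2), (2, 4), (3, 8), (2, 6), (3, 2);
    """)
    return conn


conn = make_db()

# Group by the primary key only, then sort by the rounded average, highest first.
rows = conn.execute("""
    SELECT songs.id, songs.song, songs.artist,
        ROUND(AVG(score.score), 1) AS avg_score
    FROM songs
    INNER JOIN score ON score.id = songs.id
    GROUP BY songs.id
    ORDER BY avg_score DESC
""").fetchall()

for row in rows:
    print(row)
```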

Answer 3

Use this:

SELECT a.id, a.song, a.artist, AVG(b.score) AS score
FROM songs a
INNER JOIN score b ON a.id = b.id
GROUP BY a.id, a.artist, a.song
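This answer uses an INNER JOIN, while the first answer uses a LEFT JOIN. The two give the same result for the sample data, but differ once a song has no scores: the INNER JOIN drops that song, and the LEFT JOIN keeps it with a NULL average. A sketch assuming SQLite via Python's `sqlite3`, a hypothetical `make_db` helper, and a hypothetical fourth song (id 4, 'fee', 'ber') that has no scores; the `ORDER BY a.id` is added only to make the output deterministic:

```python
import sqlite3


def make_db():
    """Hypothetical helper: an in-memory SQLite copy of the question's sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE songs (id INTEGER PRIMARY KEY, song TEXT, artist TEXT);
        CREATE TABLE score (id INTEGER, score INTEGER);
        INSERT INTO songs VALUES (1, 'foo', 'bar'), (2, 'fuu', 'bor'), (3, 'fyy', 'bir');
        INSERT INTO score VALUES (1, 2), (2, 4), (3, 8), (2, 6), (3, 2);
    """)
    return conn


conn = make_db()
# Hypothetical extra song with no rows in the score table.
conn.execute("INSERT INTO songs VALUES (4, 'fee', 'ber')")

query = """
    SELECT a.id, a.song, a.artist, AVG(b.score) AS score
    FROM songs a
    {join} score b ON a.id = b.id
    GROUP BY a.id, a.artist, a.song
    ORDER BY a.id
"""

# INNER JOIN keeps only songs that have at least one score.
inner_rows = conn.execute(query.format(join="INNER JOIN")).fetchall()
# LEFT JOIN keeps every song; AVG over no matching rows is NULL (None in Python).
left_rows = conn.execute(query.format(join="LEFT JOIN")).fetchall()

print(inner_rows)
print(left_rows)
```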

Answer 4

You need df.coalesce(1).write.format("com.databricks.spark.csv").option("header", "true").save("example") and to join the data on the left (GROUP BY).

Try this:

LEFT JOIN

SQL: joining two tables with AVG
