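Problem description
For each recommendation, I'm trying to get a comma-separated list of average grades that lines up with another comma-separated list of recommended content IDs. A recommendation is an object that holds the content that will receive the suggestions (ContentID) and a list of other content that will be recommended for it (RecommendedContentIDs).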
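Table structure, sample data and other constraints
<strike>I have a two-table database structure. The first table holds the recommended content IDs, saved as a comma-separated ranked list. The second table holds the grade for each recommended content ID. The ranked list contains at most 10 comma-separated values, and grades range from 0 to 5.</strike>
To better illustrate the problem, here is the table structure with some sample data: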
Table Recommendations
|ID |ContentID |RecommendedContentIDs |Type |
+------+-------------+----------------------+-----+
|1 |2051 |9706,14801,13354,... |a |
+------+-------------+----------------------+-----+
|67 |2051 |8103,16366,8795,... |b |
+------+-------------+----------------------+-----+
|133 |2051 |8795,8070,15341,... |c |
+------+-------------+----------------------+-----+
|22 |1234 |4782,283,33,... |a |
+------+-------------+----------------------+-----+
...
Table Grades
|ID |RecommendationID |RecommendedDocumentID |Grade |EvaluatorHash|
+------+-----------------+----------------------+------+-------------+
|1 |1 |9706 |4 |123456789 |
+------+-----------------+----------------------+------+-------------+
|2 |1 |14801 |5 |123456789 |
+------+-----------------+----------------------+------+-------------+
|3 |1 |13354 |3 |987654321 |
+------+-----------------+----------------------+------+-------------+
|3 |1 |9706 |3 |987654321 |
+------+-----------------+----------------------+------+-------------+
|4 |67 |8103 |5 |123456789 |
+------+-----------------+----------------------+------+-------------+
|1 |67 |16366 |4 |987654321 |
+------+-----------------+----------------------+------+-------------+
|1 |133 |8795 |2 |123456789 |
+------+-----------------+----------------------+------+-------------+
...
I have converted the RecommendedContentIDs column of the Recommendations table into a separate table, as follows:
Table RecommendedContent
|ID |RecommendationID |RecommendedContentID |Rank |
+------+-----------------+---------------------+-----+
|1 |1 |9706 |1 |
+------+-----------------+---------------------+-----+
|2 |1 |14801 |2 |
+------+-----------------+---------------------+-----+
|3 |1 |13354 |3 |
+------+-----------------+---------------------+-----+
|4 |1 |12787 |4 |
+------+-----------------+---------------------+-----+
...
+------+-----------------+---------------------+-----+
|11 |2 |19042 |1 |
+------+-----------------+---------------------+-----+
|12 |2 |13376 |2 |
+------+-----------------+---------------------+-----+
|13 |2 |9853 |3 |
+------+-----------------+---------------------+-----+
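Expected result
I now want a query that returns a result set with two comma-separated lists that correspond to each other, so that I can show the average grade for each recommended content ID. It should look like this: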
|ContentID |RecommendedContentIDs |RecommendedContentAverageGrades |Type |
+-------------+-------------------------+----------------------------------+------+
|2051 |9706,14801,13354,... |3.5,5.0,3.0,... |a |
+-------------+-------------------------+----------------------------------+------+
|2051 |8103,16366,8795,... |5.0,4.0,0.0,... |b |
+-------------+-------------------------+----------------------------------+------+
|2051 |8795,8070,15341,... |2.0,0.0,0.0,... |c |
+-------------+-------------------------+----------------------------------+------+
...
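As you can see, the RecommendedContentAverageGrades column contains the average grade for each corresponding content ID in the RecommendedContentIDs column (the content with ID 9706 was graded twice, once with 4 and once with 3, so its average is 3.5). If a content item has not been graded, its average grade should be 0. What really matters here is that the two comma-separated lists correspond position by position, because the list in RecommendedContentIDs is a ranked list.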
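As a point of comparison, the application-side version of this computation might look roughly like the following Python sketch. The function name `aligned_average_grades` and the in-memory data layout are assumptions made for illustration only; the sample values are the rows shown above for recommendation 1.

```python
from collections import defaultdict


def aligned_average_grades(recommended_ids, grades):
    """Return one average grade per recommended ID, in list order.

    recommended_ids: ranked list of content IDs for one recommendation.
    grades: iterable of (recommended_document_id, grade) pairs that
            belong to the same recommendation.
    IDs without any grade get 0.0.
    """
    by_id = defaultdict(list)
    for doc_id, grade in grades:
        by_id[doc_id].append(grade)
    return [
        sum(by_id[i]) / len(by_id[i]) if i in by_id else 0.0
        for i in recommended_ids
    ]


# Sample data for recommendation 1 (12787 has no grade).
ids = [9706, 14801, 13354, 12787]
grades = [(9706, 4), (14801, 5), (13354, 3), (9706, 3)]

averages = aligned_average_grades(ids, grades)
ids_csv = ",".join(str(i) for i in ids)
grades_csv = ",".join(f"{a:.1f}" for a in averages)

print(ids_csv)     # 9706,14801,13354,12787
print(grades_csv)  # 3.5,5.0,3.0,0.0
```

Because the averages are produced by iterating over the ranked ID list itself, the two lists cannot get out of step; any SQL solution has to provide the same guarantee explicitly.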
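I would normally implement something like this in C#, but I'd like to know whether it can be done in SQL. I was thinking of using GROUP_CONCAT, but I couldn't get the right result set. I'd be very grateful for a working SQL query for MySQL and/or T-SQL, but suggestions alone would be welcome too.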
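Edits
#1 - Laurence mentioned using a separate table instead of comma-separated lists. I use the lists because of a legacy design that I cannot change. However, I'm open to answers that assume the data in the comma-separated lists is stored in a separate table.
#2 - Changed the structure as Laurence suggested (using a separate table; see the updated structure).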
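Answer 0 (score: 3)
This just follows up on the answer given by @Laurence:
Answer 1 (score: 2)
Updated with Akrigg's fix and a SQL Fiddle, and with how to order by the values in the Recommendations table. Also updated, following brozo's fix, to use ORDER BY inside the GROUP_CONCAT clause: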
Table RecommendedContent
+-----------------+----------------------+
|RecommendationID | RecommendedContentID |
+-----------------+----------------------+
| 1 | 9706 |
| 1 | 14801 |
| 1 | 13354 |
| 67 | 8103 |
| ... | ... |
+-----------------+----------------------+
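-- Assumes the key column of Recommendations is named RecommendationID
-- (it is shown as ID in the question's table).
-- `Rank` is backtick-quoted because RANK is a reserved word in MySQL 8.0 and later.
Select
    a.RecommendationID,
    a.ContentID,
    -- Both lists are ordered by rank so that they stay aligned.
    Group_Concat(a.RecommendedContentId Order By a.`Rank`),
    -- Strips trailing zeros and then a trailing decimal point
    -- (3.5000 becomes 3.5, 5.0000 becomes 5).
    Group_Concat(Trim(Trailing '.' From Trim(Trailing '0' From a.AverageGrade)) Order By a.`Rank`),
    a.Type
From (
    -- One row per recommended item, with its average grade (0 if ungraded).
    Select
        r.RecommendationID,
        r.ContentID,
        r.Type,
        rc.RecommendedContentID,
        rc.`Rank`,
        Coalesce(Avg(g.Grade), 0) As AverageGrade
    From
        Recommendations r
    Left Outer Join
        RecommendedContent rc
        On r.RecommendationID = rc.RecommendationID
    Left Outer Join
        Grades g
        On rc.RecommendedContentID = g.RecommendedDocumentID And
           rc.RecommendationID = g.RecommendationID
    Group By
        r.RecommendationID,
        r.ContentID,
        r.Type,
        rc.RecommendedContentID,
        rc.`Rank`
) as a
Group By
    a.RecommendationID,
    a.ContentID,
    a.Type
Order By
    a.ContentID, -- Or other way round if that's what you prefer
    a.RecommendationID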
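To experiment with the inner query without a MySQL server, something like the following Python sketch with the standard `sqlite3` module may help. It is an approximation and not the answer's own setup: the table definitions are assumptions reduced to the columns the query needs, and the final concatenation is done in Python, because ORDER BY inside `group_concat` is, as far as I know, only accepted by recent SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema; "Rank" is quoted to avoid any keyword clash.
conn.executescript("""
CREATE TABLE Recommendations (RecommendationID INTEGER, ContentID INTEGER, Type TEXT);
CREATE TABLE RecommendedContent (RecommendationID INTEGER, RecommendedContentID INTEGER, "Rank" INTEGER);
CREATE TABLE Grades (RecommendationID INTEGER, RecommendedDocumentID INTEGER, Grade INTEGER);
INSERT INTO Recommendations VALUES (1, 2051, 'a');
INSERT INTO RecommendedContent VALUES (1, 9706, 1), (1, 14801, 2), (1, 13354, 3), (1, 12787, 4);
INSERT INTO Grades VALUES (1, 9706, 4), (1, 14801, 5), (1, 13354, 3), (1, 9706, 3);
""")

# Same shape as the inner query: one row per recommended item,
# LEFT JOINs keep ungraded items, COALESCE turns their NULL average into 0.
rows = conn.execute("""
SELECT r.RecommendationID,
       rc.RecommendedContentID,
       COALESCE(AVG(g.Grade), 0) AS AverageGrade
FROM Recommendations r
LEFT JOIN RecommendedContent rc
       ON r.RecommendationID = rc.RecommendationID
LEFT JOIN Grades g
       ON rc.RecommendedContentID = g.RecommendedDocumentID
      AND rc.RecommendationID = g.RecommendationID
GROUP BY r.RecommendationID, rc.RecommendedContentID, rc."Rank"
ORDER BY r.RecommendationID, rc."Rank"
""").fetchall()

# Only one recommendation is loaded, so all rows belong to the same lists.
ids_csv = ",".join(str(content_id) for _, content_id, _ in rows)
grades_csv = ",".join(f"{avg:.1f}" for _, _, avg in rows)

print(ids_csv)     # 9706,14801,13354,12787
print(grades_csv)  # 3.5,5.0,3.0,0.0
```

The join on both RecommendedContentID and RecommendationID matters: the same content can appear in several recommendations (8795 does in the sample data), and its grades should only count for the recommendation they were given in.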
Answer 2 (score: 1)
In SQL Server you can create a custom aggregate that does the comma-separated string concatenation, and then use it like this:
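-- CustomToCsv is the user-defined aggregate mentioned above.
SELECT ContentID, RecommendedContentIDs, CustomToCsv(AvgGrade), Type
FROM
(
    -- Cast before averaging: if Grade is an integer column, AVG returns an integer in SQL Server.
    SELECT ContentID, RecommendedContentIDs, AVG(CAST(Grade AS DECIMAL(4, 2))) AvgGrade, Type
    FROM Recommendations r INNER JOIN Grades g ON r.ID = g.RecommendationID
    GROUP BY ContentID, RecommendedContentIDs, RecommendedDocumentID, Type
) as t
GROUP BY ContentID, RecommendedContentIDs, Type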
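The custom aggregate itself is not shown in this answer. As a rough analogue (SQLite through Python's `sqlite3.create_aggregate`, not SQL Server), a CSV aggregate could be sketched as below. The class name `ToCsvByRank`, the table `t` and the column `Rnk` are hypothetical; the aggregate takes the rank as an extra argument so the output order does not depend on the order in which the engine feeds it rows.

```python
import sqlite3


class ToCsvByRank:
    """Aggregate that collects (rank, value) pairs and joins values by rank."""

    def __init__(self):
        self.items = []

    def step(self, rank, value):
        # Called once per row in the group.
        self.items.append((rank, value))

    def finalize(self):
        # Sort by rank so the CSV order is explicit, then join the values.
        return ",".join(str(value) for _, value in sorted(self.items))


conn = sqlite3.connect(":memory:")
conn.create_aggregate("ToCsvByRank", 2, ToCsvByRank)

# Hypothetical table holding already-averaged grades, inserted out of order.
conn.execute("CREATE TABLE t (RecommendationID INTEGER, Rnk INTEGER, AvgGrade REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 2, 5.0), (1, 1, 3.5), (1, 3, 3.0), (67, 1, 5.0), (67, 2, 4.0)],
)

result = conn.execute(
    "SELECT RecommendationID, ToCsvByRank(Rnk, AvgGrade) FROM t "
    "GROUP BY RecommendationID ORDER BY RecommendationID"
).fetchall()

print(result)  # [(1, '3.5,5.0,3.0'), (67, '5.0,4.0')]
```

Whatever the engine, an aggregate that ignores rank leaves the order of the concatenated values up to the engine, which is exactly what the question cannot tolerate.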
Answer 3 (score: 1)
This is how it can be done in Oracle:
WITH count_number AS
  (SELECT
      ContentID,
      -- Pad with commas so that every ID is delimited on both sides.
      ',' || RecommendedContentIDs || ',' new_ContentIDs,
      RecommendedContentIDs,
      type,
      -- Number of IDs = number of commas + 1.
      LENGTH(RECOMMENDEDCONTENTIDS) - LENGTH(REPLACE(RECOMMENDEDCONTENTIDS, ',')) + 1 COUNT_ID
   FROM Recommendations
  ),
RecommendedContentIDs_postion AS
  (SELECT A1.*,
      B1.CONTENTIDS_OCCURANCE_POSITION,
      -- The text between the n-th and the (n+1)-th comma.
      SUBSTR(new_ContentIDs,
             INSTR(new_ContentIDs, ',', 1, ContentIDs_OCCURANCE_POSITION) + 1,
             INSTR(new_ContentIDs, ',', 1, ContentIDs_OCCURANCE_POSITION + 1)
               - INSTR(new_ContentIDs, ',', 1, ContentIDs_OCCURANCE_POSITION) - 1) ContentIDs
   FROM count_number a1,
      -- Row generator producing the positions 1 to 1000.
      (SELECT I ContentIDs_OCCURANCE_POSITION
       FROM DUAL model dimension BY (1 i) measures (0 X) (X[FOR I FROM 2 TO 1000 increment 1] = 0)
      ) b1
   WHERE b1.ContentIDs_OCCURANCE_POSITION <= a1.count_id
  )
-- WM_CONCAT is an undocumented function; LISTAGG ... WITHIN GROUP (ORDER BY ...)
-- is the documented alternative and makes the order explicit.
SELECT
   CONTENTID,
   WM_CONCAT(CONTENTIDS) RECOMMENDEDCONTENTIDS,
   WM_CONCAT(GRADE) avg_grade_contentid,
   type
FROM RECOMMENDEDCONTENTIDS_POSTION RCI,
   (SELECT RECOMMENDEDDOCUMENTID,
       AVG(GRADE) GRADE
    FROM Grades
    GROUP BY RECOMMENDEDDOCUMENTID
   ) GRD
WHERE TRIM(RCI.CONTENTIDS) = TRIM(GRD.RECOMMENDEDDOCUMENTID)
GROUP BY
   ContentID,
   type;
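The string handling in the two CTEs above is dense, so here is a small Python sketch that mimics it step by step: counting IDs by the length difference, padding the list with commas, and cutting out the n-th item between the n-th and (n+1)-th comma. The helper names `count_ids`, `nth_index` and `id_at` are made up for this illustration; `nth_index` only imitates the four-argument form of Oracle's INSTR for the case used here.

```python
def count_ids(csv):
    # Same trick as LENGTH(x) - LENGTH(REPLACE(x, ',')) + 1 in the query.
    return len(csv) - len(csv.replace(",", "")) + 1


def nth_index(text, sep, n):
    """1-based position of the n-th occurrence of sep, or 0 if there is none."""
    pos = -1
    for _ in range(n):
        pos = text.find(sep, pos + 1)
        if pos == -1:
            return 0
    return pos + 1


def id_at(csv, position):
    """Return the item at the given 1-based position of a comma-separated list."""
    padded = "," + csv + ","                     # like new_ContentIDs
    start = nth_index(padded, ",", position)     # comma before the item (1-based)
    end = nth_index(padded, ",", position + 1)   # comma after the item (1-based)
    # SUBSTR(padded, start + 1, end - start - 1) expressed as a 0-based slice.
    return padded[start:end - 1]


csv = "9706,14801,13354"
ids = [id_at(csv, p) for p in range(1, count_ids(csv) + 1)]
print(count_ids(csv))  # 3
print(ids)             # ['9706', '14801', '13354']
```

In Python the same result is simply `csv.split(",")`; the longhand version is only there to make the INSTR/SUBSTR arithmetic easier to follow.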