Creating a "list of average grades" for ranked IDs in SQL

Time: 2012-11-16 11:20:11

Tags: mysql sql tsql csv average

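Problem description

For each recommendation, I am trying to get a comma-separated list of average grades that lines up with another comma-separated list of recommended content IDs. A recommendation is an object that contains the content that will receive the suggestions (ContentID) and a list of other content that will be recommended for it (RecommendedContentIDs).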

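Table structure, sample data and other constraints

I have a two-table database structure. The first table holds the recommended content IDs, stored as a comma-separated ranked list. The second table holds the grades given to each recommended content ID. A ranked list contains at most 10 comma-separated values, and grades range from 0 to 5.

To illustrate the problem, here are the table structures and some sample data: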

Table Recommendations

|ID    |ContentID    |RecommendedContentIDs |Type |
+------+-------------+----------------------+-----+
|1     |2051         |9706,14801,13354,...  |a    |
+------+-------------+----------------------+-----+
|67    |2051         |8103,16366,8795,...   |b    |
+------+-------------+----------------------+-----+
|133   |2051         |8795,8070,15341,...   |c    |
+------+-------------+----------------------+-----+
|22    |1234         |4782,283,33,...       |a    |
+------+-------------+----------------------+-----+
...

Table Grades

|ID    |RecommendationID |RecommendedDocumentID |Grade |EvaluatorHash|
+------+-----------------+----------------------+------+-------------+
|1     |1                |9706                  |4     |123456789    |
+------+-----------------+----------------------+------+-------------+
|2     |1                |14801                 |5     |123456789    |
+------+-----------------+----------------------+------+-------------+
|3     |1                |13354                 |3     |987654321    |
+------+-----------------+----------------------+------+-------------+
|3     |1                |9706                  |3     |987654321    |
+------+-----------------+----------------------+------+-------------+
|4     |67               |8103                  |5     |123456789    |
+------+-----------------+----------------------+------+-------------+
|1     |67               |16366                 |4     |987654321    |
+------+-----------------+----------------------+------+-------------+
|1     |133              |8795                  |2     |123456789    |
+------+-----------------+----------------------+------+-------------+
...

I have converted the RecommendedContentIDs column of the Recommendations table into a separate table, as follows:

Table RecommendedContent

|ID    |RecommendationID |RecommendedContentID |Rank |
+------+-----------------+---------------------+-----+
|1     |1                |9706                 |1    |
+------+-----------------+---------------------+-----+
|2     |1                |14801                |2    |
+------+-----------------+---------------------+-----+
|3     |1                |13354                |3    |
+------+-----------------+---------------------+-----+
|4     |1                |12787                |4    |
+------+-----------------+---------------------+-----+
...

+------+-----------------+---------------------+-----+
|11    |2                |19042                |1    |
+------+-----------------+---------------------+-----+
|12    |2                |13376                |2    |
+------+-----------------+---------------------+-----+
|13    |2                |9853                 |3    |
+------+-----------------+---------------------+-----+
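
The conversion from a comma-separated ranked list to one row per ID can be sketched in Python as follows. This is only a minimal illustration, not the migration that was actually run: the function name `explode_ranked_list` is hypothetical, and the input string uses the four IDs visible for recommendation 1 in the table above.

```python
def explode_ranked_list(recommendation_id, csv_ids):
    """Turn one comma-separated ranked list into
    (RecommendationID, RecommendedContentID, Rank) rows; ranks start at 1."""
    ids = [int(part) for part in csv_ids.split(",") if part.strip()]
    return [
        (recommendation_id, content_id, rank)
        for rank, content_id in enumerate(ids, start=1)
    ]


# The first four IDs of recommendation 1, as shown in the table above.
rows = explode_ranked_list(1, "9706,14801,13354,12787")
for row in rows:
    print(row)
```

The position in the original list becomes the Rank column, which is what later allows the two output lists to be ordered consistently.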

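Expected result

I would now like to write a query that returns a result set containing two comma-separated lists that correspond to each other, so that I can display the average grade for each recommended content ID. It should look like this: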

|ContentID    |RecommendedContentIDs    |RecommendedContentAverageGrades   |Type  |
+-------------+-------------------------+----------------------------------+------+
|2051         |9706,14801,13354,...     |3.5,5.0,3.0,...                   |a     |
+-------------+-------------------------+----------------------------------+------+
|2051         |8103,16366,8795,...      |5.0,4.0,0.0,...                   |b     |
+-------------+-------------------------+----------------------------------+------+
|2051         |8795,8070,15341,...      |2.0,0.0,0.0,...                   |c     |
+-------------+-------------------------+----------------------------------+------+
...

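As you can see, the RecommendedContentAverageGrades column contains the average grade for each corresponding content ID in the RecommendedContentIDs column (the content with ID 9706 was graded twice, once with a 4 and once with a 3, so its average is 3.5). If a piece of content has not been graded, its average grade should be 0. What really matters here is that the two comma-separated lists correspond position by position, because the list in RecommendedContentIDs is a ranked list.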
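
To make the expected output concrete, here is a small Python sketch of the computation described above, using the sample Grades rows. It is an illustration under assumptions, not a proposed solution: the function name `average_grade_list` is hypothetical, and each ranked list is cut to the three IDs that are visible in the sample data.

```python
from collections import defaultdict

# (RecommendationID, RecommendedDocumentID, Grade) rows from the sample Grades table.
grades = [
    (1, 9706, 4), (1, 14801, 5), (1, 13354, 3), (1, 9706, 3),
    (67, 8103, 5), (67, 16366, 4), (133, 8795, 2),
]


def average_grade_list(recommendation_id, ranked_ids, grade_rows):
    """Return the average grades as a comma-separated string, in the same
    order as ranked_ids; ungraded content gets 0.0."""
    buckets = defaultdict(list)
    for rec_id, doc_id, grade in grade_rows:
        if rec_id == recommendation_id:
            buckets[doc_id].append(grade)

    averages = []
    for content_id in ranked_ids:
        given = buckets.get(content_id)
        average = sum(given) / len(given) if given else 0.0
        averages.append(f"{average:.1f}")  # one decimal place, as in the table
    return ",".join(averages)


print(average_grade_list(1, [9706, 14801, 13354], grades))    # 3.5,5.0,3.0
print(average_grade_list(67, [8103, 16366, 8795], grades))    # 5.0,4.0,0.0
print(average_grade_list(133, [8795, 8070, 15341], grades))   # 2.0,0.0,0.0
```

Note that grades are matched on both the recommendation and the content ID, so the grade of 2 that content 8795 received under recommendation 133 does not count toward recommendation 67.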

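I would normally implement something like this in C#, but I would like to know whether it can be done in SQL. I was thinking of using GROUP_CONCAT, but I could not get the right result set. I would be very grateful if someone could provide a working SQL query for MySQL and/or T-SQL, but suggestions alone would also be welcome.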

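Edits

#1 - Laurence mentioned using a separate table instead of comma-separated lists. I use the lists because of a legacy design that I cannot change. However, I am open to answers that assume the data from the comma-separated lists is stored in a separate table.

#2 - Changed the structure as Laurence suggested (using a separate table; see the updated structure).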

4 answers:

Answer 0: (score: 3)

This simply follows on from the answer given by @Laurence:

http://sqlfiddle.com/#!2/7d236/6

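Answer 1: (score: 2)

Updated with Akrigg's fix and an SQL Fiddle, and to show how to order by the values in the Recommendations table. Also updated, following brozo's fix, to use Order By inside the Group_Concat clause: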

Table RecommendedContent

+-----------------+----------------------+
|RecommendationID | RecommendedContentID |
+-----------------+----------------------+
| 1               | 9706                 |
| 1               | 14801                |
| 1               | 13354                |
| 67              | 8103                 |
| ...             | ...                  |
+-----------------+----------------------+

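-- Note: the question's Recommendations table names its key column ID;
-- this query assumes it is called RecommendationID.
Select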
  a.RecommendationID,
  a.ContentID,
  Group_Concat(a.RecommendedContentId Order By a.Rank),
  Group_Concat(Trim(Trailing '.' From Trim(Trailing '0' From a.AverageGrade)) Order By a.Rank),
  a.Type
From (
  Select
    r.RecommendationID,
    r.ContentID,
    r.Type,
    rc.RecommendedContentID,
    rc.Rank,
    Coalesce(Avg(g.Grade), 0) As AverageGrade
  From
    Recommendations r
      Left Outer Join
    RecommendedContent rc
      On r.RecommendationID = rc.RecommendationID
      Left Outer Join
    Grades g
      On rc.RecommendedContentID = g.RecommendedDocumentID And
         rc.RecommendationID = g.RecommendationID
  Group By
    r.RecommendationID,
    r.ContentID,
    r.Type,
    rc.RecommendedContentID,
    rc.Rank
  ) as a
Group By
  a.RecommendationID,
  a.ContentID,
  a.Type
Order By
  a.ContentID, -- Or other way round if that's what you prefer
  a.RecommendationID

http://sqlfiddle.com/#!2/ca8b8/8
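
The nested Trim expression in the query above is easier to follow with a few sample strings. The Python sketch below mimics it with `rstrip`, assuming the averages arrive as text with four decimal places (for example "3.5000"); the helper name `trim_average` is hypothetical.

```python
def trim_average(text):
    """Mimic Trim(Trailing '.' From Trim(Trailing '0' From text)):
    strip trailing zeros first, then a trailing decimal point."""
    return text.rstrip("0").rstrip(".")


for sample in ["3.5000", "5.0000", "0.0000"]:
    print(sample, "->", trim_average(sample))
# 3.5000 -> 3.5
# 5.0000 -> 5
# 0.0000 -> 0

# Caveat: a value without a decimal point loses its significant zeros.
print(repr(trim_average("10")))  # '1'
print(repr(trim_average("0")))   # ''
```

Under that assumption the query yields lists such as 3.5,5,3 rather than the 3.5,5.0,3.0 shown in the question, and the trimming is only safe while every value carries a decimal point.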

Answer 2: (score: 1)

In SQL Server you can create a custom aggregate that does comma-separated string concatenation, and then use it like this:

SELECT ContentID, RecommendedContentIDs, CustomToCsv(AvgGrade), Type FROM
(
    SELECT ContentID, RecommendedContentIDs, AVG(Grade) AvgGrade, Type 
    FROM Recommendations r INNER JOIN  Grades g ON r.ID = g.RecommendationID
    GROUP BY ContentID, RecommendedContentIDs, RecommendedDocumentID, Type
) as t
GROUP BY ContentID, RecommendedContentIDs, Type

Answer 3: (score: 1)

This is done in Oracle:

WITH count_number AS
  (SELECT 
    ContentID,
    ','
    ||RecommendedContentIDs
    ||',' new_ContentIDs,
    RecommendedContentIDs,
    type ,
    LENGTH(RECOMMENDEDCONTENTIDS )-LENGTH(REPLACE(RECOMMENDEDCONTENTIDS ,','))+1 COUNT_ID
  FROM Recommendations
  ) ,
  RecommendedContentIDs_postion AS
  (SELECT A1.*,
    B1.CONTENTIDS_OCCURANCE_POSITION ,
    SUBSTR(new_ContentIDs,instr(new_ContentIDs,',',1,ContentIDs_OCCURANCE_POSITION)+1 , INSTR(new_ContentIDs,',',1,ContentIDs_OCCURANCE_POSITION+1)-instr(new_ContentIDs,',',1,ContentIDs_OCCURANCE_POSITION)-1) ContentIDs
  FROM count_number a1,
    (SELECT I ContentIDs_OCCURANCE_POSITION
    FROM DUAL model dimension BY (1 i) measures (0 X) (X[FOR I
    FROM 2 TO 1000 increment 1] = 0)
    ) b1
  WHERE b1.ContentIDs_OCCURANCE_POSITION<=a1.count_id
  )
SELECT 
  CONTENTID,
  WM_CONCAT(CONTENTIDS) RECOMMENDEDCONTENTIDS ,
  WM_CONCAT(GRADE) avg_grade_contentid ,
  type
FROM RECOMMENDEDCONTENTIDS_POSTION RCI,
  (SELECT RECOMMENDEDDOCUMENTID,
    AVG(GRADE) GRADE
  FROM Grades
  GROUP BY RECOMMENDEDDOCUMENTID
  ) GRD
WHERE TRIM(RCI.CONTENTIDS)=TRIM(GRD.RECOMMENDEDDOCUMENTID)
GROUP BY 
  ContentID,
  type;