How do I filter out rows that have duplicate values in specific columns?

Asked: 2015-05-19 14:53:12

Tags: sql sql-server-2012 survey

I have a table that contains survey results:

submitter   issue       q1  q2  q3  q4  q5

mike         11557      4   3   4   5   1
mark         13554      5   5   5   5   5
luke         15110      1   1   1   1   1
luke         15110      1   1   1   1   1
donald       16900      4   2   2   4   5
joe          11562      5   5   5   5   5
joe          11562      5   5   5   5   5
sam          12485      2   3   4   3   4
sam          12485      2   3   4   3   4
sam          12485      2   3   4   3   4

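I would like to filter out the multiple submissions and count only one of them. Some people submitted 3 or 4 times.

I know how to find out how many times a survey was submitted and by whom: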

SELECT
    submitter
    ,issue
    ,COUNT(*) as '# of times Survey submitted'

FROM
    Survey

GROUP BY
    submitter, issue

HAVING
    COUNT(*) > 1

However, I am not sure how to use this query to filter out the multiple submissions.

The query I am currently working on is:

SELECT 'Question #1' as 'Survey Question'
,CAST(CAST(SUM(q1) AS float)/COUNT(q1) AS decimal (4,2)) as 'Average Score'

FROM Survey
WHERE COALESCE(q1,q2,q3,q4,q5) IS NOT NULL

UNION ALL

SELECT 'Question #2' as 'Survey Question'
,CAST(CAST(SUM(q2) AS float)/COUNT(q2) AS decimal (4,2)) as 'Average Score'
FROM Survey
WHERE COALESCE(q1,q2,q3,q4,q5) IS NOT NULL

UNION ALL

etc...

The desired result is: (Note: the numbers in this result set are not accurate. It only shows the format I want.)

Survey Question Average Score
Question #1      4.58
Question #2      4.80
Question #3      4.60
Question #4      4.59
Question #5      4.64

Can anyone offer a clue?

Thanks a lot!

3 Answers:

Answer 0: (score: 2)

I think my math is correct, but my results do not quite match the ones you posted. Are you sure your desired results are correct?

DECLARE @yourTable TABLE (submitter VARCHAR(10), Issue INT, q1 TINYINT, q2 TINYINT,q3 TINYINT, q4 TINYINT,q5 TINYINT);
INSERT INTO @yourTable
VALUES  ('mike',11557,4,3,4,5,1),
        ('mark',13554,5,5,5,5,5),
        ('luke',15110,1,1,1,1,1),
        ('luke',15110,1,1,1,1,1),
        ('donald',16900,4,2,2,4,5),
        ('joe',11562,5,5,5,5,5),
        ('joe',11562,5,5,5,5,5),
        ('sam',12485,2,3,4,3,4),
        ('sam',12485,2,3,4,3,4),
        ('sam',12485,2,3,4,3,4);

WITH CTE_Distinct
AS
(
    SELECT DISTINCT *
    FROM @yourTable  --just change this to your actual table name.
)

SELECT  REPLACE(question,'q','Question #')   AS [Survey Question],
        CAST(AVG(val * 1.0) AS DECIMAL(4,2)) AS [Average Score]
FROM CTE_Distinct
UNPIVOT
(
    val FOR question IN (q1,q2,q3,q4,q5)
) unpvt
GROUP BY question

Results:

Survey Question     Average Score
-------------------- ---------------------------------------
Question #1          3.50
Question #2          3.17
Question #3          3.50
Question #4          3.83
Question #5          3.50
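
The same DISTINCT idea can also be kept in the layout of the query from the question. The following is only a minimal sketch: it assumes the table is named Survey as in the question and that repeated submissions are identical in every column. The CTE name Deduped is made up for illustration.

```sql
-- Assumption: duplicate submissions are exact copies, so DISTINCT collapses them.
WITH Deduped AS (
    SELECT DISTINCT submitter, issue, q1, q2, q3, q4, q5
    FROM Survey
)
SELECT 'Question #1' AS [Survey Question],
       CAST(AVG(q1 * 1.0) AS decimal(4,2)) AS [Average Score]  -- * 1.0 avoids integer averaging
FROM Deduped
UNION ALL
SELECT 'Question #2', CAST(AVG(q2 * 1.0) AS decimal(4,2))
FROM Deduped
UNION ALL
SELECT 'Question #3', CAST(AVG(q3 * 1.0) AS decimal(4,2))
FROM Deduped
UNION ALL
SELECT 'Question #4', CAST(AVG(q4 * 1.0) AS decimal(4,2))
FROM Deduped
UNION ALL
SELECT 'Question #5', CAST(AVG(q5 * 1.0) AS decimal(4,2))
FROM Deduped;
```

If two submissions by the same person for the same issue can differ in their answers, DISTINCT keeps both rows, so this form only fits the exact-duplicate case.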

Answer 1: (score: 1)

WITH TestData AS (
    SELECT *
    FROM (VALUES
        ('Mike', 11557, 4, 3, 4, 5, 1)
      , ('Mark', 13554, 5, 5, 5, 5, 5)
      , ('Luke', 15110, 1, 1, 1, 1, 1)
      , ('Luke', 15110, 1, 1, 1, 1, 1)
      , ('Donald', 16900, 4, 2, 2, 4, 5)
      , ('Joe', 11562, 5, 5, 5, 5, 5)
      , ('Joe', 11562, 5, 5, 5, 5, 5)
      , ('Sam', 12485, 2, 3, 4, 3, 4)    
      , ('Sam', 12485, 2, 3, 4, 3, 4)    
      , ('Sam', 12485, 2, 3, 4, 3, 4)    
    ) A (Submitter, Issue, Q1, Q2, Q3, Q4, Q5)
)
SELECT SurveyQuestion
     , AverageScore = AVG(QuestionAnswer * 1.) -- Change the math here if this isn't what you want 
FROM (    
    SELECT A.Submitter
         , A.Issue
         , B.SurveyQuestion
         , B.QuestionAnswer
         , RowNum = ROW_NUMBER() OVER(PARTITION BY A.Submitter, A.Issue, B.SurveyQuestion ORDER BY (SELECT NULL)) -- Replace ORDER BY (SELECT NULL) with something more meaningful if you can
    FROM TestData A
    CROSS APPLY(VALUES -- Unpivot
        ('Question #1', A.Q1)
      , ('Question #2', A.Q2)
      , ('Question #3', A.Q3)
      , ('Question #4', A.Q4)
      , ('Question #5', A.Q5)
    ) B (SurveyQuestion, QuestionAnswer)
    WHERE B.QuestionAnswer IS NOT NULL -- skip unanswered questions; SurveyQuestion is a literal and is never NULL
) A
WHERE RowNum = 1
GROUP BY SurveyQuestion;
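
The query above removes duplicates per question after unpivoting. If exact duplicates cannot be assumed, a similar ROW_NUMBER pattern might be applied to whole rows first. This is a sketch under assumptions: the table is named Survey as in the question, and keeping one arbitrary row per (submitter, issue) is acceptable. Ranked and rn are illustrative names.

```sql
-- Assumption: any one row per (submitter, issue) may represent that submission.
WITH Ranked AS (
    SELECT submitter, issue, q1, q2, q3, q4, q5,
           ROW_NUMBER() OVER (
               PARTITION BY submitter, issue
               ORDER BY (SELECT NULL)  -- replace with a timestamp or id column if one exists
           ) AS rn
    FROM Survey
)
SELECT submitter, issue, q1, q2, q3, q4, q5
FROM Ranked
WHERE rn = 1;  -- keep exactly one row per submitter and issue
```

The averages can then be computed over this result instead of over the raw table.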

Answer 2: (score: 0)

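I think the first solution you could apply is to select the submitter and the issue together with the max of each answer given by each submitter, grouping by submitter and issue.

The problem with this solution is that it returns the largest answer for each question, which may not be the desired output.

Another approach is to add an id to each record. With an id marking each row as distinct, this is a different kettle of fish, and the select becomes much simpler:

    -- Assumes an id column has been added to survey.
    -- SQL Server does not accept (a, b, c) IN (subquery), so the filter is a join.
    select s.*
      from survey s
      join (
            select submitter, issue, max(id) as id
              from survey
             group by submitter, issue
           ) latest
        on latest.submitter = s.submitter
       and latest.issue = s.issue
       and latest.id = s.id;

The inner select (the one with the GROUP BY) identifies the ids you want to keep, and the outer select retrieves all the information: submitter, id, and answers. You can use max() to treat the last answer as the good one, or min() to retrieve the first answer instead.

Update

Sorry, I had not read your request for an "average". If you want an average rather than the answers themselves, I humbly recommend the second approach. The select then becomes:

    -- Same id-based filter as above, written as a join for SQL Server.
    select avg(q1 * 1.0) as avg_q1,  -- * 1.0 avoids integer averaging
           avg(q2 * 1.0) as avg_q2,
           ....
      from survey s
      join (
            select submitter, issue, max(id) as id
              from survey
             group by submitter, issue
           ) latest
        on latest.submitter = s.submitter
       and latest.issue = s.issue
       and latest.id = s.id;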
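
The id-based approach in this answer needs an id column, which the table in the question does not have. One way it might be added in SQL Server is sketched below, assuming the table is named Survey as in the question. The column name id follows the answer, and the rest is an illustration, not the answerer's own code.

```sql
-- Assumption: the table is named Survey and has no id column yet.
-- IDENTITY(1,1) numbers the existing rows in an unspecified order, so
-- max(id) picks an arbitrary duplicate, not necessarily the latest one.
ALTER TABLE Survey
    ADD id INT IDENTITY(1,1) NOT NULL;
```

If the order of submissions matters, a submission timestamp recorded at insert time would be a more reliable basis for choosing which row to keep.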