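Select a sum that ignores the GROUP BY

Time: 2012-07-19 20:24:13

Tags: mysql select group-by sum

I can't seem to get my query adjusted quite right; any help would be much appreciated.

Here is my query: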

SELECT 
  wordlist.Word,
  SUM( worddocfreq.Freq ) AS wordFreq
FROM sourceparsed
  LEFT JOIN worddocfreq ON sourceparsed.ParsedID = worddocfreq.ParsedID
  LEFT JOIN wordlist ON worddocfreq.WordID = wordlist.WordID
WHERE
  sourceparsed.SrcID = 30032
GROUP BY
  wordlist.Word

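This works as expected, and the result set has two columns: the first is a list of distinct words and the second is the frequency of each word.

However, I would rather adjust the query so that the second column is a proportion (i.e. the number of occurrences of each word divided by the total number of words). The total number of words is given by the sum of the second column as output by the query written above.

So my problem is that I'm not sure how to compute the total number of words, because the 'group by' at the end of the query forces the sum to be computed per word. I don't know how to divide my second column by a sum that is computed without regard to the grouping.

I feel a nested select is needed, but I'm not sure how best to integrate it.

Thanks in advance for any suggestions.

Cheers,

Brian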

3 answers:

Answer 0: (score: 1)

I'm not sure this is the most efficient approach, but give it a try:

SELECT 
  wordlist.Word,
  SUM( worddocfreq.Freq ) / ( SELECT SUM( Freq ) 
                              FROM worddocfreq 
                                JOIN sourceparsed ON 
                                      sourceparsed.SrcID = sp1.SrcID
                                  AND sourceparsed.ParsedID = worddocfreq.ParsedID
                            ) AS proportion
FROM sourceparsed sp1
  LEFT JOIN worddocfreq ON sp1.ParsedID = worddocfreq.ParsedID
  LEFT JOIN wordlist ON worddocfreq.WordID = wordlist.WordID
WHERE
  sp1.SrcID = 30032
GROUP BY
  wordlist.Word
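
If you want to try the scalar-subquery idea without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. The table and column names come from the question; the sample rows are made up, and the aliases wdf2 and sp2 are my own. As a simplification, the source id is bound as a parameter in both the outer query and the subquery instead of being correlated through sp1. SQLite divides integers as integers, hence the 1.0 * factor; MySQL's / operator does not need it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sourceparsed (ParsedID INTEGER, SrcID INTEGER);
    CREATE TABLE wordlist     (WordID INTEGER, Word TEXT);
    CREATE TABLE worddocfreq  (ParsedID INTEGER, WordID INTEGER, Freq INTEGER);
    -- Made-up rows: source 30032 has two parsed documents, source 99 has one.
    INSERT INTO sourceparsed VALUES (1, 30032), (2, 30032), (3, 99);
    INSERT INTO wordlist     VALUES (1, 'apple'), (2, 'banana');
    INSERT INTO worddocfreq  VALUES (1, 1, 3), (1, 2, 1), (2, 1, 3), (2, 2, 1), (3, 1, 8);
""")

# Per-word sum divided by a scalar subquery that totals the same source.
PROPORTION_SQL = """
    SELECT wordlist.Word,
           1.0 * SUM(worddocfreq.Freq) / (
               SELECT SUM(wdf2.Freq)
               FROM worddocfreq wdf2
                 JOIN sourceparsed sp2 ON sp2.ParsedID = wdf2.ParsedID
               WHERE sp2.SrcID = :src
           ) AS proportion
    FROM sourceparsed sp1
      LEFT JOIN worddocfreq ON sp1.ParsedID = worddocfreq.ParsedID
      LEFT JOIN wordlist ON worddocfreq.WordID = wordlist.WordID
    WHERE sp1.SrcID = :src
    GROUP BY wordlist.Word
    ORDER BY wordlist.Word
"""

proportions = dict(conn.execute(PROPORTION_SQL, {"src": 30032}).fetchall())
print(proportions)  # {'apple': 0.75, 'banana': 0.25}
```

With these rows, source 30032 has 6 occurrences of 'apple' and 2 of 'banana' out of 8 in total; the rows belonging to source 99 do not affect the denominator.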

Answer 1: (score: 0)

A CROSS JOIN against a subquery may (or may not) be more efficient than SetFreeByTruth's approach:

SELECT 
  wordlist.Word,
  SUM( worddocfreq.Freq ) / TotalFreq.TotalFreq AS wordFreq
FROM sourceparsed
  LEFT JOIN worddocfreq ON sourceparsed.ParsedID = worddocfreq.ParsedID
  LEFT JOIN wordlist ON worddocfreq.WordID = wordlist.WordID
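  -- total for this source only, so the denominator matches the rows being grouped
  CROSS JOIN ( SELECT SUM( wdf.Freq ) AS TotalFreq
               FROM sourceparsed sp
                 JOIN worddocfreq wdf ON sp.ParsedID = wdf.ParsedID
               WHERE sp.SrcID = 30032 ) AS TotalFreq
WHERE
  sourceparsed.SrcID = 30032
GROUP BY
  wordlist.Word,
  TotalFreq.TotalFreq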
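
One thing worth checking with the CROSS JOIN approach is what the derived table actually totals. The sketch below, again using Python's sqlite3 as a stand-in for MySQL with made-up rows, runs the same outer query against two candidate totals: one over the whole worddocfreq table and one restricted to the source. The names QUERY, TOTAL_ALL, TOTAL_FOR_SOURCE and proportions are hypothetical helpers, and the 1.0 * factor is only there because SQLite divides integers as integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sourceparsed (ParsedID INTEGER, SrcID INTEGER);
    CREATE TABLE wordlist     (WordID INTEGER, Word TEXT);
    CREATE TABLE worddocfreq  (ParsedID INTEGER, WordID INTEGER, Freq INTEGER);
    -- Made-up rows: document 3 belongs to a different source (99).
    INSERT INTO sourceparsed VALUES (1, 30032), (2, 30032), (3, 99);
    INSERT INTO wordlist     VALUES (1, 'apple'), (2, 'banana');
    INSERT INTO worddocfreq  VALUES (1, 1, 3), (1, 2, 1), (2, 1, 3), (2, 2, 1), (3, 1, 8);
""")

# Outer query from the answer; {total} is filled with one of the subqueries below.
QUERY = """
    SELECT wordlist.Word,
           1.0 * SUM(worddocfreq.Freq) / TotalFreq.TotalFreq AS proportion
    FROM sourceparsed
      LEFT JOIN worddocfreq ON sourceparsed.ParsedID = worddocfreq.ParsedID
      LEFT JOIN wordlist ON worddocfreq.WordID = wordlist.WordID
      CROSS JOIN ( {total} ) AS TotalFreq
    WHERE sourceparsed.SrcID = :src
    GROUP BY wordlist.Word, TotalFreq.TotalFreq
    ORDER BY wordlist.Word
"""

# Total over every document of every source.
TOTAL_ALL = "SELECT SUM(Freq) AS TotalFreq FROM worddocfreq"

# Total over the documents of the requested source only.
TOTAL_FOR_SOURCE = """
    SELECT SUM(wdf.Freq) AS TotalFreq
    FROM sourceparsed sp
      JOIN worddocfreq wdf ON sp.ParsedID = wdf.ParsedID
    WHERE sp.SrcID = :src
"""

def proportions(total_sql, src_id=30032):
    """Run the outer query with the given total subquery; return {word: share}."""
    sql = QUERY.format(total=total_sql)
    return dict(conn.execute(sql, {"src": src_id}).fetchall())

whole_table = proportions(TOTAL_ALL)
one_source = proportions(TOTAL_FOR_SOURCE)
print(whole_table)  # {'apple': 0.375, 'banana': 0.125}
print(one_source)   # {'apple': 0.75, 'banana': 0.25}
```

Only the source-restricted total gives shares that add up to 1 for source 30032, which is what the question asks for.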

Answer 2: (score: 0)

Beware of division-by-zero errors. There may be a better way, but you could try the following:

select c.Word, c.wordFreq, t.sum_all, c.wordFreq / nullif(t.sum_all, 0) as proportion from 
(

    (

        select wordlist.Word,
        sum(worddocfreq.Freq) as wordFreq
        from sourceparsed
        left join worddocfreq on sourceparsed.ParsedID = worddocfreq.ParsedID
        left join wordlist on worddocfreq.WordID = wordlist.WordID
        where sourceparsed.SrcID = 30032
        group by wordlist.Word

    ) c
   LEFT OUTER JOIN
   (select SUM(worddocfreq.Freq) sum_all  
    from sourceparsed
    left join worddocfreq on sourceparsed.ParsedID = worddocfreq.ParsedID
    left join wordlist on worddocfreq.WordID = wordlist.WordID 
    where sourceparsed.SrcID = 30032
   ) t
   ON 1=1
)
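
To make the division-by-zero caveat concrete, here is a small sketch of the same two-derived-table shape with a NULLIF guard on the denominator, run through Python's sqlite3 as a stand-in for MySQL. The rows are made up, source 7 is a hypothetical source whose only frequency is 0, and word_rows is a hypothetical helper. As far as I know MySQL normally yields NULL for a division by zero as well, but that depends on the SQL mode, so the explicit guard states the intent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sourceparsed (ParsedID INTEGER, SrcID INTEGER);
    CREATE TABLE wordlist     (WordID INTEGER, Word TEXT);
    CREATE TABLE worddocfreq  (ParsedID INTEGER, WordID INTEGER, Freq INTEGER);
    -- Made-up rows: source 7 has a single document whose only frequency is 0.
    INSERT INTO sourceparsed VALUES (1, 30032), (2, 30032), (4, 7);
    INSERT INTO wordlist     VALUES (1, 'apple'), (2, 'banana');
    INSERT INTO worddocfreq  VALUES (1, 1, 3), (1, 2, 1), (2, 1, 3), (2, 2, 1), (4, 1, 0);
""")

# c holds the per-word sums, t holds the single-row total; NULLIF turns a zero
# total into NULL so the division yields NULL instead of dividing by zero.
SQL = """
    SELECT c.Word, c.wordFreq, t.sum_all,
           1.0 * c.wordFreq / NULLIF(t.sum_all, 0) AS proportion
    FROM (
        SELECT wordlist.Word, SUM(worddocfreq.Freq) AS wordFreq
        FROM sourceparsed
          LEFT JOIN worddocfreq ON sourceparsed.ParsedID = worddocfreq.ParsedID
          LEFT JOIN wordlist ON worddocfreq.WordID = wordlist.WordID
        WHERE sourceparsed.SrcID = :src
        GROUP BY wordlist.Word
    ) c
    LEFT OUTER JOIN (
        SELECT SUM(worddocfreq.Freq) AS sum_all
        FROM sourceparsed
          LEFT JOIN worddocfreq ON sourceparsed.ParsedID = worddocfreq.ParsedID
        WHERE sourceparsed.SrcID = :src
    ) t ON 1 = 1
    ORDER BY c.Word
"""

def word_rows(src_id):
    """Return (Word, wordFreq, sum_all, proportion) rows for one source."""
    return conn.execute(SQL, {"src": src_id}).fetchall()

print(word_rows(30032))  # [('apple', 6, 8, 0.75), ('banana', 2, 8, 0.25)]
print(word_rows(7))      # [('apple', 0, 0, None)]
```

Callers then see None (SQL NULL) for a source whose total is zero and can decide how to display it.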