I can't seem to get my query quite right; any help would be much appreciated.
Here is my query:
SELECT
wordlist.Word,
SUM( worddocfreq.Freq ) AS wordFreq
FROM sourceparsed
LEFT JOIN worddocfreq ON sourceparsed.ParsedID = worddocfreq.ParsedID
LEFT JOIN wordlist ON worddocfreq.WordID = wordlist.WordID
WHERE
sourceparsed.SrcID = 30032
GROUP BY
wordlist.Word
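This works as expected: the result set has two columns, the first a list of distinct words and the second the frequency of each word.
However, I would rather adjust the query so that the second column is a proportion (i.e. the number of occurrences of each word divided by the total number of words). The total number of words is simply the sum of the second column as output by the query above.
So my problem is that I am not sure how to calculate that overall total, because the 'group by' at the end of the query forces the sum to be computed per word. I don't know how to divide my second column by a sum that is computed without regard to the 'group by' clause.
I feel a nested select is needed, but I am not sure how best to integrate it.
Thanks in advance for any advice.
Cheers,
Brian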
Answer 0 (score: 1)
I'm not sure this is the most efficient approach, but give it a try:
SELECT
wordlist.Word,
-- divide each word's sum by the total for the same SrcID (correlated subquery)
SUM( worddocfreq.Freq ) / ( SELECT SUM( wdf.Freq )
FROM worddocfreq wdf
JOIN sourceparsed sp2 ON
sp2.SrcID = sp1.SrcID
AND sp2.ParsedID = wdf.ParsedID
) AS proportion
FROM sourceparsed sp1
-- the outer table is aliased as sp1, so it must be referenced by that alias
LEFT JOIN worddocfreq ON sp1.ParsedID = worddocfreq.ParsedID
LEFT JOIN wordlist ON worddocfreq.WordID = wordlist.WordID
WHERE
sp1.SrcID = 30032
GROUP BY
wordlist.Word
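To sanity-check the nested-select idea without touching the real database, here is a minimal sketch using Python's built-in sqlite3 module. The table contents are made-up toy data, SQLite stands in for whatever engine is actually in use, and the total is computed by an uncorrelated subquery with a bound parameter rather than a correlated one. The `1.0 *` is only there because SQLite would otherwise do integer division.

```python
import sqlite3

# In-memory database with invented toy data (an assumption, not the real schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sourceparsed (ParsedID INTEGER PRIMARY KEY, SrcID INTEGER);
CREATE TABLE wordlist (WordID INTEGER PRIMARY KEY, Word TEXT);
CREATE TABLE worddocfreq (ParsedID INTEGER, WordID INTEGER, Freq INTEGER);
INSERT INTO sourceparsed VALUES (1, 30032), (2, 30032), (3, 999);
INSERT INTO wordlist VALUES (1, 'apple'), (2, 'pear');
INSERT INTO worddocfreq VALUES (1, 1, 3), (2, 1, 3), (2, 2, 2), (3, 2, 10);
""")

# Per-word sum divided by the total for the same source.
# The scalar subquery is not affected by the outer GROUP BY.
sql = """
SELECT w.Word,
       1.0 * SUM(f.Freq) / (SELECT SUM(f2.Freq)
                            FROM worddocfreq f2
                            JOIN sourceparsed sp2 ON sp2.ParsedID = f2.ParsedID
                            WHERE sp2.SrcID = :src) AS proportion
FROM sourceparsed sp1
JOIN worddocfreq f ON f.ParsedID = sp1.ParsedID
JOIN wordlist w ON w.WordID = f.WordID
WHERE sp1.SrcID = :src
GROUP BY w.Word
ORDER BY w.Word
"""
rows = conn.execute(sql, {"src": 30032}).fetchall()
for word, proportion in rows:
    print(word, proportion)
```

With this toy data the source 30032 has 6 occurrences of 'apple' and 2 of 'pear', so the proportions come out as 0.75 and 0.25; the row belonging to the other source is ignored in both the numerator and the total.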
Answer 1 (score: 0)
A CROSS JOIN against a subquery may (or may not) be more efficient than SetFreeByTruth's approach:
SELECT
wordlist.Word,
SUM( worddocfreq.Freq ) / TotalFreq.TotalFreq AS wordFreq
FROM sourceparsed
LEFT JOIN worddocfreq ON sourceparsed.ParsedID = worddocfreq.ParsedID
LEFT JOIN wordlist ON worddocfreq.WordID = wordlist.WordID
-- restrict the total to the same SrcID, otherwise it covers every source
CROSS JOIN ( SELECT SUM( wdf.Freq ) AS TotalFreq FROM worddocfreq wdf
JOIN sourceparsed sp ON sp.ParsedID = wdf.ParsedID
WHERE sp.SrcID = 30032 ) AS TotalFreq
WHERE
sourceparsed.SrcID = 30032
GROUP BY
wordlist.Word,
TotalFreq.TotalFreq
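One thing worth checking with the CROSS JOIN approach is that the total covers the same rows as the per-word sums. The sketch below, again with Python's sqlite3 and invented toy data (SQLite is an assumption, and `1.0 *` works around its integer division), runs the same query with an unfiltered and a filtered total so the difference is visible.

```python
import sqlite3

# In-memory database with invented toy data (an assumption, not the real schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sourceparsed (ParsedID INTEGER PRIMARY KEY, SrcID INTEGER);
CREATE TABLE wordlist (WordID INTEGER PRIMARY KEY, Word TEXT);
CREATE TABLE worddocfreq (ParsedID INTEGER, WordID INTEGER, Freq INTEGER);
INSERT INTO sourceparsed VALUES (1, 30032), (2, 30032), (3, 999);
INSERT INTO wordlist VALUES (1, 'apple'), (2, 'pear');
INSERT INTO worddocfreq VALUES (1, 1, 3), (2, 1, 3), (2, 2, 2), (3, 2, 10);
""")

# Query template; {total} is filled with one of the two constant subqueries below.
QUERY = """
SELECT w.Word, 1.0 * SUM(f.Freq) / t.TotalFreq AS proportion
FROM sourceparsed sp
JOIN worddocfreq f ON f.ParsedID = sp.ParsedID
JOIN wordlist w ON w.WordID = f.WordID
CROSS JOIN ({total}) AS t
WHERE sp.SrcID = 30032
GROUP BY w.Word, t.TotalFreq
ORDER BY w.Word
"""

# Total over every source in the table.
unfiltered_total = "SELECT SUM(Freq) AS TotalFreq FROM worddocfreq"

# Total restricted to the same source as the outer query.
filtered_total = """
SELECT SUM(f2.Freq) AS TotalFreq
FROM worddocfreq f2
JOIN sourceparsed sp2 ON sp2.ParsedID = f2.ParsedID
WHERE sp2.SrcID = 30032
"""

unfiltered = conn.execute(QUERY.format(total=unfiltered_total)).fetchall()
filtered = conn.execute(QUERY.format(total=filtered_total)).fetchall()

print("unfiltered sum:", sum(p for _, p in unfiltered))
print("filtered sum:  ", sum(p for _, p in filtered))
```

With the filtered total the proportions add up to 1; with the unfiltered total they add up to less than 1, because the denominator also counts words from the other source.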
Answer 2 (score: 0)
Beware of divide-by-zero errors. There may be a better way, but you could try the following:
select c.Word, wordFreq, sum_all, wordFreq/sum_all as proportion from
(
(
select wordlist.Word,
sum(worddocfreq.Freq) as wordFreq
from sourceparsed
left join worddocfreq on sourceparsed.ParsedID = worddocfreq.ParsedID
left join wordlist on worddocfreq.WordID = wordlist.WordID
where sourceparsed.SrcID = 30032
group by wordlist.Word
) c
LEFT OUTER JOIN
(select SUM(worddocfreq.Freq) sum_all
from sourceparsed
left join worddocfreq on sourceparsed.ParsedID = worddocfreq.ParsedID
left join wordlist on worddocfreq.WordID = wordlist.WordID
where sourceparsed.SrcID = 30032
) t
ON 1=1
)
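On the divide-by-zero point: some engines raise an error when dividing by zero while others quietly return NULL, so it can help to make the behaviour explicit. A minimal sketch, assuming you would like a proportion of 0 when the total is zero, is to wrap the divisor in NULLIF and the result in COALESCE. It is shown here with Python's sqlite3 and invented toy data; the helper `proportions` is a hypothetical name used only for illustration.

```python
import sqlite3

# In-memory database with invented toy data (an assumption, not the real schema).
# Source 7 only has zero frequencies, so its total is 0.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sourceparsed (ParsedID INTEGER PRIMARY KEY, SrcID INTEGER);
CREATE TABLE wordlist (WordID INTEGER PRIMARY KEY, Word TEXT);
CREATE TABLE worddocfreq (ParsedID INTEGER, WordID INTEGER, Freq INTEGER);
INSERT INTO sourceparsed VALUES (1, 30032), (2, 7);
INSERT INTO wordlist VALUES (1, 'apple'), (2, 'pear');
INSERT INTO worddocfreq VALUES (1, 1, 6), (1, 2, 2), (2, 1, 0), (2, 2, 0);
""")

# NULLIF turns a zero total into NULL, so the division yields NULL instead of
# dividing by zero; COALESCE then maps that NULL to 0.0.
SQL = """
SELECT c.Word,
       COALESCE(1.0 * c.wordFreq / NULLIF(t.sum_all, 0), 0.0) AS proportion
FROM (SELECT w.Word AS Word, SUM(f.Freq) AS wordFreq
      FROM sourceparsed sp
      JOIN worddocfreq f ON f.ParsedID = sp.ParsedID
      JOIN wordlist w ON w.WordID = f.WordID
      WHERE sp.SrcID = :src
      GROUP BY w.Word) AS c
CROSS JOIN (SELECT SUM(f.Freq) AS sum_all
            FROM sourceparsed sp
            JOIN worddocfreq f ON f.ParsedID = sp.ParsedID
            WHERE sp.SrcID = :src) AS t
ORDER BY c.Word
"""


def proportions(src_id):
    """Return (word, proportion) rows for one source (hypothetical helper)."""
    return conn.execute(SQL, {"src": src_id}).fetchall()


normal = proportions(30032)   # total is 8
all_zero = proportions(7)     # total is 0
print(normal)
print(all_zero)
```

Whether 0 is the right answer for an empty total is a judgment call; dropping the COALESCE would leave the proportion as NULL instead, which may be the more honest value for "undefined".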