Counting tweets per day, weighted by 3 other columns in the same table

Time: 2012-08-01 08:55:31

Tags: mysql

(This is a rather long post, but I think the problem is easy to solve, and I have prepared an SQLFiddle.) Please consider the following table:

----------------------------------------------------------------------
tweet_id sp100_id nyse_date   user_id class_id retweets quality follow
----------------------------------------------------------------------
1        1        2011-03-12  1       1        0        2.50    5.00
2        1        2011-03-13  1       2        2        2.50    5.00
3        1        2011-03-13  1       2        1        2.50    5.00
4        1        2011-03-13  2       2        0        0.75    1.00
5        1        2011-03-13  2       3        3        0.75    1.00
6        2        2011-03-12  2       2        3        0.75    1.00
7        2        2011-03-12  2       2        0        0.75    1.00
8        2        2011-03-12  1       3        5        2.50    5.00
9        2        2011-03-13  2       2        0        0.75    1.00
----------------------------------------------------------------------

The desired output is a list, per sp100_id and per date, of the number of positive (class = 2) and negative (class = 3) tweets, weighted by retweets, quality and follow:

--------------------------------------------------------------------------------
sp100_id  nyse_date  pos-rt pos-quality pos-follow neg-rt neg-quality neg-follow
--------------------------------------------------------------------------------
1         2011-03-11 0      0           0          0      0           0
1         2011-03-12 0      0           0          0      0           0
1         2011-03-13 3 (1)  5.75 (2)    11.00 (3)  3 (4)  0.75        1.00
2         2011-03-11 0      0           0          0      0           0
2         2011-03-12 3      1.50        2.00       5      2.50        5.00
2         2011-03-13 0      0.75        1.00       0      0           0
--------------------------------------------------------------------------------

On 2011-03-13, there are 3 positive tweets for sp100_id 1:

(1) 1 tweet retweeted 2 times, 1 tweet retweeted 1 time and
    1 tweet retweeted 0 times = 1 x 2 + 1 x 1 + 1 x 0 = 3
(2) 2 tweets with quality 2.50 and 1 tweet with quality 0.75 =
    2 x 2.50 + 1 x 0.75 = 5.75
(3) 2 tweets with follow 5 and 1 tweet with follow 1 =
    2 x 5.00 + 1 x 1.00 = 11.00

On 2011-03-13, there is 1 negative tweet for sp100_id 1:

(4) 1 tweet retweeted 3 times = 1 x 3 = 3

etc...
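To make the weighting concrete, here is a minimal plain-Python sketch that computes the desired output from the sample rows. It assumes the sp100_ids (1, 2) and the dates (2011-03-11 to 2011-03-13) that appear in the desired output, and treats class_id 1 as neither positive nor negative; the function name `weighted_daily_counts` is hypothetical. Pre-filling every (sp100_id, date) pair with zeros plays the role of the date-range table.

```python
# Rows copied from the question's table:
# (tweet_id, sp100_id, nyse_date, user_id, class_id, retweets, quality, follow)
TWEETS = [
    (1, 1, "2011-03-12", 1, 1, 0, 2.50, 5.00),
    (2, 1, "2011-03-13", 1, 2, 2, 2.50, 5.00),
    (3, 1, "2011-03-13", 1, 2, 1, 2.50, 5.00),
    (4, 1, "2011-03-13", 2, 2, 0, 0.75, 1.00),
    (5, 1, "2011-03-13", 2, 3, 3, 0.75, 1.00),
    (6, 2, "2011-03-12", 2, 2, 3, 0.75, 1.00),
    (7, 2, "2011-03-12", 2, 2, 0, 0.75, 1.00),
    (8, 2, "2011-03-12", 1, 3, 5, 2.50, 5.00),
    (9, 2, "2011-03-13", 2, 2, 0, 0.75, 1.00),
]

# Assumed: the ids and dates that appear in the desired output.
SP100_IDS = [1, 2]
DATES = ["2011-03-11", "2011-03-12", "2011-03-13"]

POSITIVE, NEGATIVE = 2, 3  # class_id values; any other class is ignored


def weighted_daily_counts(tweets, sp100_ids, dates):
    """Map (sp100_id, date) -> [pos_rt, pos_quality, pos_follow,
    neg_rt, neg_quality, neg_follow]."""
    # Start every (sp100_id, date) pair at zero so empty days still appear.
    totals = {(s, d): [0, 0.0, 0.0, 0, 0.0, 0.0] for s in sp100_ids for d in dates}
    for _tid, sp100_id, date, _uid, class_id, rt, quality, follow in tweets:
        key = (sp100_id, date)
        if key not in totals:
            continue  # outside the requested ids/dates
        if class_id == POSITIVE:
            offset = 0
        elif class_id == NEGATIVE:
            offset = 3
        else:
            continue  # neither positive nor negative
        # Each tweet adds its own retweets, quality and follow values.
        totals[key][offset] += rt
        totals[key][offset + 1] += quality
        totals[key][offset + 2] += follow
    return totals


result = weighted_daily_counts(TWEETS, SP100_IDS, DATES)
for (sp100_id, date), sums in sorted(result.items()):
    print(sp100_id, date, *sums)
```

For sp100_id 1 on 2011-03-13 this reproduces the hand calculation: 3, 5.75 and 11.0 for the positive tweets and 3, 0.75 and 1.0 for the negative one.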

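I have a demo on SQLFiddle with the other tables that are needed (I have to join against a date-range table because I also want the rows that are all zeros). My query produces output too, but I don't understand why it differs from the desired output: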

--------------------------------------------------------------------------------
sp100_id  nyse_date  pos-rt pos-quality pos-follow neg-rt neg-quality neg-follow
--------------------------------------------------------------------------------
1         2011-03-11 0      0           0          0      0           0
1         2011-03-12 3      2           2          5      3           5
1         2011-03-13 3      8           12         3      1           1
2         2011-03-11 0      0           0          0      0           0
2         2011-03-12 3      2           2          5      3           5
2         2011-03-13 3      8           12         3      1           1
--------------------------------------------------------------------------------

I can't see where the problem is. Can you? Thanks a lot for your help :-)

2 Answers:

Answer 0 (score: 2)

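The reason it's not returning the expected values is that you need to include the condition sp100.sp100_id = tweets.sp100_id in the LEFT JOIN, along with the date.

When you join on the date alone, each sp100_id is matched with every tweet that has that date, regardless of the tweet's sp100_id. That's why your sums are off: for each sp100_id, the SUM() also includes the values of all the other sp100_ids. It is also why both sp100_ids show identical rows in your output.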
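The effect can be reproduced with Python's built-in sqlite3 module. This is a sketch under assumptions: the schema is inferred from the question (tables sp100, daterange and tweets, with the tweet class in a column named class as in the query below), CASE WHEN replaces MySQL's IF() because SQLite has no IF(), and quality/follow are REAL, so no rounding happens (6.5 appears where a rounded column would give something else). With only the date in the ON clause, both sp100_ids get exactly the same sums.

```python
import sqlite3

SETUP = """
CREATE TABLE sp100 (sp100_id INTEGER PRIMARY KEY);
CREATE TABLE daterange (_date TEXT PRIMARY KEY);
CREATE TABLE tweets (
    tweet_id INTEGER PRIMARY KEY, sp100_id INTEGER, nyse_date TEXT,
    user_id INTEGER, class INTEGER, retweets INTEGER, quality REAL, follow REAL
);
INSERT INTO sp100 VALUES (1), (2);
INSERT INTO daterange VALUES ('2011-03-11'), ('2011-03-12'), ('2011-03-13');
INSERT INTO tweets VALUES
    (1, 1, '2011-03-12', 1, 1, 0, 2.50, 5.00),
    (2, 1, '2011-03-13', 1, 2, 2, 2.50, 5.00),
    (3, 1, '2011-03-13', 1, 2, 1, 2.50, 5.00),
    (4, 1, '2011-03-13', 2, 2, 0, 0.75, 1.00),
    (5, 1, '2011-03-13', 2, 3, 3, 0.75, 1.00),
    (6, 2, '2011-03-12', 2, 2, 3, 0.75, 1.00),
    (7, 2, '2011-03-12', 2, 2, 0, 0.75, 1.00),
    (8, 2, '2011-03-12', 1, 3, 5, 2.50, 5.00),
    (9, 2, '2011-03-13', 2, 2, 0, 0.75, 1.00);
"""

# Date-only join condition, as described in the answer (sp100_id is missing).
DATE_ONLY_QUERY = """
SELECT a.sp100_id,
       b._date,
       SUM(CASE WHEN c.class = 2 THEN c.retweets ELSE 0 END),
       SUM(CASE WHEN c.class = 2 THEN c.quality  ELSE 0 END),
       SUM(CASE WHEN c.class = 2 THEN c.follow   ELSE 0 END),
       SUM(CASE WHEN c.class = 3 THEN c.retweets ELSE 0 END),
       SUM(CASE WHEN c.class = 3 THEN c.quality  ELSE 0 END),
       SUM(CASE WHEN c.class = 3 THEN c.follow   ELSE 0 END)
FROM sp100 a
CROSS JOIN daterange b
LEFT JOIN tweets c ON b._date = c.nyse_date
GROUP BY a.sp100_id, b._date
ORDER BY a.sp100_id, b._date
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
rows = conn.execute(DATE_ONLY_QUERY).fetchall()

# Split the result per sp100_id: the sums come out the same for every id.
by_id = {}
for sp100_id, *sums in rows:
    by_id.setdefault(sp100_id, []).append(tuple(sums))
print("identical per id:", by_id[1] == by_id[2])
for sums in by_id[1]:
    print(*sums)
```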

I also tidied up your query a bit (purely cosmetic):

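SELECT     a.sp100_id,
           b._date AS nyse_date,
           SUM(IF(c.class = 2, c.retweets, 0)) AS `pos-rt`,
           SUM(IF(c.class = 2, c.quality,  0)) AS `pos-quality`,
           SUM(IF(c.class = 2, c.follow,   0)) AS `pos-follow`,
           SUM(IF(c.class = 3, c.retweets, 0)) AS `neg-rt`,
           SUM(IF(c.class = 3, c.quality,  0)) AS `neg-quality`,
           SUM(IF(c.class = 3, c.follow,   0)) AS `neg-follow`
FROM       sp100 a
-- every (sp100_id, date) pair, so days without tweets still get a row of zeros
CROSS JOIN daterange b
-- match each tweet on BOTH the sp100_id and the date
LEFT JOIN  tweets c ON  a.sp100_id = c.sp100_id
                    AND b._date    = c.nyse_date
-- group on b._date rather than the alias nyse_date: in GROUP BY, MySQL looks
-- up unqualified names in the FROM tables first, so nyse_date would resolve
-- to c.nyse_date, which is NULL on days without tweets
GROUP BY   a.sp100_id,
           b._date
ORDER BY   a.sp100_id,
           b._date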

SQLFiddle Demo
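To check the corrected join end to end, here is a sketch that runs the same query shape against an in-memory SQLite database using Python's sqlite3 module. Assumptions: the schema is inferred from the question (the class column name follows the answer's query), quality and follow are REAL, CASE WHEN stands in for MySQL's IF() (SQLite has no IF()), and the aliases use underscores so they need no quoting.

```python
import sqlite3

SETUP = """
CREATE TABLE sp100 (sp100_id INTEGER PRIMARY KEY);
CREATE TABLE daterange (_date TEXT PRIMARY KEY);
CREATE TABLE tweets (
    tweet_id INTEGER PRIMARY KEY, sp100_id INTEGER, nyse_date TEXT,
    user_id INTEGER, class INTEGER, retweets INTEGER, quality REAL, follow REAL
);
INSERT INTO sp100 VALUES (1), (2);
INSERT INTO daterange VALUES ('2011-03-11'), ('2011-03-12'), ('2011-03-13');
INSERT INTO tweets VALUES
    (1, 1, '2011-03-12', 1, 1, 0, 2.50, 5.00),
    (2, 1, '2011-03-13', 1, 2, 2, 2.50, 5.00),
    (3, 1, '2011-03-13', 1, 2, 1, 2.50, 5.00),
    (4, 1, '2011-03-13', 2, 2, 0, 0.75, 1.00),
    (5, 1, '2011-03-13', 2, 3, 3, 0.75, 1.00),
    (6, 2, '2011-03-12', 2, 2, 3, 0.75, 1.00),
    (7, 2, '2011-03-12', 2, 2, 0, 0.75, 1.00),
    (8, 2, '2011-03-12', 1, 3, 5, 2.50, 5.00),
    (9, 2, '2011-03-13', 2, 2, 0, 0.75, 1.00);
"""

# Both conditions in the ON clause, grouped on the date-range column.
QUERY = """
SELECT a.sp100_id,
       b._date AS nyse_date,
       SUM(CASE WHEN c.class = 2 THEN c.retweets ELSE 0 END) AS pos_rt,
       SUM(CASE WHEN c.class = 2 THEN c.quality  ELSE 0 END) AS pos_quality,
       SUM(CASE WHEN c.class = 2 THEN c.follow   ELSE 0 END) AS pos_follow,
       SUM(CASE WHEN c.class = 3 THEN c.retweets ELSE 0 END) AS neg_rt,
       SUM(CASE WHEN c.class = 3 THEN c.quality  ELSE 0 END) AS neg_quality,
       SUM(CASE WHEN c.class = 3 THEN c.follow   ELSE 0 END) AS neg_follow
FROM sp100 a
CROSS JOIN daterange b
LEFT JOIN tweets c ON a.sp100_id = c.sp100_id
                  AND b._date = c.nyse_date
GROUP BY a.sp100_id, b._date
ORDER BY a.sp100_id, b._date
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(*row)
```

Summing by hand from the question's table gives the same six rows, including the all-zero rows for 2011-03-11 that come from the CROSS JOIN with daterange. For sp100_id 2 on 2011-03-12 the result is pos-follow 2.0 and neg-follow 5.0, since tweets 6 and 7 have follow 1.00 and tweet 8 has follow 5.00.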

Answer 1 (score: 1)

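The only problem I can see is that you use the DEC data type. I switched it to FLOAT and everything was fine. In MySQL, DEC is a synonym for DECIMAL, and if the columns are declared as plain DEC (no precision or scale) they default to DECIMAL(10,0), so values such as 2.50 and 0.75 are stored as whole numbers.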
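A small simulation of that rounding, using Python's decimal module as a stand-in for a DECIMAL(10,0) column. The half-away-from-zero rule (`ROUND_HALF_UP` in Python's terms) is an assumption about how MySQL rounds on insert, and `store_as_decimal_10_0` is a hypothetical helper. The inputs are the quality values of the class 2 tweets dated 2011-03-13 (tweet_ids 2, 3, 4 and 9), which is what a date-only join adds up for each sp100_id.

```python
from decimal import Decimal, ROUND_HALF_UP


def store_as_decimal_10_0(value: str) -> Decimal:
    """Approximate what a DECIMAL(10,0) column keeps: the value rounded to a
    whole number, halves rounded away from zero (assumed MySQL behaviour)."""
    return Decimal(value).quantize(Decimal("1"), rounding=ROUND_HALF_UP)


# quality of the class 2 tweets on 2011-03-13 (tweet_ids 2, 3, 4, 9)
qualities = ["2.50", "2.50", "0.75", "0.75"]

exact_sum = sum(Decimal(q) for q in qualities)                 # two-decimal values
stored_sum = sum(store_as_decimal_10_0(q) for q in qualities)  # rounded on insert

print("exact:", exact_sum)
print("DECIMAL(10,0):", stored_sum)
```

This prints 6.50 for the exact sum and 8 for the rounded one; 8 is the pos-quality the question's output shows for 2011-03-13, which fits both issues acting together (date-only join plus rounding).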

Am I missing some incorrect values?

When you did the math by hand, you missed some values for March 13 (the last row).