Finding the closest lower date between two sets of dates on a large MySQL data set

Asked: 2015-08-18 09:22:12

Tags: mysql sql performance datetime bigdata

I have two tables:

  • "visit", which basically stores every visit to a website
    | visitdate           | city     |
    ----------------------------------
    | 2014-12-01 00:00:02 | Paris    |
    | 2015-01-03 00:00:02 | Marseille|
  • "cityweather", which stores weather information 3 times a day for many cities
    | weatherdate           | city     | temp |
    -------------------------------------------
    | 2014-12-01 09:00:02   | Paris    | 20   |
    | 2014-12-01 09:00:02   | Marseille| 22   |

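To be precise: cities in the visit table may be missing from cityweather and vice versa, and I only need the cities common to both tables.

So my question is:

For each visitdate, how can I SELECT the MAX(weatherdate) that is lower than the visitdate?

The result should look like this: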

    | visitdate           | city     | beforedate          |
    --------------------------------------------------------
    | 2014-12-01 00:00:02 | Paris    | 2014-11-30 21:00:00 |
    | 2015-01-03 15:07:26 | Marseille| 2015-01-03 09:00:00 |

I tried something like this:

SELECT t.city, t.visitdate, d.weatherdate AS beforedate
FROM visitsub AS t
JOIN cityweatherfrsub AS d
  ON d.weatherdate = ( SELECT MAX(d.weatherdate)
                       FROM cityweatherfrsub
                       WHERE d.weatherdate <= t.visitdate AND d.city = t.city )
 AND d.city = t.city;

But the size of the tables makes it impossible to compute in a "reasonable" time (10^14 steps):

    | id | select_type        | table       | type  | possible_keys         | key          | key_len | ref          | rows    | Extra                     |
    ---------------------------------------------------------------------------------------------------------------------------------------------------------
    | 1  | PRIMARY            | d           | ALL   | idx_city,Idx_citydate | NULL         | NULL    | NULL         | 1204305 | Using where               |
    | 1  | PRIMARY            | t           | ref   | Idxcity, Idxcitydate  | Idxcitydate  | 303     | meteo.d.city | 111     | Using where; Using index  |
    | 2  | DEPENDENT SUBQUERY | cityweather | index | NULL                  | Idx_date     | 6       | NULL         | 1204305 | Using where; Using index  |

I am now looking into user-defined variables (@variable), but I am very new to them and have only written something that does not work:

Error Code: 1111. Invalid use of group function

There is a similar post, but its solution does not work for my problem.

5 answers:

Answer 0 (score: 0)

Maybe something like this:

select
    V.*,
    (
        select
            MAX(W.weatherdate)
        from cityweather W
        where
            W.weatherdate < V.visitdate and
            W.city = V.city
    ) beforedate
from visit V
where
    -- keep only cities that also appear in cityweather
    exists ( select 1 from cityweather W where W.city = V.city)
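
As a quick sanity check, the correlated-subquery form can be tried on a handful of rows. The sketch below uses Python's built-in sqlite3 as a stand-in for MySQL, which is an assumption made only so the example is self-contained; the weather rows, the temperatures and the extra Lyon visit are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visit (visitdate TEXT, city TEXT);
CREATE TABLE cityweather (weatherdate TEXT, city TEXT, temp INTEGER);
INSERT INTO visit VALUES
  ('2014-12-01 00:00:02', 'Paris'),
  ('2015-01-03 15:07:26', 'Marseille'),
  ('2015-01-03 10:00:00', 'Lyon');
INSERT INTO cityweather VALUES
  ('2014-11-30 21:00:00', 'Paris', 18),
  ('2014-12-01 09:00:02', 'Paris', 20),
  ('2015-01-03 09:00:00', 'Marseille', 22),
  ('2015-01-03 21:00:00', 'Marseille', 19);
""")

# For every visit, look up the latest reading strictly before it;
# the EXISTS filter drops cities that have no weather data (Lyon here).
rows = conn.execute("""
SELECT v.visitdate, v.city,
       (SELECT MAX(w.weatherdate)
          FROM cityweather w
         WHERE w.city = v.city AND w.weatherdate < v.visitdate) AS beforedate
  FROM visit v
 WHERE EXISTS (SELECT 1 FROM cityweather w WHERE w.city = v.city)
 ORDER BY v.visitdate
""").fetchall()

for row in rows:
    print(row)
```

ISO-formatted timestamps stored as text compare correctly as strings, which is why plain TEXT columns are enough for this check.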

Answer 1 (score: 0)

Try this:

 SELECT t.visitdate, t.city, max(d.weatherdate) beforedate
  FROM visit t INNER JOIN cityweather d
  ON t.city = d.city
  AND d.weatherdate <= t.visitdate  -- only readings at or before the visit
  GROUP BY t.city, t.visitdate

Answer 2 (score: 0)

I'm not sure whether this is what you need, but it should do the trick.

SELECT t.visitdate, d.city, MAX(d.weatherdate) as beforedate
   FROM cityweather d
   JOIN visit t
   ON d.weatherdate <= t.visitdate
   AND d.city=t.city
   GROUP BY t.visitdate, d.city;

Answer 3 (score: 0)

An alternative approach that avoids MAX():

SELECT v.visitdate, v.city, w.weatherdate AS beforedate
FROM visit v
JOIN cityweather w
        ON v.city = w.city
        AND v.visitdate >= w.weatherdate
        AND NOT EXISTS ( SELECT * FROM cityweather nx
                WHERE nx.city = v.city
                AND nx.weatherdate <= v.visitdate
                AND nx.weatherdate > w.weatherdate
        );
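
To see that the NOT EXISTS form picks the same row as a MAX() with GROUP BY, here is a small sketch run against SQLite through Python's sqlite3 module. SQLite is only a stand-in for MySQL, and the sample weather rows are invented for the comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visit (visitdate TEXT, city TEXT);
CREATE TABLE cityweather (weatherdate TEXT, city TEXT, temp INTEGER);
INSERT INTO visit VALUES
  ('2014-12-01 00:00:02', 'Paris'),
  ('2015-01-03 15:07:26', 'Marseille');
INSERT INTO cityweather VALUES
  ('2014-11-30 09:00:00', 'Paris', 17),
  ('2014-11-30 21:00:00', 'Paris', 18),
  ('2014-12-01 09:00:02', 'Paris', 20),
  ('2015-01-03 09:00:00', 'Marseille', 22),
  ('2015-01-03 21:00:00', 'Marseille', 19);
""")

# Aggregate form: latest reading at or before each visit.
max_rows = conn.execute("""
SELECT v.visitdate, v.city, MAX(w.weatherdate) AS beforedate
  FROM visit v
  JOIN cityweather w ON w.city = v.city AND w.weatherdate <= v.visitdate
 GROUP BY v.visitdate, v.city
 ORDER BY v.visitdate, v.city
""").fetchall()

# Anti-join form: keep a reading only if no later reading is still
# at or before the visit.
anti_rows = conn.execute("""
SELECT v.visitdate, v.city, w.weatherdate AS beforedate
  FROM visit v
  JOIN cityweather w
    ON v.city = w.city
   AND v.visitdate >= w.weatherdate
   AND NOT EXISTS (SELECT * FROM cityweather nx
                    WHERE nx.city = v.city
                      AND nx.weatherdate <= v.visitdate
                      AND nx.weatherdate > w.weatherdate)
 ORDER BY v.visitdate, v.city
""").fetchall()

print(max_rows == anti_rows)
```

One difference worth keeping in mind: if cityweather held two rows with the same city and weatherdate, the anti-join would return both, while the aggregate form returns a single row per visit.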

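Answer 4 (score: 0)

I ended up finding my own answer. It all comes down to narrowing the selection on the cityweather table. I did it in two steps, to avoid the O(n^2) problem we have run into so far and to reduce the size of the first (sometimes virtual) table built in the other answers:

First step (the crucial one):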

CREATE TABLE intermedtable 
   SELECT t.city, t.visitdate, d.weatherdate
      FROM visit as t 
      JOIN cityweather as d
      WHERE d.city=t.city AND d.weatherdate <= t.visitdate AND d.weatherdate +  interval 1 day >= t.visitdate;

Compared with what we had before, the d.weatherdate + interval 1 day >= t.visitdate condition is crucial. This step took "only" 22 minutes.

The second step is to find MAX(weatherdate) for each (city, visitdate) pair:

Create table beforedatetable
   SELECT city, visitdate, max(weatherdate) as beforedate 
       FROM intermedtable
       GROUP BY city, visitdate;
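
The effect of the two steps can be sketched on synthetic data. The example below is an illustration only: it runs on SQLite through Python's sqlite3 rather than MySQL, so `datetime(d.weatherdate, '+1 day')` replaces `d.weatherdate + interval 1 day` and `CREATE TABLE ... AS SELECT` replaces MySQL's `CREATE TABLE ... SELECT`. The data (one city, a reading every 8 hours for 30 days, one visit a day at 12:30) is made up.

```python
import sqlite3
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"
start = datetime(2015, 1, 1)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visit (visitdate TEXT, city TEXT)")
conn.execute("CREATE TABLE cityweather (weatherdate TEXT, city TEXT, temp INTEGER)")

# Made-up data: readings every 8 hours for 30 days, one visit a day at 12:30.
conn.executemany(
    "INSERT INTO cityweather VALUES (?, 'Paris', 20)",
    [((start + timedelta(hours=8 * i)).strftime(FMT),) for i in range(90)],
)
conn.executemany(
    "INSERT INTO visit VALUES (?, 'Paris')",
    [((start + timedelta(days=d, hours=12, minutes=30)).strftime(FMT),)
     for d in range(30)],
)

JOIN = """
    FROM visit AS t JOIN cityweather AS d
    WHERE d.city = t.city AND d.weatherdate <= t.visitdate
"""
WINDOW = " AND datetime(d.weatherdate, '+1 day') >= t.visitdate"

# Size of the join without the one-day window, for comparison.
unbounded_rows = conn.execute("SELECT COUNT(*)" + JOIN).fetchone()[0]

# Step 1: materialise only the pairs inside the one-day window.
conn.execute(
    "CREATE TABLE intermedtable AS "
    "SELECT t.city AS city, t.visitdate AS visitdate, d.weatherdate AS weatherdate"
    + JOIN + WINDOW
)
windowed_rows = conn.execute("SELECT COUNT(*) FROM intermedtable").fetchone()[0]

# Step 2: pick the latest reading per (city, visitdate).
result = conn.execute("""
    SELECT city, visitdate, MAX(weatherdate) AS beforedate
      FROM intermedtable
     GROUP BY city, visitdate
     ORDER BY visitdate
""").fetchall()

print(unbounded_rows, windowed_rows, len(result))
```

Without the window, the number of joined pairs per visit grows with the amount of history; with it, each visit contributes only the few readings from the preceding day, so the GROUP BY in step 2 has far less to aggregate.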

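With this solution, I went from a 16-hour computation (which crashed in the end) down to 32 minutes.

The core of this answer is to shrink the virtual table created in the previous answers by adding the d.weatherdate + interval 1 day >= t.visitdate condition. It relies on the fact that the weatherdate of interest is never more than one day before the visitdate.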
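
The one-day assumption is worth checking against the data before relying on it. The sketch below (again SQLite via Python's sqlite3 as a stand-in for MySQL, with invented rows) shows what happens when it does not hold: a visit whose latest earlier reading is more than a day old drops out of the windowed result entirely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visit (visitdate TEXT, city TEXT);
CREATE TABLE cityweather (weatherdate TEXT, city TEXT, temp INTEGER);
-- Made-up gap: the last reading before the visit is three days old.
INSERT INTO visit VALUES ('2015-01-10 12:00:00', 'Paris');
INSERT INTO cityweather VALUES
  ('2015-01-07 09:00:00', 'Paris', 20),
  ('2015-01-11 09:00:00', 'Paris', 21);
""")

BASE = """
SELECT t.city, t.visitdate, MAX(d.weatherdate) AS beforedate
  FROM visit AS t JOIN cityweather AS d
 WHERE d.city = t.city AND d.weatherdate <= t.visitdate
"""
GROUP = " GROUP BY t.city, t.visitdate"

# No window: the three-day-old reading is found.
unbounded = conn.execute(BASE + GROUP).fetchall()

# One-day window: no reading qualifies, so the visit disappears.
windowed = conn.execute(
    BASE + " AND datetime(d.weatherdate, '+1 day') >= t.visitdate" + GROUP
).fetchall()

print(unbounded)
print(windowed)
```

If such gaps exist in the real cityweather table, the affected visits would need a wider window or a separate pass.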