How to handle large-data queries in MySQL with better performance

Time: 2016-11-26 05:11:34

Tags: mysql sql database mysql-workbench

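I am currently trying to run a specific query through MySQL Workbench against three different tables in two different database schemas, but I have not been able to make it work.

I have the track and completed tables in one database schema, and a location table in a database on another server.

  1. The track table records the start and end of each to-do item.

  2. The completed table stores the results.

  3. The location database is used to look up where each to-do item was created and completed.

track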

    +-----+---------+---------------+---------------------+
    | tid | user_id | function_name | track_time          |
    +-----+---------+---------------+---------------------+
    |  1  | des     | create        | 2015-02-29 1 pm     |
    |  2  | des     | complete      | 2015-02-29 2 pm     |
    |  3  | greg    | create        | 2015-02-29 3 pm     |
    |  4  | greg    | complete      | 2015-02-29 4 pm     |
    +-----+---------+---------------+---------------------+
    

completed

    +-----+------+---------------------+
    | tid | uid  |  insert_time        |
    +-----+------+---------------------+
    |  1  | des  | 2015-02-29 1 pm     |
    |  2  | des  | 2015-02-29 2 pm     |
    |  3  | greg | 2015-02-29 3 pm     |
    |  4  | greg | 2015-02-29 4 pm     |
    +-----+------+---------------------+
    

location

    +-----+---------+---------------+----------+
    | tid | user_id | action        | location |
    +-----+---------+---------------+----------+
    |  1  | des     | create        | subways  |
    |  2  | des     | complete      | home     |
    |  3  | greg    | create        | home     |
    |  4  | greg    | complete      | market   |
    +-----+---------+---------------+----------+
    

I was able to get the joined result below from the two tables that live in the same database schema:

Query result

    +-----+---------+---------------+-----------------+-----+------+---------------+
    | tid | user_id | function_name | track_time      | tid | uid  | insert_time   |
    +-----+---------+---------------+-----------------+-----+------+---------------+
    |  2  | des     | complete      | 2015-02-29 1 pm | 2   | des  | 15-02-29 2 pm |
    |  4  | greg    | complete      | 2015-02-29 3 pm | 4   | greg | 15-02-29 4 pm |
    +-----+---------+---------------+-----------------+-----+------+---------------+
    
    
    select * from 
    svr1.tracking t, 
    svr1.completed c 
    where 
    t.user_id = c.uid 
    and t.tid = c.tid 
    and t.function_name = 'create' 
    and t.track_time > '2015-02-29 00:00:00' 
    and t.track_time < '2015-02-29 23:59:59'
    

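However, I also need the location information in the query. The location table holds 150 million records for a single day, and because the 16 GB of RAM on my Mac runs out, the query takes forever to run even though the tables are indexed.

The output I need is: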

    user_id,
    create tid, 
    function_name, 
    track_time, 
    create location, 
    complete tid, 
    function_name,
    track_time,
    location
    

which would give me output like the following:

    des, 1, create, 2015-02-29 1 pm, subways, 2,complete, 2015-02-29 2 pm, home
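
The requested row shape can be sketched as a self-join of the tracking table plus two joins to the location table. This is only a minimal sketch under several assumptions: the location table has first been copied onto the same server (the schema name `loc` is hypothetical), each user has one `create` and one `complete` row inside the time window, and the date bounds are illustrative.

```sql
-- Assumption: loc.location is a local copy of the remote location table.
-- Assumption: one create row and one complete row per user in the window.
SELECT
    tc.user_id,
    tc.tid           AS create_tid,
    tc.function_name AS create_function,
    tc.track_time    AS create_time,
    lc.location      AS create_location,
    td.tid           AS complete_tid,
    td.function_name AS complete_function,
    td.track_time    AS complete_time,
    ld.location      AS complete_location
FROM svr1.tracking AS tc
-- Pair each create row with the same user's complete row.
JOIN svr1.tracking AS td
    ON td.user_id = tc.user_id
   AND td.function_name = 'complete'
   AND td.track_time >= tc.track_time
   AND td.track_time <  '2015-03-01 00:00:00'
-- Location recorded for the create action.
JOIN loc.location AS lc
    ON lc.tid = tc.tid
   AND lc.user_id = tc.user_id
   AND lc.action = 'create'
-- Location recorded for the complete action.
JOIN loc.location AS ld
    ON ld.tid = td.tid
   AND ld.user_id = td.user_id
   AND ld.action = 'complete'
WHERE tc.function_name = 'create'
  AND tc.track_time >= '2015-02-28 00:00:00'
  AND tc.track_time <  '2015-03-01 00:00:00';
```

If a user can have several to-do items in the window, the pairing condition would need an extra rule, such as taking the earliest `complete` row after each `create` row.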
    

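This is the combined tracking information and results: [image]

User information: [image]

In the location table, the ID is the hashed MAC address of the user's device. For each user, I want to find the track time and the time at which the location was recorded. [image]

I would like to know the best script to write to achieve this, because running the query in Workbench does not work for me.

Thanks for reading; any comments are much appreciated!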

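1 Answer:

Answer 0: (score: 0)

I eventually solved this with the help of another question: How to get latest results by date when selecting from two table?

After exporting the results to a database on another server, I was able to get what I needed with the following code.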

    SELECT
        r1.uid, r1.tid, r1.insert_time, l1.location, l1.time, t.timeDiff
    FROM
        (
            -- Smallest gap (in seconds) between each submitted result and any
            -- location record for the same device inside the time window.
            SELECT
                r.uid,
                r.tid,
                MIN(ABS(TIME_TO_SEC(TIMEDIFF(r.insert_time, l.time)))) AS timeDiff
            FROM
                locationDB.track_result_submitted_mac r
                INNER JOIN locationDB.location_archival_2015_09 l
                    ON r.androidId = l.id
            WHERE
                l.time > '2015-09-23 00:00:00' AND l.time < '2015-09-30 23:59:59'
                AND r.insert_time > '2015-09-23 00:00:00' AND r.insert_time < '2015-09-30 23:59:59'
                -- Ignore location records more than 18000 s (5 hours) away.
                AND ABS(TIME_TO_SEC(TIMEDIFF(r.insert_time, l.time))) < 18000
            GROUP BY r.uid, r.tid
        ) AS t,
        indoorloc.track_result_submitted_mac r1
        INNER JOIN indoorloc.location_archival_2015_09 l1
            ON r1.androidId = l1.id
    WHERE
        -- Keep only the location row whose gap equals the minimum found above.
        ABS(TIME_TO_SEC(TIMEDIFF(r1.insert_time, l1.time))) = t.timeDiff
        AND r1.uid = t.uid
        AND r1.tid = t.tid
    -- Collapse ties; relies on ONLY_FULL_GROUP_BY being disabled.
    GROUP BY r1.uid, r1.tid
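
The query above joins on `androidId = id` and filters both tables by time, so composite indexes on those columns may help. The following is a minimal sketch, assuming no such indexes exist yet; the index names are hypothetical, while the table and column names are taken from the query. It also rewrites the 18000-second limit as a plain range on `time`, which selects the same rows but gives the optimizer a chance to use the index.

```sql
-- Hypothetical index names; tables and columns come from the query above.
-- Lets rows for one device id be range-scanned by time.
CREATE INDEX idx_loc_id_time
    ON indoorloc.location_archival_2015_09 (id, `time`);

-- Narrows the submitted results to the time window before joining.
CREATE INDEX idx_res_insert_time
    ON indoorloc.track_result_submitted_mac (insert_time);

-- Check which indexes the optimizer actually picks.
EXPLAIN
SELECT r1.uid, r1.tid, l1.location, l1.`time`
FROM indoorloc.track_result_submitted_mac AS r1
INNER JOIN indoorloc.location_archival_2015_09 AS l1
    ON r1.androidId = l1.id
WHERE r1.insert_time > '2015-09-23 00:00:00'
  AND r1.insert_time < '2015-09-30 23:59:59'
  -- Same rows as ABS(TIME_TO_SEC(TIMEDIFF(...))) < 18000, written as a range.
  AND l1.`time` > r1.insert_time - INTERVAL 5 HOUR
  AND l1.`time` < r1.insert_time + INTERVAL 5 HOUR;
```

Whether these indexes pay off depends on the data distribution, so the `EXPLAIN` output is worth checking before and after creating them.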