Table join with LIMIT and performance improvement

Time: 2017-07-17 12:52:22

Tags: mysql sql performance limit

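Background: I'm writing an in-house SEO crawler to check our positions in Google. The crawling works really well and storage is fine, but I'm now running into performance problems when displaying the data (the storage table currently holds over 11 million records and is under 6.0 GB).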

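I'm trying to write a SQL query that shows every record in the input_keywords table, along with the latest result from the rank_result table (for a given CompanyName) and the previous result from rank_result (which shows our movement, up or down).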

The tables are as follows:

Table: input_keywords

-------------------------------------------------------------------------------------------------------
| Field           | Type             | Null | Key | Default             | Extra                       |
-------------------------------------------------------------------------------------------------------
| id              | int(11) unsigned | NO   | PRI | NULL                | auto_increment              |
-------------------------------------------------------------------------------------------------------
| keyword         | char(150)        | YES  | UNI | NULL                |                             |
-------------------------------------------------------------------------------------------------------
| last_check      | timestamp        | YES  | MUL | 2000-01-01 00:00:00 |                             |
-------------------------------------------------------------------------------------------------------
| CREATION        | timestamp        | YES  |     | CURRENT_TIMESTAMP   |                             |
-------------------------------------------------------------------------------------------------------
| MODIFICATION    | timestamp        | YES  |     | NULL                | on update CURRENT_TIMESTAMP |
-------------------------------------------------------------------------------------------------------
| p_deep          | int(1)           | YES  |     | 5                   |                             |
-------------------------------------------------------------------------------------------------------
| check_freq_days | int(11)          | YES  |     | 3                   |                             |
-------------------------------------------------------------------------------------------------------
| type            | char(50)         | YES  |     | NULL                |                             |
-------------------------------------------------------------------------------------------------------
| competitor      | char(100)        | YES  | MUL | CompanyName         |                             |
-------------------------------------------------------------------------------------------------------

Table: rank_result

-----------------------------------------------------------------------------
| Field          | Type             | Null | Key | Default | Extra          |
-----------------------------------------------------------------------------
| id             | int(11) unsigned | NO   | PRI | NULL    | auto_increment |
-----------------------------------------------------------------------------
| keyword        | char(150)        | YES  | MUL |         |                |
-----------------------------------------------------------------------------
| result_url     | text             | YES  |     | NULL    |                |
-----------------------------------------------------------------------------
| position       | int(11)          | YES  |     | NULL    |                |
-----------------------------------------------------------------------------
| check_time     | timestamp        | YES  | MUL | NULL    |                |
-----------------------------------------------------------------------------
| useragent_used | char(255)        | YES  |     | NULL    |                |
-----------------------------------------------------------------------------
| proxy_log      | text             | YES  |     | NULL    |                |
-----------------------------------------------------------------------------
| check_date     | date             | YES  |     | NULL    |                |
-----------------------------------------------------------------------------
| competitor     | tinytext         | YES  |     | NULL    |                |
-----------------------------------------------------------------------------

Some sample data showing what I'm trying to achieve:

Sample content: input_keywords

-----------------------------------------------------------------------------------------------------------------------------------------------
| id | keyword               | last_check          | CREATION            | MODIFICATION        | p_deep | check_freq_days | type | competitor |
-----------------------------------------------------------------------------------------------------------------------------------------------
| 2  | guitar accessories    | 2017-04-06 10:34:36 | 2017-01-20 12:27:27 | 2017-04-06 08:21:02 | 5      | 3               | NULL | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------
| 3  | guitar amps           | 2017-04-06 10:46:42 | 2017-01-20 12:27:33 | 2017-04-06 08:33:08 | 5      | 3               | NULL | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------
| 4  | guitar strings        | 2017-04-06 10:50:30 | 2017-01-20 12:27:42 | 2017-04-06 08:36:56 | 5      | 3               | NULL | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------
| 5  | guitar effects pedals | 2017-04-06 11:01:44 | 2017-01-20 12:27:50 | 2017-04-06 08:48:11 | 5      | 3               | NULL | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------

Sample content: rank_result (edited to show only the relevant data)

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| id    | keyword            | result_url                           | position | check_time          | useragent_used                       | proxy_log             | check_date | competitor |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 723   | guitar accessories | https://www.companyname.com/gui… | 33       | 2017-01-19 17:23:20 | Mozilla/5.0 (X11; OpenBSD i386) App… | NULL                  | 2017-01-19 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1572  | guitar accessories | https://www.companyname.com/gui… | 37       | 2017-01-19 19:03:45 | Mozilla/5.0 (Windows NT 6.1; rv:21.… | 88.150.147.201        | 2017-01-19 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1672  | guitar accessories | https://www.companyname.com/gui… | 37       | 2017-01-19 19:08:22 | Mozilla/5.0 (Macintosh; U; Intel Ma… | 88.150.147.201        | 2017-01-19 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 2511  | guitar accessories | https://www.companyname.com/gui… | 37       | 2017-01-19 19:51:25 | Mozilla/5.0 (Macintosh; U; Intel Ma… | 88.150.147.201        | 2017-01-19 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 2656  | guitar accessories | https://www.companyname.com/gui… | 33       | 2017-01-19 19:58:08 | Mozilla/5.0 (Macintosh; U; Intel Ma… | 5.152.200.181         | 2017-01-19 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 2809  | guitar accessories | https://www.companyname.com/gui… | 37       | 2017-01-19 20:02:51 | Mozilla/5.0 (Windows NT 6.2; rv:22.… | 88.150.147.201        | 2017-01-19 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 3147  | guitar accessories | https://www.companyname.com/gui… | 36       | 2017-01-20 09:19:40 | Mozilla/5.0 (Windows NT 5.1; rv:21.… | 5.152.200.181         | 2017-01-20 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 3490  | guitar accessories | https://www.companyname.com/gui… | 31       | 2017-01-20 11:26:39 | Mozilla/5.0 (compatible; MSIE 10.0;… | 185.17.148.252        | 2017-01-20 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 4530  | guitar accessories | https://www.companyname.com/gui… | 31       | 2017-01-20 11:37:53 | Mozilla/5.0 (Macintosh; U; Intel Ma… | 185.17.148.252        | 2017-01-20 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 5277  | guitar accessories | https://www.companyname.com/gui… | 34       | 2017-01-20 16:57:30 | Mozilla/5.0 (Windows NT 5.1) AppleW… | 5.152.200.181:27281   | 2017-01-20 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 5480  | guitar accessories | https://www.companyname.com/gui… | 38       | 2017-01-23 12:33:32 | Mozilla/5.0 (X11; OpenBSD i386) App… | 5.152.200.181:27281   | 2017-01-23 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 9953  | guitar accessories | https://www.companyname.com/gui… | 37       | 2017-01-23 16:02:19 | Mozilla/5.0 (Windows NT 6.2; rv:22.… | 149.255.105.142:27281 | 2017-01-23 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 12836 | guitar accessories | https://www.companyname.com/gui… | 40       | 2017-01-23 18:03:58 | Mozilla/5.0 (X11; Linux x86_64; rv:… | 88.150.147.201:27281  | 2017-01-23 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 14470 | guitar accessories | https://www.companyname.com/gui… | 38       | 2017-01-23 23:03:55 | Mozilla/5.0 (Windows NT 6.1; WOW64;… | 185.10.202.64:27281   | 2017-01-23 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 39524 | guitar accessories | https://www.companyname.com/gui… | 32       | 2017-01-24 13:03:09 | Mozilla/5.0 (Windows; U; Windows NT… | 185.10.201.77:27281   | 2017-01-24 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Sample output:

---------------------------------------------------------------------------------------------------------------------------------------------
| search_keyword              | p_deep | check_freq_days | CREATION            | last_check          | current_position | previous_position |
---------------------------------------------------------------------------------------------------------------------------------------------
| guitar accessories          | 5      | 3               | 2017-01-20 12:27:27 | 2017-07-17 09:03:43 | 37               | 39                |
---------------------------------------------------------------------------------------------------------------------------------------------
| acoustic guitar strings     | 5      | 3               | 2017-06-23 17:44:52 | 2017-07-15 01:03:56 | NULL             | NULL              |
---------------------------------------------------------------------------------------------------------------------------------------------
| acoustic guitars            | 5      | 1               | 2017-01-20 12:27:17 | 2017-07-16 23:03:44 | 14               | 14                |
---------------------------------------------------------------------------------------------------------------------------------------------
| bass guitars                | 5      | 1               | 2017-01-20 12:31:56 | 2017-07-16 22:03:51 | 41               | 44                |
---------------------------------------------------------------------------------------------------------------------------------------------
| Bluguitar Amp1 Nanotube     | 5      | 1               | 2017-01-30 17:48:34 | 2017-07-17 09:30:29 | NULL             | NULL              |
---------------------------------------------------------------------------------------------------------------------------------------------
| Bluguitar NanoCab           | 5      | 1               | 2017-01-30 17:48:34 | 2017-07-17 09:30:26 | NULL             | NULL              |
---------------------------------------------------------------------------------------------------------------------------------------------
| choosing a bass guitar      | 5      | 3               | 2017-05-24 22:21:40 | 2017-07-15 16:04:01 | 5                | 4                 |
---------------------------------------------------------------------------------------------------------------------------------------------
| choosing a guitar           | 5      | 3               | 2017-04-10 15:25:37 | 2017-07-17 00:19:02 | 24               | 24                |
---------------------------------------------------------------------------------------------------------------------------------------------
| choosing an acoustic guitar | 5      | 3               | 2017-04-10 15:25:37 | 2017-07-17 00:18:33 | 12               | 12                |
---------------------------------------------------------------------------------------------------------------------------------------------
| choosing an electric guitar | 5      | 3               | 2017-04-10 15:25:37 | 2017-07-17 00:18:51 | 10               | 11                |
---------------------------------------------------------------------------------------------------------------------------------------------

My current query is:

SELECT i.`keyword` AS 'search_keyword', i.`p_deep`, i.`check_freq_days`, i.`CREATION`, i.`last_check`,
       (SELECT r.position AS 'current_position'
        FROM rank_result r
        WHERE r.`keyword` = search_keyword AND r.`competitor` = 'CompanyName' AND i.`last_check` = r.`check_time`
        ORDER BY r.check_time DESC
        LIMIT 0,1) AS 'current_position',
       (SELECT rr.`position` AS 'previous_position'
        FROM rank_result rr
        WHERE rr.`keyword` = search_keyword AND rr.`competitor` = 'CompanyName'
        ORDER BY rr.check_time DESC
        LIMIT 1,1) AS 'previous_position'
FROM input_keywords i
WHERE i.keyword LIKE "%s"
ORDER BY i.keyword ASC
LIMIT 0,100

So my questions are:

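  1. Is there a better way to write this query?
  2. I have to limit this to 100 results, otherwise the query takes too long and times out. Can this be fixed?
  3. Without ORDER BY rr.check_time DESC the query is several hundred times faster, but obviously it doesn't return the correct information, because it fetches the first record instead of the latest one. Is there another way to do this?
  4. Ideally I'd like to drop the WHERE keyword LIKE filter entirely and just return all of my input_keywords with their current and previous rankings.
  5. Additional information: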

    To return a keyword's current ranking:

    ***input_keywords          rank_result***
        keyword           ==    keyword
        last_check        ==    check_time (this makes sure that if we drop out of the search results, I don't keep returning an incorrect figure)
        competitor        ==    competitor (this lets us monitor both ourselves and our competitors)
    

    To return a keyword's previous ranking:

    ***input_keywords          rank_result***
        keyword           ==    keyword
        competitor        ==    competitor (this lets us monitor both ourselves and our competitors)
        ORDER BY check_time desc
        LIMIT 1,1 (to get the last but one result)
    

    Please be kind: I taught myself all of this!

    Edit 1:

    EXPLAIN EXTENDED output for my current query (I've also included the CREATE statements):

    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    | id | select_type        | table | type | possible_keys                | key     | key_len | ref                          | rows | filtered | Extra                       |
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    | 1  | PRIMARY            | i     | ALL  | NULL                         | NULL    | NULL    | NULL                         | 1682 | 100.00   | Using where; Using filesort |
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    | 3  | DEPENDENT SUBQUERY | rr    | ref  | keyword                      | keyword | 451     | func                         | 32   | 100.00   | Using where; Using filesort |
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    | 2  | DEPENDENT SUBQUERY | r     | ref  | keyword,idx_rank_result_che… | keyword | 609     | func,GoogleCrawler.i.last_c… | 2    | 100.00   | Using where; Using filesort |
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    
    CREATE TABLE `input_keywords` (
      `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
      `keyword` char(150) DEFAULT NULL COMMENT 'the keyword....',
      `last_check` timestamp NULL DEFAULT '2000-01-01 00:00:00' COMMENT 'Last check timestamp, default to years ago so we check immediately',
      `CREATION` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
      `MODIFICATION` timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
      `p_deep` int(1) DEFAULT '5' COMMENT 'how many pages deep to search - default 5',
      `check_freq_days` int(11) DEFAULT '3' COMMENT 'how often to check this keyword in DAYS default 3',
      `type` char(50) DEFAULT NULL COMMENT 'Product, Category, other etc',
      `competitor` tinytext,
      PRIMARY KEY (`id`),
      UNIQUE KEY `UNQ_Keyword` (`keyword`),
      KEY `keyword` (`keyword`(100),`last_check`,`competitor`(100))
    ) ENGINE=InnoDB AUTO_INCREMENT=6001 DEFAULT CHARSET=utf8;
    
    
    CREATE TABLE `rank_result` (
      `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
      `keyword` char(150) DEFAULT '',
      `result_url` text,
      `position` int(11) DEFAULT NULL,
      `check_time` timestamp NULL DEFAULT NULL,
      `useragent_used` char(255) DEFAULT NULL,
      `proxy_log` text,
      `check_date` date DEFAULT NULL COMMENT 'date of the check - easier for graph plotting',
      `competitor` tinytext,
      PRIMARY KEY (`id`),
      KEY `keyword` (`keyword`,`check_time`,`competitor`(50)),
      KEY `idx_rank_result_check_time` (`check_time`)
    ) ENGINE=InnoDB AUTO_INCREMENT=11444318 DEFAULT CHARSET=utf8;
    

    Edit 2:

    Based on the two answers so far, I've adjusted the indexes on rank_result and added a time-based limit. I now get my results back in < 1s, which is a fantastic result.

    However:

    I still feel my query looks really 'hacky', and that there must be a better, cleaner solution. Is there one?

    (The query currently running in production:)

    SELECT i.`keyword` AS search_keyword,  i.p_deep, i.check_freq_days, 
    i.CREATION, i.last_check,
            (SELECT r.position
             FROM rank_result r 
             WHERE r.`keyword` = search_keyword AND
                   r.`competitor` = 'Absolute' AND
                   i.`last_check` = r.`check_time`
             ORDER BY r.check_time DESC
             LIMIT 0,1
            ) AS 'current_position',
            (SELECT rr.`position`
             FROM rank_result rr
             WHERE rr.`keyword` = search_keyword AND rr.`competitor` = 'Absolute' AND check_time > (NOW() - INTERVAL 2 WEEK)
             ORDER BY rr.check_time DESC
             LIMIT 1, 1
            ) AS 'previous_position'
            FROM input_keywords i
            ORDER BY i.keyword ASC
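One possible cleaner shape for the Edit 2 query is sketched here. It assumes MySQL 8.0 or later, because CTEs and ROW_NUMBER() do not exist in MySQL 5.7. The idea is to number each keyword's checks newest-first once, then pick rows 1 and 2 through a LEFT JOIN. That returns every input_keywords row without LIKE or LIMIT.

The sketch runs on Python's sqlite3 as a stand-in for MySQL; SQLite 3.25+ is needed for window functions. It uses a toy schema trimmed to the columns the query needs, plus a few rows taken from the sample data above. The last_check value for 'guitar accessories' is an assumption, chosen to match its newest check. Whether this beats the correlated subqueries on 11 million rows is something to confirm with EXPLAIN.

```python
import sqlite3

# Toy stand-in for the two MySQL tables, reduced to the columns used below.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE input_keywords (keyword TEXT PRIMARY KEY, last_check TEXT);
CREATE TABLE rank_result (
    id INTEGER PRIMARY KEY, keyword TEXT, position INTEGER,
    check_time TEXT, competitor TEXT
);
""")
conn.executemany("INSERT INTO input_keywords VALUES (?, ?)", [
    ("guitar accessories", "2017-01-24 13:03:09"),  # assumed last_check
    ("guitar amps", "2017-04-06 10:46:42"),         # no rank rows: expect NULLs
])
conn.executemany("INSERT INTO rank_result VALUES (?, ?, ?, ?, ?)", [
    (12836, "guitar accessories", 40, "2017-01-23 18:03:58", "CompanyName"),
    (14470, "guitar accessories", 38, "2017-01-23 23:03:55", "CompanyName"),
    (39524, "guitar accessories", 32, "2017-01-24 13:03:09", "CompanyName"),
])

QUERY = """
WITH ranked AS (
    -- rn = 1 is each keyword's newest check, rn = 2 the one before it.
    SELECT keyword, position, check_time,
           ROW_NUMBER() OVER (PARTITION BY keyword
                              ORDER BY check_time DESC) AS rn
    FROM rank_result
    WHERE competitor = ?
)
SELECT i.keyword AS search_keyword,
       -- Mirrors the question's intent: only report a current position
       -- when the newest check is the one recorded in last_check.
       MAX(CASE WHEN r.rn = 1 AND r.check_time = i.last_check
                THEN r.position END) AS current_position,
       MAX(CASE WHEN r.rn = 2 THEN r.position END) AS previous_position
FROM input_keywords i
LEFT JOIN ranked r ON r.keyword = i.keyword AND r.rn <= 2
GROUP BY i.keyword, i.last_check
ORDER BY i.keyword
"""
rows = conn.execute(QUERY, ("CompanyName",)).fetchall()
for row in rows:
    print(row)
# ('guitar accessories', 32, 38)
# ('guitar amps', None, None)
```

The LEFT JOIN keeps keywords with no rank rows. The two MAX(CASE ...) expressions pivot rows 1 and 2 into the current_position and previous_position columns.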
    

2 answers:

Answer 0 (score: 2)

For this query:

SELECT i.`keyword` AS search_keyword,  i.p_deep, i.check_freq_days, i.CREATION, i.last_check,
        (SELECT r.position
         FROM rank_result r 
         WHERE r.`keyword` = search_keyword AND
               r.`competitor` = 'CompanyName' AND
               i.`last_check` = r.`check_time`
         ORDER BY r.check_time DESC
         LIMIT 0,1
        ) AS current_position,
        (SELECT rr.`position`
         FROM rank_result rr
         WHERE rr.`keyword` = search_keyword AND rr.`competitor` = 'CompanyName'
         ORDER BY rr.check_time DESC
         LIMIT 1, 1
        ) AS previous_position
FROM input_keywords i
WHERE i.keyword LIKE "%s"
ORDER BY i.keyword ASC
LIMIT 0, 100;

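You need an index on rank_result(keyword, competitor, check_time, position). Because competitor is a TINYTEXT column, MySQL requires a prefix length for it in the index definition (for example competitor(50), as in the existing `keyword` index).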
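The sketch below illustrates what such an index buys for the "previous position" subquery. It uses SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN. The two planners differ, so treat this as an illustration, not proof of MySQL's behaviour. The index name idx_kw_comp_time_pos is hypothetical.

Without the index, SQLite has to sort the matching rows. It reports this as USE TEMP B-TREE FOR ORDER BY, roughly the analogue of "Using filesort" in the EXPLAIN above. With keyword and competitor as leading equality columns followed by check_time, the rows can be read already in check_time order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rank_result (id INTEGER PRIMARY KEY, keyword TEXT, "
    "position INTEGER, check_time TEXT, competitor TEXT)"
)

# The question's "previous position" subquery, with the keyword bound.
PREVIOUS = """
SELECT position FROM rank_result
WHERE keyword = ? AND competitor = ?
ORDER BY check_time DESC
LIMIT 1 OFFSET 1
"""

def plan(sql):
    """Return the detail strings of SQLite's EXPLAIN QUERY PLAN for sql."""
    cur = conn.execute("EXPLAIN QUERY PLAN " + sql,
                       ("guitar accessories", "CompanyName"))
    return [row[-1] for row in cur]  # last column is the human-readable detail

before = plan(PREVIOUS)
conn.execute(
    "CREATE INDEX idx_kw_comp_time_pos "
    "ON rank_result (keyword, competitor, check_time, position)"
)
after = plan(PREVIOUS)
print("before:", before)  # expect a table scan plus a temp B-tree for ORDER BY
print("after: ", after)   # expect a search on idx_kw_comp_time_pos, no sort step
```

Putting position last makes the index covering, so the subquery never has to touch the wide table rows. That matters with TEXT columns such as result_url and proxy_log.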

Answer 1 (score: 1)

To this subquery:

SELECT rr.`position` AS 'previous_position' 
FROM rank_result rr 
WHERE rr.`keyword` = search_keyword AND rr.`competitor` = 'CompanyName' 
ORDER BY rr.check_time DESC LIMIT 1,1

I would add a restriction, if possible, such as

AND rr.check_time > NOW() - INTERVAL 1 WEEK

or something similar, to limit the number of records that have to be processed.
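The time window has a catch worth checking. If the second-newest check is older than the window, the "previous position" quietly becomes NULL. The window therefore needs to span at least two check intervals; the Edit 2 query uses 2 weeks.

This sketch shows the effect using SQLite's datetime(now, '-7 days') in place of MySQL's NOW() - INTERVAL 1 WEEK. It uses a fixed "now" so the result is reproducible. The two rows are taken from the sample rank_result data, with the rows in between left out; the reference time is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rank_result "
             "(keyword TEXT, position INTEGER, check_time TEXT, competitor TEXT)")
conn.executemany("INSERT INTO rank_result VALUES (?, ?, ?, ?)", [
    ("guitar accessories", 33, "2017-01-19 17:23:20", "CompanyName"),
    ("guitar accessories", 32, "2017-01-24 13:03:09", "CompanyName"),
])

# Previous-position subquery restricted to checks newer than (now + window).
PREVIOUS_IN_WINDOW = """
SELECT position FROM rank_result
WHERE keyword = ? AND competitor = ?
  AND check_time > datetime(?, ?)
ORDER BY check_time DESC
LIMIT 1 OFFSET 1
"""

def previous_position(now, window):
    """Second-newest position inside the window, or None if there isn't one."""
    row = conn.execute(PREVIOUS_IN_WINDOW,
                       ("guitar accessories", "CompanyName", now, window)).fetchone()
    return row[0] if row else None

# With a 7-day window the 2017-01-19 check falls outside, so there is no
# second row and the previous position is lost; 14 days still includes it.
print(previous_position("2017-01-30 00:00:00", "-7 days"))   # None
print(previous_position("2017-01-30 00:00:00", "-14 days"))  # 33
```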

Also consider moving the subqueries into the FROM clause and joining them to the main query.
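Below is one way to read that suggestion without window functions, so it also works on MySQL 5.7. A derived table finds, for each keyword, the latest check_time strictly before the newest one, and that row is joined back in. The current position still uses the question's last_check = check_time rule.

This is a sketch under stated assumptions. It runs on Python's sqlite3 as a stand-in for MySQL, and the toy rows come from the sample data. It assumes (keyword, competitor, check_time) is unique; if two checks shared a timestamp, the joins would produce duplicate rows. The result also differs from LIMIT 1,1 when the newest timestamp is tied. The :comp named parameter is sqlite3 syntax; MySQL drivers use their own placeholder style.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE input_keywords (keyword TEXT PRIMARY KEY, last_check TEXT);
CREATE TABLE rank_result (keyword TEXT, position INTEGER,
                          check_time TEXT, competitor TEXT);
""")
conn.executemany("INSERT INTO input_keywords VALUES (?, ?)", [
    ("guitar accessories", "2017-01-24 13:03:09"),  # assumed last_check
    ("guitar amps", "2017-04-06 10:46:42"),
])
conn.executemany("INSERT INTO rank_result VALUES (?, ?, ?, ?)", [
    ("guitar accessories", 40, "2017-01-23 18:03:58", "CompanyName"),
    ("guitar accessories", 38, "2017-01-23 23:03:55", "CompanyName"),
    ("guitar accessories", 32, "2017-01-24 13:03:09", "CompanyName"),
])

QUERY = """
SELECT i.keyword     AS search_keyword,
       cur.position  AS current_position,
       prev.position AS previous_position
FROM input_keywords i
-- Current: the row whose check_time equals i.last_check.
LEFT JOIN rank_result cur
       ON cur.keyword = i.keyword
      AND cur.competitor = :comp
      AND cur.check_time = i.last_check
-- Previous: latest check_time strictly before each keyword's newest one.
LEFT JOIN (
    SELECT r.keyword, MAX(r.check_time) AS prev_time
    FROM rank_result r
    JOIN (SELECT keyword, MAX(check_time) AS last_time
          FROM rank_result
          WHERE competitor = :comp
          GROUP BY keyword) lt
      ON lt.keyword = r.keyword AND r.check_time < lt.last_time
    WHERE r.competitor = :comp
    GROUP BY r.keyword
) pt ON pt.keyword = i.keyword
LEFT JOIN rank_result prev
       ON prev.keyword = pt.keyword
      AND prev.competitor = :comp
      AND prev.check_time = pt.prev_time
ORDER BY i.keyword
"""
rows = conn.execute(QUERY, {"comp": "CompanyName"}).fetchall()
for row in rows:
    print(row)
# ('guitar accessories', 32, 38)
# ('guitar amps', None, None)
```

Each derived table is computed once rather than once per input_keywords row. That is the main difference from the correlated subqueries, but the real cost on MySQL is best checked with EXPLAIN.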