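Background: I'm writing an in-house SEO crawler to check our positions in Google. The crawling works very well and the storage is fine, but I'm now running into performance problems when displaying the data (the storage table currently holds over 11 million records and is just under 6.0 GB in size).
I'm trying to create a SQL query that shows all records from the input_keywords table, then the latest result from the rank_result table (for a given CompanyName), along with the previous result from the rank_result table (which shows our movement, up or down).
The tables are as follows:
Table: input_keywords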
-------------------------------------------------------------------------------------------------------
| Field | Type | Null | Key | Default | Extra |
-------------------------------------------------------------------------------------------------------
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
-------------------------------------------------------------------------------------------------------
| keyword | char(150) | YES | UNI | NULL | |
-------------------------------------------------------------------------------------------------------
| last_check | timestamp | YES | MUL | 2000-01-01 00:00:00 | |
-------------------------------------------------------------------------------------------------------
| CREATION | timestamp | YES | | CURRENT_TIMESTAMP | |
-------------------------------------------------------------------------------------------------------
| MODIFICATION | timestamp | YES | | NULL | on update CURRENT_TIMESTAMP |
-------------------------------------------------------------------------------------------------------
| p_deep | int(1) | YES | | 5 | |
-------------------------------------------------------------------------------------------------------
| check_freq_days | int(11) | YES | | 3 | |
-------------------------------------------------------------------------------------------------------
| type | char(50) | YES | | NULL | |
-------------------------------------------------------------------------------------------------------
| competitor | char(100) | YES | MUL | CompanyName | |
-------------------------------------------------------------------------------------------------------
Table: rank_result
-----------------------------------------------------------------------------
| Field | Type | Null | Key | Default | Extra |
-----------------------------------------------------------------------------
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
-----------------------------------------------------------------------------
| keyword | char(150) | YES | MUL | | |
-----------------------------------------------------------------------------
| result_url | text | YES | | NULL | |
-----------------------------------------------------------------------------
| position | int(11) | YES | | NULL | |
-----------------------------------------------------------------------------
| check_time | timestamp | YES | MUL | NULL | |
-----------------------------------------------------------------------------
| useragent_used | char(255) | YES | | NULL | |
-----------------------------------------------------------------------------
| proxy_log | text | YES | | NULL | |
-----------------------------------------------------------------------------
| check_date | date | YES | | NULL | |
-----------------------------------------------------------------------------
| competitor | tinytext | YES | | NULL | |
-----------------------------------------------------------------------------
Some sample data to show what I'm trying to achieve:
Sample content: input_keywords
-----------------------------------------------------------------------------------------------------------------------------------------------
| id | keyword | last_check | CREATION | MODIFICATION | p_deep | check_freq_days | type | competitor |
-----------------------------------------------------------------------------------------------------------------------------------------------
| 2 | guitar accessories | 2017-04-06 10:34:36 | 2017-01-20 12:27:27 | 2017-04-06 08:21:02 | 5 | 3 | NULL | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------
| 3 | guitar amps | 2017-04-06 10:46:42 | 2017-01-20 12:27:33 | 2017-04-06 08:33:08 | 5 | 3 | NULL | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------
| 4 | guitar strings | 2017-04-06 10:50:30 | 2017-01-20 12:27:42 | 2017-04-06 08:36:56 | 5 | 3 | NULL | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------
| 5 | guitar effects pedals | 2017-04-06 11:01:44 | 2017-01-20 12:27:50 | 2017-04-06 08:48:11 | 5 | 3 | NULL | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------
Sample content: rank_result (edited to show only the relevant data)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| id | keyword | result_url | position | check_time | useragent_used | proxy_log | check_date | competitor |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 723 | guitar accessories | https://www.companyname.com/gui… | 33 | 2017-01-19 17:23:20 | Mozilla/5.0 (X11; OpenBSD i386) App… | NULL | 2017-01-19 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1572 | guitar accessories | https://www.companyname.com/gui… | 37 | 2017-01-19 19:03:45 | Mozilla/5.0 (Windows NT 6.1; rv:21.… | 88.150.147.201 | 2017-01-19 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1672 | guitar accessories | https://www.companyname.com/gui… | 37 | 2017-01-19 19:08:22 | Mozilla/5.0 (Macintosh; U; Intel Ma… | 88.150.147.201 | 2017-01-19 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 2511 | guitar accessories | https://www.companyname.com/gui… | 37 | 2017-01-19 19:51:25 | Mozilla/5.0 (Macintosh; U; Intel Ma… | 88.150.147.201 | 2017-01-19 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 2656 | guitar accessories | https://www.companyname.com/gui… | 33 | 2017-01-19 19:58:08 | Mozilla/5.0 (Macintosh; U; Intel Ma… | 5.152.200.181 | 2017-01-19 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 2809 | guitar accessories | https://www.companyname.com/gui… | 37 | 2017-01-19 20:02:51 | Mozilla/5.0 (Windows NT 6.2; rv:22.… | 88.150.147.201 | 2017-01-19 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 3147 | guitar accessories | https://www.companyname.com/gui… | 36 | 2017-01-20 09:19:40 | Mozilla/5.0 (Windows NT 5.1; rv:21.… | 5.152.200.181 | 2017-01-20 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 3490 | guitar accessories | https://www.companyname.com/gui… | 31 | 2017-01-20 11:26:39 | Mozilla/5.0 (compatible; MSIE 10.0;… | 185.17.148.252 | 2017-01-20 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 4530 | guitar accessories | https://www.companyname.com/gui… | 31 | 2017-01-20 11:37:53 | Mozilla/5.0 (Macintosh; U; Intel Ma… | 185.17.148.252 | 2017-01-20 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 5277 | guitar accessories | https://www.companyname.com/gui… | 34 | 2017-01-20 16:57:30 | Mozilla/5.0 (Windows NT 5.1) AppleW… | 5.152.200.181:27281 | 2017-01-20 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 5480 | guitar accessories | https://www.companyname.com/gui… | 38 | 2017-01-23 12:33:32 | Mozilla/5.0 (X11; OpenBSD i386) App… | 5.152.200.181:27281 | 2017-01-23 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 9953 | guitar accessories | https://www.companyname.com/gui… | 37 | 2017-01-23 16:02:19 | Mozilla/5.0 (Windows NT 6.2; rv:22.… | 149.255.105.142:27281 | 2017-01-23 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 12836 | guitar accessories | https://www.companyname.com/gui… | 40 | 2017-01-23 18:03:58 | Mozilla/5.0 (X11; Linux x86_64; rv:… | 88.150.147.201:27281 | 2017-01-23 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 14470 | guitar accessories | https://www.companyname.com/gui… | 38 | 2017-01-23 23:03:55 | Mozilla/5.0 (Windows NT 6.1; WOW64;… | 185.10.202.64:27281 | 2017-01-23 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 39524 | guitar accessories | https://www.companyname.com/gui… | 32 | 2017-01-24 13:03:09 | Mozilla/5.0 (Windows; U; Windows NT… | 185.10.201.77:27281 | 2017-01-24 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sample output:
---------------------------------------------------------------------------------------------------------------------------------------------
| search_keyword | p_deep | check_freq_days | CREATION | last_check | current_position | previous_position |
---------------------------------------------------------------------------------------------------------------------------------------------
| guitar accessories | 5 | 3 | 2017-01-20 12:27:27 | 2017-07-17 09:03:43 | 37 | 39 |
---------------------------------------------------------------------------------------------------------------------------------------------
| acoustic guitar strings | 5 | 3 | 2017-06-23 17:44:52 | 2017-07-15 01:03:56 | NULL | NULL |
---------------------------------------------------------------------------------------------------------------------------------------------
| acoustic guitars | 5 | 1 | 2017-01-20 12:27:17 | 2017-07-16 23:03:44 | 14 | 14 |
---------------------------------------------------------------------------------------------------------------------------------------------
| bass guitars | 5 | 1 | 2017-01-20 12:31:56 | 2017-07-16 22:03:51 | 41 | 44 |
---------------------------------------------------------------------------------------------------------------------------------------------
| Bluguitar Amp1 Nanotube | 5 | 1 | 2017-01-30 17:48:34 | 2017-07-17 09:30:29 | NULL | NULL |
---------------------------------------------------------------------------------------------------------------------------------------------
| Bluguitar NanoCab | 5 | 1 | 2017-01-30 17:48:34 | 2017-07-17 09:30:26 | NULL | NULL |
---------------------------------------------------------------------------------------------------------------------------------------------
| choosing a bass guitar | 5 | 3 | 2017-05-24 22:21:40 | 2017-07-15 16:04:01 | 5 | 4 |
---------------------------------------------------------------------------------------------------------------------------------------------
| choosing a guitar | 5 | 3 | 2017-04-10 15:25:37 | 2017-07-17 00:19:02 | 24 | 24 |
---------------------------------------------------------------------------------------------------------------------------------------------
| choosing an acoustic guitar | 5 | 3 | 2017-04-10 15:25:37 | 2017-07-17 00:18:33 | 12 | 12 |
---------------------------------------------------------------------------------------------------------------------------------------------
| choosing an electric guitar | 5 | 3 | 2017-04-10 15:25:37 | 2017-07-17 00:18:51 | 10 | 11 |
---------------------------------------------------------------------------------------------------------------------------------------------
My current query is as follows:
SELECT i.`keyword` AS 'search_keyword', i.`p_deep`, i.`check_freq_days`, i.`CREATION`, i.`last_check`,
(SELECT r.position AS 'current_position' FROM rank_result r where r.`keyword` = search_keyword AND r.`competitor` = 'CompanyName' AND i.`last_check` = r.`check_time` ORDER BY r.check_time DESC LIMIT 0,1) AS 'current_position',
(SELECT rr.`position` AS 'previous_position' FROM rank_result rr WHERE rr.`keyword` = search_keyword AND rr.`competitor` = 'CompanyName' ORDER BY rr.check_time DESC LIMIT 1,1) AS 'previous_position'
FROM input_keywords i
WHERE i.keyword LIKE "%s"
order by i.keyword ASC
LIMIT 0,100
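So my question is as follows:
How can I speed this query up while still being able to filter with WHERE keyword LIKE and return only the matching input_keywords rows, along with their current rank and previous rank?
Additional information:
To return a keyword's current rank: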
***input_keywords rank_result***
keyword == keyword
last_check == check_time (this makes sure that if we drop out of the search results I don't keep returning a stale figure)
competitor == competitor (this allows us to monitor us and our competitors.)
To return a keyword's previous rank:
***input_keywords rank_result***
keyword == keyword
competitor == competitor (this allows us to monitor us and our competitors.)
ORDER BY check_time desc
LIMIT 1,1 (to get the last but one result)
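Please be kind - I'm self-taught in all of this!
Edit 1.
EXPLAIN EXTENDED output for my current query (I've also included the CREATE statements):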
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1 | PRIMARY | i | ALL | NULL | NULL | NULL | NULL | 1682 | 100.00 | Using where; Using filesort |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 3 | DEPENDENT SUBQUERY | rr | ref | keyword | keyword | 451 | func | 32 | 100.00 | Using where; Using filesort |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 2 | DEPENDENT SUBQUERY | r | ref | keyword,idx_rank_result_che… | keyword | 609 | func,GoogleCrawler.i.last_c… | 2 | 100.00 | Using where; Using filesort |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
CREATE TABLE `input_keywords` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`keyword` char(150) DEFAULT NULL COMMENT 'the keyword....',
`last_check` timestamp NULL DEFAULT '2000-01-01 00:00:00' COMMENT 'Last check timestamp, default to years ago so we check immediatly',
`CREATION` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`MODIFICATION` timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
`p_deep` int(1) DEFAULT '5' COMMENT 'how many pages deep to search - default 5',
`check_freq_days` int(11) DEFAULT '3' COMMENT 'how often to check this keyword in DAYS default 3',
`type` char(50) DEFAULT NULL COMMENT 'Product, Category, other etc',
`competitor` tinytext,
PRIMARY KEY (`id`),
UNIQUE KEY `UNQ_Keyword` (`keyword`),
KEY `keyword` (`keyword`(100),`last_check`,`competitor`(100))
) ENGINE=InnoDB AUTO_INCREMENT=6001 DEFAULT CHARSET=utf8;
CREATE TABLE `rank_result` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`keyword` char(150) DEFAULT '',
`result_url` text,
`position` int(11) DEFAULT NULL,
`check_time` timestamp NULL DEFAULT NULL,
`useragent_used` char(255) DEFAULT NULL,
`proxy_log` text,
`check_date` date DEFAULT NULL COMMENT 'date of the check - easier for graph plotting',
`competitor` tinytext,
PRIMARY KEY (`id`),
KEY `keyword` (`keyword`,`check_time`,`competitor`(50)),
KEY `idx_rank_result_check_time` (`check_time`)
) ENGINE=InnoDB AUTO_INCREMENT=11444318 DEFAULT CHARSET=utf8;
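Edit 2:
Based on the two answers so far, I adjusted the indexes on rank_result and added a restriction on the time window.
I now get my results back in < 1s, which is an amazing result.
However, I still feel that my query looks really 'hacky', and that there must be a better, cleaner solution. Is there one?
(The query currently in production)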
SELECT i.`keyword` AS search_keyword, i.p_deep, i.check_freq_days,
i.CREATION, i.last_check,
(SELECT r.position
FROM rank_result r
WHERE r.`keyword` = search_keyword AND
r.`competitor` = 'Absolute' AND
i.`last_check` = r.`check_time`
ORDER BY r.check_time DESC
LIMIT 0,1
) AS 'current_position',
(SELECT rr.`position`
FROM rank_result rr
WHERE rr.`keyword` = search_keyword AND rr.`competitor` = 'Absolute' AND check_time > (NOW() - INTERVAL 2 WEEK)
ORDER BY rr.check_time DESC
LIMIT 1, 1
) AS 'previous_position'
FROM input_keywords i
ORDER BY i.keyword ASC
Answer 0 (score: 2)
For this query:
SELECT i.`keyword` AS search_keyword, i.p_deep, i.check_freq_days, i.CREATION, i.last_check,
(SELECT r.position
FROM rank_result r
WHERE r.`keyword` = search_keyword AND
r.`competitor` = 'CompanyName' AND
i.`last_check` = r.`check_time`
ORDER BY r.check_time DESC
LIMIT 0,1
) AS current_position,
(SELECT rr.`position`
FROM rank_result rr
WHERE rr.`keyword` = search_keyword AND rr.`competitor` = 'CompanyName'
ORDER BY rr.check_time DESC
LIMIT 1, 1
) AS previous_position
FROM input_keywords i
WHERE i.keyword LIKE "%s"
ORDER BY i.keyword ASC
LIMIT 0, 100;
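you need an index on rank_result(keyword, competitor, check_time, position). The equality columns (keyword, competitor) come first and check_time follows, so the ORDER BY ... LIMIT in each subquery can be served from the index instead of a filesort.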
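As a rough sketch of how that index could be created: `competitor` is declared as TINYTEXT in the CREATE TABLE statement from the question, and MySQL requires a prefix length when indexing a TEXT column, so the statement below indexes only its first 50 characters (mirroring the existing `keyword` index). The index name `idx_kw_comp_time_pos` and the prefix length are assumptions, not something taken from the question.

```sql
-- Composite index: equality columns first, then the ORDER BY column,
-- then the selected column.
-- Assumption: the index name and the 50-character prefix are illustrative.
ALTER TABLE rank_result
  ADD INDEX idx_kw_comp_time_pos (keyword, competitor(50), check_time, position);

-- Re-check the plan afterwards, here for the previous_position lookup
-- with a literal keyword taken from the sample data.
EXPLAIN
SELECT rr.position
FROM rank_result rr
WHERE rr.keyword = 'guitar accessories'
  AND rr.competitor = 'CompanyName'
ORDER BY rr.check_time DESC
LIMIT 1, 1;
```

If `competitor` could be changed to a VARCHAR, the prefix would not be needed and the index could cover the subquery entirely; whether that is practical depends on the data. On a table of this size the ALTER may take a while, so it is probably worth trying on a copy first.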
Answer 1 (score: 1)
To the subquery
SELECT rr.`position` AS 'previous_position'
FROM rank_result rr
WHERE rr.`keyword` = search_keyword AND rr.`competitor` = 'CompanyName'
ORDER BY rr.check_time DESC LIMIT 1,1
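I would add a restriction where possible, for example
AND rr.check_time > NOW() - INTERVAL 1 WEEK
or something similar, to limit the number of records that have to be processed.
Also consider moving the subqueries into the FROM clause and joining them to the main query.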
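One possible shape for that join-based version is sketched below. It assumes MySQL 8.0+ (or MariaDB 10.2+), because it uses the ROW_NUMBER() window function, which is not available in older versions. It also assumes there is at most one rank_result row per (keyword, competitor, check_time); if duplicates exist, the `cur` join would return several rows per keyword where the original subquery used LIMIT 0,1. The aliases `cur`, `prev`, and `rn` are illustrative.

```sql
SELECT i.keyword   AS search_keyword,
       i.p_deep,
       i.check_freq_days,
       i.CREATION,
       i.last_check,
       cur.position  AS current_position,
       prev.position AS previous_position
FROM input_keywords i
-- Current rank: the result recorded at exactly the last check time.
LEFT JOIN rank_result cur
       ON cur.keyword    = i.keyword
      AND cur.competitor = 'CompanyName'
      AND cur.check_time = i.last_check
-- Previous rank: number each keyword's recent results from newest to
-- oldest and keep the second one (the equivalent of LIMIT 1,1).
LEFT JOIN (
    SELECT rr.keyword,
           rr.position,
           ROW_NUMBER() OVER (PARTITION BY rr.keyword
                              ORDER BY rr.check_time DESC) AS rn
    FROM rank_result rr
    WHERE rr.competitor = 'CompanyName'
      AND rr.check_time > NOW() - INTERVAL 2 WEEK
) prev
       ON prev.keyword = i.keyword
      AND prev.rn = 2
ORDER BY i.keyword ASC;
```

The LEFT JOINs keep keywords that have no matching result, so they still appear with NULL positions as in the sample output. Whether this is actually faster than the correlated subqueries depends on the indexes and on how many rows fall inside the time window, so it is worth comparing both with EXPLAIN.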