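MySQL SELECT query: >4M rows, 3-minute response

Time: 2014-07-26 10:11:33

Tags: mysql performance

I agree there is some math in the SELECT query, but surely it should not hurt performance this badly.

Here is the SELECT query.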

SELECT `p`.`id` as post_id, `p`.`description` as description, `p`.`rent` as rent, 
`p`.`created_at` as created_at, `p`.`title` as title, 
UNIX_TIMESTAMP(p.created_at) as timestamp,
`p`.`user_id` as post_user_id, `p`.`bathrooms`, `p`.`bedrooms`, `p`.`created_at`, 
`p`.`address`, `p`.`lat`, `p`.`lng`, `p`.`posted_by`, `p`.`amenities`, `p`.`user_id`, 
`p`.`smoking_policy`, `p`.`sqft`, `p`.`dogs`, `p`.`cats`, `p`.`dwelling_type`,
`p`.`deposit`, 
`p`.`furnished`, `p`.`sublease`, `p`.`sublease_duration`, `p`.`lease`,          
`p`.`property_type`,`p`.`source`, `p`.`images_json`, `sub`.`name` as sub_category_name,   
`sub`.`id` as sub_category_id, `sub`.`text` as sub_category_text, `p`.`lat` as lat, 
`p`.`lng` as lng, `p`.`phone` as phone, 
(3959 * acos( cos( radians(42.3584308) ) * cos( radians( p.lat ) ) * cos( radians(  
 p.lng ) - radians(-71.0597732) ) + sin( radians(42.3584308) ) * sin( radians( p.lat ) 
) ) ) AS distance
FROM (`T1` p)
JOIN `sub_categories` as sub ON `sub`.`id` = `p`.`sub_category_id`
AND `p`.`lng` between (-71.0597732 - 20/abs(cos(radians(42.3584308 ))*69)) 
and (-71.0597732 + 20/abs(cos(radians(42.3584308))*69)) 
AND `p`.`lat` between 42.3584308 - (20/69) and 42.3584308 + (20/69)
AND `rent` <= '9200'
AND `rent` >= '7000'
AND `bedrooms` <= '4'
AND `bathrooms` <= '3'
AND `dogs` =  '1'
AND `p`.`sub_category_id` =  '2'
HAVING `distance` <= '100'
ORDER BY `p`.`created_at` desc
LIMIT 0,12;

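The search should return the available listings in the area around the input address (latitude, longitude coordinates).

The AND condition parameters (rent, bedrooms, etc.) and their values are assigned dynamically based on the front-end selections.

Here is the table structure.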

CREATE TABLE `T1` (
`id` varchar(40) NOT NULL DEFAULT '',`user_id` varchar(100) NOT NULL DEFAULT '',
`sub_category_id` bigint(20) NOT NULL,
`description` text,`title` text,
`rent` int(11) DEFAULT NULL,
`utilities` int(11) DEFAULT NULL,
`bathrooms` float DEFAULT NULL,
`bedrooms` int(11) DEFAULT NULL,
`created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`address` varchar(100) DEFAULT NULL,
`lat` double DEFAULT NULL,`lng` double DEFAULT NULL,
`dwelling_type` varchar(40) DEFAULT NULL,
`furnished` varchar(20) DEFAULT NULL,
`lease_transfer_fees` int(10) DEFAULT NULL,
`dogs` int(11) DEFAULT NULL,
`cats` int(11) DEFAULT NULL,
`parking_spots` int(10) DEFAULT NULL,
`smoking_policy` varchar(5) DEFAULT NULL,
`deposit` varchar(20) DEFAULT NULL,
`sqft` bigint(20) DEFAULT NULL,
`posted_by` varchar(20) DEFAULT NULL,
`amenities` varchar(500) DEFAULT NULL,
`sublease` varchar(20) DEFAULT NULL,
`sublease_duration` int(11) DEFAULT NULL,
`lease` varchar(20) DEFAULT NULL,
`external_id` varchar(40) DEFAULT NULL,
`source` varchar(10) DEFAULT 'np',
`anchor` varchar(40) DEFAULT NULL,
`property_type` varchar(40) DEFAULT NULL,
`deleted` tinyint(1) DEFAULT '0',
`images_json` text,
`phone` varchar(20) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `user_id_index` (`user_id`),
KEY `filter_combined_index` (`created_at`,`lat`,`lng`,`sub_category_id`,`rent`,     
`bedrooms`,`bathrooms`,`sqft`,`dogs`,`cats`),
KEY `sub_category_id` (`sub_category_id`),
FULLTEXT KEY `text_search_index`    
(`title`,`description`,`smoking_policy`,`posted_by`,`dwelling_type`)
 ) ENGINE=MyISAM DEFAULT CHARSET=latin1;

The EXPLAIN output is as follows.

id  select_type table   type    possible_keys   key             key_len    ref     rows      Extra
1   SIMPLE      sub     const   PRIMARY,id      PRIMARY         8          const    1      Using filesort
1   SIMPLE      p       ref     sub_category_id sub_category_id 8          const    188122  Using where

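Is the table structure inefficient, or the SELECT query, or both?

Surely 4M rows should not be a limiting factor. Thanks in advance to the resident experts for any advice.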

TA!

3 answers:

Answer 0: (score: 0)

You are basically calculating the distance for every row that exists in the table:

SELECT [...]
(3959 * acos( cos( radians(42.3584308) ) * cos( radians( p.lat ) ) *
cos( radians(    p.lng ) - radians(-71.0597732) ) + sin(
radians(42.3584308) ) * sin( radians( p.lat )  ) ) ) AS distance
[...]
HAVING `distance` <= '100'

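This forces MySQL to read the full table on every query.

Additionally, the only index that includes the coordinates cannot be used for the search because it starts with created_at: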

KEY `filter_combined_index` (`created_at`,`lat`,`lng`,`sub_category_id`,`rent`,
`bedrooms`,`bathrooms`,`sqft`,`dogs`,`cats`),
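
The leftmost-prefix behaviour can be illustrated with a small sketch. It uses Python's built-in sqlite3 module rather than MySQL, so it only demonstrates the general principle, not MySQL's optimizer; the table `t1` and the index names `created_first` and `coords_first` are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t1 (id INTEGER PRIMARY KEY, created_at TEXT, "
    "lat REAL, lng REAL, rent INTEGER)"
)
# Same leading column as filter_combined_index: created_at comes first.
conn.execute("CREATE INDEX created_first ON t1 (created_at, lat, lng)")

QUERY = (
    "SELECT rent FROM t1 "
    "WHERE lat BETWEEN 42.07 AND 42.65 AND lng BETWEEN -71.45 AND -70.67"
)

def plan(connection):
    """Return the 'detail' column of EXPLAIN QUERY PLAN for QUERY."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    return [row[3] for row in rows]

# No filter on created_at, so the index that leads with it is not searched.
plan_before = plan(conn)

# An index that leads with the coordinates can serve the range condition.
conn.execute("CREATE INDEX coords_first ON t1 (lat, lng)")
plan_after = plan(conn)

print(plan_before)
print(plan_after)
```

On the MySQL side the equivalent check would be to add an index that starts with `lat` and rerun EXPLAIN to see whether the `key` column changes.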

You could try a simple per-coordinate search with a proper index:

WHERE lat BETWEEN :lat_from AND :lat_to
and lng BETWEEN :lng_from AND :lng_to

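...where the from and to values describe a bounding square. Once you have identified the potential matches, you can fine-tune the results using the actual circle.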
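
A minimal sketch of the two steps, assuming the refinement is done in application code. The 69 miles-per-degree approximation and the 3959-mile Earth radius are taken from the question; `bounding_box`, `haversine_miles` and `within_radius` are hypothetical helper names, and the haversine formula is swapped in for the question's acos expression because it is better behaved for short distances.

```python
import math

EARTH_RADIUS_MILES = 3959.0   # same constant as the question's formula
MILES_PER_DEGREE_LAT = 69.0   # same approximation as the question's BETWEEN clauses

def bounding_box(lat, lng, radius_miles):
    """Return (lat_from, lat_to, lng_from, lng_to) for a square around the point."""
    dlat = radius_miles / MILES_PER_DEGREE_LAT
    dlng = radius_miles / abs(math.cos(math.radians(lat)) * MILES_PER_DEGREE_LAT)
    return lat - dlat, lat + dlat, lng - dlng, lng + dlng

def haversine_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def within_radius(rows, lat, lng, radius_miles):
    """Keep the (id, lat, lng) rows, already filtered by the box, that fall inside the circle."""
    return [r for r in rows if haversine_miles(lat, lng, r[1], r[2]) <= radius_miles]

# Values to bind to :lat_from, :lat_to, :lng_from, :lng_to.
lat_from, lat_to, lng_from, lng_to = bounding_box(42.3584308, -71.0597732, 20)

# A corner of the square lies inside the box but outside the 20-mile circle.
candidates = [(1, 42.3584308, -71.0597732), (2, lat_to, lng_to)]
matches = within_radius(candidates, 42.3584308, -71.0597732, 20)
```

The box is only a cheap pre-filter that an index on the coordinates can serve; the circle test then runs on the few rows that survive it.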

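Answer 1: (score: 0)

You are asking the database to do a lot of work here, but may I also suggest restructuring your query a little?

First, you are using a join but you are not explicitly using a WHERE clause, so you are effectively specifying one very large JOIN condition. Internally, MySQL will most likely work out that it is really just a WHERE clause, but since none of those conditions appear to have any bearing on the join itself, they should go in their own WHERE clause. This could make a big difference because, in theory, it reduces the number of rows before the join is performed.

Second, you are using a HAVING clause, but there is no aggregation in the query. The general rule of thumb is that the HAVING clause is for aggregates (e.g. COUNT or AVG) and the WHERE clause is for everything else.

As @Joachim and @LHristov have both touched on, performing these calculations at query time is probably not a good idea. You are already asking for a lot of data, and now you are asking the database to run a calculation for every row it finds, and then a separate calculation for the join. Unfortunately, you have said that you cannot persist this, so that cannot be solved, but @Álvaro's suggestion may improve things if the following change does not.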

Restructure the query to use WHERE instead of AND, and remove the HAVING. I would expect the resulting query to look something like the following:

SELECT `p`.`id` as post_id
, `p`.`description` as description
, `p`.`rent` as rent
, `p`.`created_at` as created_at
, `p`.`title` as title
, UNIX_TIMESTAMP(p.created_at) as timestamp
, `p`.`user_id` as post_user_id
, `p`.`bathrooms`
, `p`.`bedrooms`
, `p`.`created_at`
, `p`.`address`
, `p`.`lat`
, `p`.`lng`
, `p`.`posted_by`
, `p`.`amenities`
, `p`.`user_id`
, `p`.`smoking_policy`
, `p`.`sqft`
, `p`.`dogs`
, `p`.`cats`
, `p`.`dwelling_type`
, `p`.`deposit`
, `p`.`furnished`
, `p`.`sublease`
, `p`.`sublease_duration`
, `p`.`lease`
, `p`.`property_type`
, `p`.`source`
, `p`.`images_json`
, `sub`.`name` as sub_category_name
, `sub`.`id` as sub_category_id
, `sub`.`text` as sub_category_text
, `p`.`lat` as lat
, `p`.`lng` as lng
, `p`.`phone` as phone
, (3959 * acos( cos( radians(42.3584308) ) * cos( radians( p.lat ) ) * cos( radians( p.lng ) - radians(-71.0597732) ) + sin( radians(42.3584308) ) * sin( radians( p.lat ) ) ) ) AS distance
FROM `T1` p
JOIN `sub_categories` as sub ON `sub`.`id` = `p`.`sub_category_id` AND `p`.`sub_category_id` = '2'
WHERE (`p`.`lng` BETWEEN (-71.0597732 - 20/abs(cos(radians(42.3584308))*69)) AND (-71.0597732 + 20/abs(cos(radians(42.3584308))*69)))
AND (`p`.`lat` BETWEEN (42.3584308 - (20/69)) AND (42.3584308 + (20/69)))
AND `rent` <= 9200
AND `rent` >= 7000
AND `bedrooms` <= 4
AND `bathrooms` <= 3
AND `dogs` = 1
-- MySQL does not allow a SELECT alias in WHERE, so the distance expression is repeated here
AND (3959 * acos( cos( radians(42.3584308) ) * cos( radians( p.lat ) ) * cos( radians( p.lng ) - radians(-71.0597732) ) + sin( radians(42.3584308) ) * sin( radians( p.lat ) ) ) ) <= 100
ORDER BY `p`.`created_at` desc
LIMIT 0,12;

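Hopefully, as mentioned, this means the number of rows is cut down significantly before any calculation or JOIN takes place, whereas with only the JOIN it may run the calculation for every row before checking whether they join, which, as you can imagine, is much slower.

I also noticed that you select some columns more than once, such as created_at and user_id. I am not sure whether that is intentional, but it may make a small difference.

Also, the WHERE conditions on fields that are integers, such as rent and bedrooms, were all being compared as if they were strings. I have changed that in the query above.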
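
Because the question says the filters are assembled dynamically from front-end selections, one way to keep the values numeric is to cast them and bind them as parameters instead of concatenating quoted strings. The sketch below is only an illustration: the `FILTERS` mapping, the filter names and `build_filters` are hypothetical, and the `%s` placeholder assumes a Python MySQL driver that uses the format parameter style.

```python
# Hypothetical mapping from front-end filter names to (column, operator, cast).
FILTERS = {
    "rent_max": ("rent", "<=", int),
    "rent_min": ("rent", ">=", int),
    "bedrooms_max": ("bedrooms", "<=", int),
    "bathrooms_max": ("bathrooms", "<=", float),  # bathrooms is a float column
    "dogs": ("dogs", "=", int),
}

def build_filters(selections):
    """Turn front-end selections into an SQL fragment plus typed parameters."""
    clauses, params = [], []
    for name, raw in selections.items():
        if name not in FILTERS:
            continue  # ignore anything that is not a known filter
        column, operator, cast = FILTERS[name]
        clauses.append(f"AND p.{column} {operator} %s")
        params.append(cast(raw))  # raises ValueError on non-numeric input
    return " ".join(clauses), params

fragment, values = build_filters({"rent_max": "9200", "dogs": "1", "unknown": "x"})
```

Only column names from the fixed mapping ever reach the SQL text; the user-supplied values travel separately as numbers.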

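Answer 2: (score: 0)

This is a comment that takes advantage of the answer window's formatting options.

FWIW, I find this easier to read... I have bundled the distance calculation up as a function...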

SELECT p.id post_id
     , p.description 
     , p.rent 
     , p.title
     , p.user_id post_user_id
     , p.bathrooms
     , p.bedrooms
     , p.created_at
     , p.address
     , p.posted_by
     , p.amenities
     , p.smoking_policy
     , p.sqft
     , p.dogs
     , p.cats
     , p.dwelling_type
     , p.deposit
     , p.furnished
     , p.sublease
     , p.sublease_duration
     , p.lease
     , p.property_type
     , p.source
     , p.images_json
     , sub.name sub_category_name
     , sub.id sub_category_id
     , sub.text sub_category_text
     , p.lat 
     , p.lng 
     , p.phone 
     , my_distance_function(p.lat,p.lng,-71.0597732,42.3584308) distance
  FROM T1 p
  JOIN sub_categories sub
    ON sub.id = p.sub_category_id     
 WHERE my_distance_function(p.lat,p.lng,-71.0597732,42.3584308) <= 100
   AND p.lng BETWEEN -71.452028 AND -70.6675175
   AND p.lat BETWEEN 42.0685757 AND 42.6482859
   AND rent <= 9200     
   AND rent >= 7000
   AND bedrooms <= 4
   AND bathrooms <= 3
   AND dogs =  1
   AND p.sub_category_id = 2

 ORDER 
    BY p.created_at DESC
 LIMIT 0,12;