Some MySQL queries never finish executing

Time: 2017-04-25 15:13:12

Tags: mysql database performance data-science

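I'm in the middle of my first data science project, and I'm having trouble with extremely slow queries in MySQL Workbench.

These are my 3 tables (each one comes from a dataset taken from a different website, cleaned and inserted into MySQL):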

CREATE TABLE IF NOT EXISTS `starbucks` (
   `STORE_NUMBER` varchar(20) NOT NULL,
   `CITY` varchar(50) NOT NULL,
   `STATE` char(2) NOT NULL,
   `ZIPCODE` char(5) NOT NULL,
   `LONG` varchar(10) NOT NULL,
   `LAT` varchar(10) NOT NULL,
   PRIMARY KEY (`STORE_NUMBER`)
   ) ENGINE=InnoDB;

CREATE TABLE IF NOT EXISTS `income`(
   `STATEFIPS` char(2) NOT NULL,
   `STATE` char(2) NOT NULL,
   `ZIPCODE` char(5) NOT NULL,
   `AGI_STUB` tinyint NOT NULL,
   `NUM_RETURNS` float(15,4) NOT NULL,
   `TOTAL_INCOME` float(15,4) NOT NULL,
   PRIMARY KEY (`STATE`, `ZIPCODE`, `AGI_STUB`)
   ) ENGINE=InnoDB;

CREATE TABLE IF NOT EXISTS `diversity`(
   `COUNTY` varchar(50) NOT NULL,
   `STATE` char(2) NOT NULL,
   `INDEX` float(7,6) NOT NULL,
   `1` float(3,1) NOT NULL,
   `2` float(3,1) NOT NULL,
   `3` float(3,1) NOT NULL,
   `4` float(3,1) NOT NULL,
   `5` float(3,1) NOT NULL,
   `6` float(3,1) NOT NULL,
   `7` float(3,1) NOT NULL,
   PRIMARY KEY (`COUNTY`, `STATE`)
   ) ENGINE=InnoDB;

starbucks has 13,608 records, income has 166,740 records, and diversity has 3,143 records.

The query I'm trying to run:

SELECT  i.TOTAL_INCOME,
    CASE
        WHEN s.STORE_NUMBER IS NOT NULL THEN 1
        ELSE 0
    END AS has_starbucks
  FROM  income as i
  LEFT  OUTER JOIN starbucks as s
    ON  i.ZIPCODE = s.ZIPCODE

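If I limit the results to 1,000 rows it runs quickly, but I need all of the records (no row limit). Without the limit the query never returns; it eventually times out and drops my connection to the MySQL server. I never had this much trouble in the past when working at companies with millions of records in their databases.

What table optimizations do I need to make to fix this? Which MySQL settings do I need to change? Any other suggestions are welcome.

Edit: It looks like the query's "Duration" never goes above 0.500 seconds; it is the "Fetch" part that lasts more than 120 seconds. I'm not sure whether this is useful information.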

2 answers:

Answer 0: (score: 1)

The first thing to do is create proper indexes on the join columns:

 CREATE INDEX idx1 ON starbucks (ZIPCODE );
 CREATE INDEX idx2 ON income (ZIPCODE );

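Alternatively, instead of the single-column idx2 above, create a composite index that also covers the column you select: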

CREATE INDEX idx2 ON income (ZIPCODE , TOTAL_INCOME);

Then use EXPLAIN to check how the query is actually executed.
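
As a rough sketch of that check, assuming the table and column names from the question, prefix the query with EXPLAIN. If the index on starbucks (ZIPCODE) is being used, the row for `s` would be expected to show idx1 in the `key` column rather than a full table scan (type `ALL`); the exact output depends on your MySQL version and data.

```sql
-- Show how MySQL plans to execute the join; no table rows are returned.
EXPLAIN
SELECT i.TOTAL_INCOME,
       CASE WHEN s.STORE_NUMBER IS NOT NULL THEN 1 ELSE 0 END AS has_starbucks
  FROM income AS i
  LEFT OUTER JOIN starbucks AS s
    ON i.ZIPCODE = s.ZIPCODE;
```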

Answer 1: (score: 0)

This won't do much for your performance problem, but it will fix your duplicate rows problem:

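 -- Income rows whose ZIP code has at least one Starbucks
 SELECT  i.TOTAL_INCOME, 1 AS has_starbucks
 FROM  income AS i
 WHERE i.ZIPCODE IN (SELECT ZIPCODE FROM starbucks)
   -- The two branches cannot overlap, so UNION ALL keeps every income row;
   -- plain UNION would also merge rows that happen to share TOTAL_INCOME.
   UNION ALL
 -- Income rows whose ZIP code has no Starbucks
 SELECT  i.TOTAL_INCOME, 0 AS has_starbucks
 FROM  income AS i
 WHERE i.ZIPCODE NOT IN (SELECT ZIPCODE FROM starbucks)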
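
To see where the duplicate rows in the original LEFT JOIN come from, one quick check (a sketch, assuming the schema from the question; `stores_in_zip` is just an illustrative alias) is to look for ZIP codes with more than one store. Each income row for such a ZIP code is returned once per matching store.

```sql
-- ZIP codes that have more than one Starbucks; every income row for these
-- ZIP codes appears stores_in_zip times in the LEFT JOIN result.
SELECT ZIPCODE, COUNT(*) AS stores_in_zip
  FROM starbucks
 GROUP BY ZIPCODE
HAVING COUNT(*) > 1
 ORDER BY stores_in_zip DESC;
```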

EXISTS is sometimes more efficient than IN:

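 -- Income rows whose ZIP code has at least one Starbucks
 SELECT  i.TOTAL_INCOME, 1 AS has_starbucks
 FROM  income AS i
 WHERE EXISTS
 (   SELECT 1
     FROM starbucks s
     WHERE s.ZIPCODE = i.ZIPCODE
 )
   -- The two branches cannot overlap, so UNION ALL keeps every income row
   UNION ALL
 -- Income rows whose ZIP code has no Starbucks
 SELECT  i.TOTAL_INCOME, 0 AS has_starbucks
 FROM  income AS i
 WHERE NOT EXISTS
 (   SELECT 1
     FROM starbucks s
     WHERE s.ZIPCODE = i.ZIPCODE
 )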
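
If the two halves of the query feel redundant, the same flag can probably be computed in a single pass over income. This is only a sketch, assuming MySQL, where EXISTS evaluates to 1 or 0 and can therefore be selected directly:

```sql
-- One result row per income row; has_starbucks is 1 when at least one
-- store shares the ZIP code and 0 otherwise.
SELECT i.TOTAL_INCOME,
       EXISTS (SELECT 1
                 FROM starbucks AS s
                WHERE s.ZIPCODE = i.ZIPCODE) AS has_starbucks
  FROM income AS i;
```

With an index on starbucks (ZIPCODE), the subquery should be able to use an index lookup for each income row.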