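I'm in the middle of my first data science project, and I'm having trouble with extremely slow queries in MySQL Workbench.
These are my 3 tables (each comes from a dataset published on a different website; the data has been cleaned and inserted into MySQL):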
CREATE TABLE IF NOT EXISTS `starbucks` (
`STORE_NUMBER` varchar(20) NOT NULL,
`CITY` varchar(50) NOT NULL,
`STATE` char(2) NOT NULL,
`ZIPCODE` char(5) NOT NULL,
`LONG` varchar(10) NOT NULL,
`LAT` varchar(10) NOT NULL,
PRIMARY KEY (`STORE_NUMBER`)
) ENGINE=InnoDB;
CREATE TABLE IF NOT EXISTS `income`(
`STATEFIPS` char(2) NOT NULL,
`STATE` char(2) NOT NULL,
`ZIPCODE` char(5) NOT NULL,
`AGI_STUB` tinyint NOT NULL,
`NUM_RETURNS` float(15,4) NOT NULL,
`TOTAL_INCOME` float(15,4) NOT NULL,
PRIMARY KEY (`STATE`, `ZIPCODE`, `AGI_STUB`)
) ENGINE=InnoDB;
CREATE TABLE IF NOT EXISTS `diversity`(
`COUNTY` varchar(50) NOT NULL,
`STATE` char(2) NOT NULL,
`INDEX` float(7,6) NOT NULL,
`1` float(3,1) NOT NULL,
`2` float(3,1) NOT NULL,
`3` float(3,1) NOT NULL,
`4` float(3,1) NOT NULL,
`5` float(3,1) NOT NULL,
`6` float(3,1) NOT NULL,
`7` float(3,1) NOT NULL,
PRIMARY KEY (`COUNTY`, `STATE`)
) ENGINE=InnoDB;
starbucks has 13,608 records, income has 166,740 records, and diversity has 3,143 records.
The query I'm trying to run:
SELECT i.TOTAL_INCOME,
CASE
WHEN s.STORE_NUMBER IS NOT NULL THEN 1
ELSE 0
END AS has_starbucks
FROM income as i
LEFT OUTER JOIN starbucks as s
ON i.ZIPCODE = s.ZIPCODE
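If I limit the results to 1,000 rows, it runs quickly, but I need to fetch all records (no row limit). Without the limit, the query never returns; it eventually times out and disconnects me from the MySQL server. I never had this much trouble in the past when working at companies with millions of records in their databases.
What table optimizations do I need to make to fix this? Which MySQL settings do I need to change? Any other suggestions are welcome.
Edit: It looks like the query "Duration" never exceeds 0.500 seconds; it is the "Fetch" part that lasts > 120 seconds. I'm not sure whether that is useful information.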
Answer 0 (score: 1)
The first thing to do is create appropriate indexes on the join columns:
CREATE INDEX idx1 ON starbucks (ZIPCODE );
CREATE INDEX idx2 ON income (ZIPCODE );
or add a redundant (covering) index that also includes the column you select:
CREATE INDEX idx2 ON income (ZIPCODE , TOTAL_INCOME);
and check the behavior with EXPLAIN.
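As a rough illustration of "create the index, then look at the plan", here is a minimal sketch. It assumes SQLite (via Python's built-in sqlite3 module) as a stand-in for MySQL, with the tables trimmed to the columns the query touches; the helper name `plan` is made up for this example. SQLite spells the command `EXPLAIN QUERY PLAN`, whereas in MySQL you would put `EXPLAIN` in front of the SELECT, and the output format differs.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; tables are trimmed
# to the columns used by the query from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE starbucks (
        STORE_NUMBER TEXT PRIMARY KEY,
        ZIPCODE      TEXT NOT NULL
    );
    CREATE TABLE income (
        STATE        TEXT NOT NULL,
        ZIPCODE      TEXT NOT NULL,
        AGI_STUB     INTEGER NOT NULL,
        TOTAL_INCOME REAL NOT NULL,
        PRIMARY KEY (STATE, ZIPCODE, AGI_STUB)
    );
""")

QUERY = """
    SELECT i.TOTAL_INCOME,
           CASE WHEN s.STORE_NUMBER IS NOT NULL THEN 1 ELSE 0 END AS has_starbucks
    FROM income AS i
    LEFT OUTER JOIN starbucks AS s ON i.ZIPCODE = s.ZIPCODE
"""


def plan(connection, sql):
    """Return the human-readable 'detail' column of SQLite's query plan."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]


plan_before = plan(conn, QUERY)
conn.execute("CREATE INDEX idx1 ON starbucks (ZIPCODE)")
plan_after = plan(conn, QUERY)

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan depends on the SQLite version, but after the index is created the lookup on starbucks should mention idx1. The same before/after comparison is what you would do with EXPLAIN in MySQL.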
Answer 1 (score: 0)
This won't do much for your performance problem, but it will fix your duplicate-row problem:
SELECT i.TOTAL_INCOME, 1 AS has_starbucks
FROM income AS i
WHERE i.ZIPCODE IN (SELECT ZIPCODE FROM starbucks)
UNION
SELECT i.TOTAL_INCOME, 0 AS has_starbucks
FROM income AS i
WHERE i.ZIPCODE NOT IN (SELECT ZIPCODE FROM starbucks);
EXISTS is sometimes more efficient than IN:
SELECT i.TOTAL_INCOME, 1 AS has_starbucks
FROM income AS i
WHERE EXISTS
( SELECT 1
FROM starbucks s
WHERE s.ZIPCODE = i.ZIPCODE
)
UNION
SELECT i.TOTAL_INCOME, 0 AS has_starbucks
FROM income AS i
WHERE NOT EXISTS
( SELECT 1
FROM starbucks s
WHERE s.ZIPCODE = i.ZIPCODE
);
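To make the duplicate-row point concrete, here is a small sketch that compares row counts. It assumes SQLite (Python's built-in sqlite3 module) in place of MySQL, trimmed tables, and made-up sample rows; the names `EXISTS_TEMPLATE` and `run` are hypothetical. The `UNION ALL` variant is not part of the answer above; it is included only to show how the two set operators differ on this data.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; all rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE starbucks (
        STORE_NUMBER TEXT PRIMARY KEY,
        ZIPCODE      TEXT NOT NULL
    );
    CREATE TABLE income (
        ZIPCODE      TEXT NOT NULL,
        AGI_STUB     INTEGER NOT NULL,
        TOTAL_INCOME REAL NOT NULL,
        PRIMARY KEY (ZIPCODE, AGI_STUB)
    );
    -- Zip 10001 has two stores; zip 20002 has none.
    INSERT INTO starbucks VALUES ('S1', '10001'), ('S2', '10001');
    -- Two income brackets in 10001 happen to share the same TOTAL_INCOME.
    INSERT INTO income VALUES ('10001', 1, 500.0),
                              ('10001', 2, 500.0),
                              ('20002', 1, 300.0);
""")

# The query from the question: one output row per matching store.
LEFT_JOIN = """
    SELECT i.TOTAL_INCOME,
           CASE WHEN s.STORE_NUMBER IS NOT NULL THEN 1 ELSE 0 END AS has_starbucks
    FROM income AS i
    LEFT OUTER JOIN starbucks AS s ON i.ZIPCODE = s.ZIPCODE
"""

# The EXISTS form from this answer, with the set operator left as a parameter.
EXISTS_TEMPLATE = """
    SELECT i.TOTAL_INCOME, 1 AS has_starbucks
    FROM income AS i
    WHERE EXISTS (SELECT 1 FROM starbucks s WHERE s.ZIPCODE = i.ZIPCODE)
    {set_op}
    SELECT i.TOTAL_INCOME, 0 AS has_starbucks
    FROM income AS i
    WHERE NOT EXISTS (SELECT 1 FROM starbucks s WHERE s.ZIPCODE = i.ZIPCODE)
"""


def run(sql):
    """Execute a query and return its rows in a stable (sorted) order."""
    return sorted(conn.execute(sql).fetchall())


join_rows = run(LEFT_JOIN)
union_rows = run(EXISTS_TEMPLATE.format(set_op="UNION"))
union_all_rows = run(EXISTS_TEMPLATE.format(set_op="UNION ALL"))

print("income rows:     ", conn.execute("SELECT COUNT(*) FROM income").fetchone()[0])
print("LEFT JOIN rows:  ", len(join_rows))
print("UNION rows:      ", len(union_rows))
print("UNION ALL rows:  ", len(union_all_rows))
```

On this sample data the LEFT JOIN returns 5 rows for 3 income records, because each income row in a zip code with two stores is emitted twice. The EXISTS form with UNION returns 2 rows, since UNION also collapses income records that happen to share the same TOTAL_INCOME and flag. With UNION ALL it returns exactly 3 rows, one per income record. If you need every income record, UNION ALL may be the variant you want; the two branches cannot overlap, so no deduplication is needed.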