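I am building a query that lists voters from the voters table (1 million records) based on their activity in the votes table (7 million records). The criteria are as follows: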
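A general election (GE) happens only once a year, and only GEs from 2004 onward should be counted.
Of those GEs, only the ones in which between 10% and 50% of voters voted should be counted.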
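Some less important information:
The schema cannot be changed. The data is delivered to us as a fixed-width text file, uploaded by a script, and used for other purposes.
Only the list of currently active voters and their voting history is available. In my query below I have included an equation that lowers the upper threshold by 10,000 for every year we go back. It isn't perfect, but it seems to filter out the unwanted GEs while keeping the valid ones.
For example, if the number of people who voted in 2005, 2006, 2007, 2009, 2010 and 2011 was between 100,000 and 500,000, then I only want to select the voters who voted in those years.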
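To make the two thresholds concrete, here is a small arithmetic sketch in Python. The function names `turnout_bounds` and `counts_as_valid`, the figure of 1,000,000 voters, and the reference year 2012 are illustrative assumptions, not values taken from the real data.

```python
def turnout_bounds(num_voters, election_year, current_year, decay_per_year=10_000):
    """Return the (lower, upper) turnout bounds for one general election.

    lower: 10% of the current voter count.
    upper: 50% of the current voter count, lowered by `decay_per_year`
           for every year between the election and `current_year`.
    """
    lower = 0.1 * num_voters
    upper = 0.5 * num_voters - (current_year - election_year) * decay_per_year
    return lower, upper


def counts_as_valid(turnout, num_voters, election_year, current_year):
    """True when the turnout lies strictly between the two bounds."""
    lower, upper = turnout_bounds(num_voters, election_year, current_year)
    return lower < turnout < upper


# Hypothetical figures: 1,000,000 active voters, looking back from 2012.
lower, upper = turnout_bounds(1_000_000, 2004, 2012)
print(lower, upper)
print(counts_as_valid(450_000, 1_000_000, 2004, 2012))
```

Note that the parentheses around the year difference matter: the 10,000 is multiplied by the number of years elapsed, not by the election year itself.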
The schema is as follows:
CREATE TABLE IF NOT EXISTS `voters` (
`CountyEMSID` varchar(9) COLLATE utf8_unicode_ci NOT NULL,
`LastName` varchar(30) COLLATE utf8_unicode_ci NOT NULL,
`FirstName` varchar(30) COLLATE utf8_unicode_ci NOT NULL,
`MiddleInitial` varchar(1) COLLATE utf8_unicode_ci NOT NULL,
`NameSuffix` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
`HouseNumber` varchar(10) COLLATE utf8_unicode_ci NOT NULL,
`HouseNumberSuffix` varchar(10) COLLATE utf8_unicode_ci NOT NULL,
`ApartmentNumber` varchar(15) COLLATE utf8_unicode_ci NOT NULL,
`StreetName` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
`City` varchar(40) COLLATE utf8_unicode_ci NOT NULL,
`Zip` varchar(5) COLLATE utf8_unicode_ci NOT NULL,
`ZipCode4` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
`MailingAddress1` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
`MailingAddress2` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
`MailingAddress3` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
`MailingAddress4` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
`DOBY` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
`DOBM` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`DOBD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`Gender` varchar(1) COLLATE utf8_unicode_ci NOT NULL,
`Party` varchar(3) COLLATE utf8_unicode_ci NOT NULL,
`Other` varchar(30) COLLATE utf8_unicode_ci NOT NULL,
`ED` varchar(3) COLLATE utf8_unicode_ci NOT NULL,
`AD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`CD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`CO` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`SD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`CC` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`JD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`RegY` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
`RegM` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`RegD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`Status` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`VoterType` varchar(1) COLLATE utf8_unicode_ci NOT NULL,
`StatusChangeY` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
`StatusChangeM` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`StatusChangeD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`LastVoted` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
`Telephone` varchar(12) COLLATE utf8_unicode_ci NOT NULL,
`County` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
KEY `districts` (`CountyEMSID`,`ED`,`AD`,`CD`,`CO`,`SD`,`CC`,`JD`),
KEY `vsn` (`CountyEMSID`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE IF NOT EXISTS `votes` (
`CountyEMSID` varchar(9) COLLATE utf8_unicode_ci NOT NULL,
`County` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`AD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`ED` varchar(3) COLLATE utf8_unicode_ci NOT NULL,
`Party` varchar(3) COLLATE utf8_unicode_ci NOT NULL,
`ElectionDateY` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
`ElectionDateM` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`ElectionDateD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`ElectionType` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`VoterType` varchar(1) COLLATE utf8_unicode_ci NOT NULL,
KEY `CountyEMSID` (`CountyEMSID`),
KEY `perfect` (`CountyEMSID`,`ElectionDateY`,`ElectionType`),
KEY `CountyEMSID_2` (`CountyEMSID`,`ElectionDateY`,`ElectionType`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
So far I have the following query, which should list only the unique IDs (CountyEMSID) of the voters in the votes table. It works on mysqlfiddle, but it hangs in phpMyAdmin.
SELECT DISTINCT CountyEMSID
FROM `votes`
WHERE ElectionDateY IN
(
SELECT ElectionDateY
FROM `votes`
WHERE ElectionType = 'GE'
AND ElectionDateY >= 2004
GROUP BY ElectionDateY
HAVING COUNT(*) < ((0.5 * (SELECT COUNT(*) FROM `voters`)) - ((YEAR(CURRENT_TIMESTAMP()) - ElectionDateY) * 10000))
AND COUNT(*) > (0.1 * (SELECT COUNT(*) FROM `voters`))
)
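I would greatly appreciate help optimizing this query and modifying it so that it returns all of the corresponding voter information from the votes table.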
Answer 0 (score: 2)
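MySQL optimizes in with a subquery very poorly, at least in older versions. Essentially, it reruns the subquery for every row it processes. You should move the calculation into the from clause instead, so that the per-year counts and the total voter count are each computed once. Here is my attempt: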
select distinct v.*
from votes v join
     (select ElectionDateY, count(*) as NumYVotes  -- turnout per general-election year
      from votes
      where ElectionType = 'GE' and ElectionDateY >= 2004
      group by ElectionDateY
     ) ey
     on v.ElectionDateY = ey.ElectionDateY cross join
     (select count(*) as numvoters from voters) as const  -- total voter count, computed once
where (ey.NumYVotes < 0.5 * const.numvoters - (year(now()) - ey.ElectionDateY) * 10000) and
      (ey.NumYVotes > 0.1 * const.numvoters)
Note: I have not tested this, so it may contain syntax errors.
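As a rough way to check the logic of the derived-table approach, the sketch below runs an equivalent query against a tiny in-memory SQLite database from Python. Everything here is an assumption made for illustration: the tables are cut down to the columns the query touches, the toy data has 20 voters, SQLite stands in for MySQL (so the current year is passed in as a parameter instead of using year(now())), and `valid_ge_votes` and `build_toy_db` are hypothetical helper names.

```python
import sqlite3

QUERY = """
    SELECT DISTINCT v.CountyEMSID, v.ElectionDateY
    FROM votes v
    JOIN (SELECT ElectionDateY, COUNT(*) AS NumYVotes
          FROM votes
          WHERE ElectionType = 'GE' AND ElectionDateY >= '2004'
          GROUP BY ElectionDateY) ey
      ON v.ElectionDateY = ey.ElectionDateY
    CROSS JOIN (SELECT COUNT(*) AS numvoters FROM voters) const
    WHERE ey.NumYVotes < 0.5 * const.numvoters
                         - (? - CAST(ey.ElectionDateY AS INTEGER)) * ?
      AND ey.NumYVotes > 0.1 * const.numvoters
    ORDER BY v.CountyEMSID, v.ElectionDateY
"""


def valid_ge_votes(conn, current_year, decay_per_year=10000):
    """Return (CountyEMSID, year) pairs for years whose GE turnout is in bounds."""
    return conn.execute(QUERY, (current_year, decay_per_year)).fetchall()


def build_toy_db():
    """Create simplified voters/votes tables with 20 voters (toy data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE voters (CountyEMSID TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE votes ("
        "CountyEMSID TEXT NOT NULL, "
        "ElectionDateY TEXT NOT NULL, "
        "ElectionType TEXT NOT NULL)"
    )
    voter_ids = [f"V{i:02d}" for i in range(1, 21)]
    conn.executemany("INSERT INTO voters VALUES (?)", [(v,) for v in voter_ids])
    # Number of GE voters per year: 2002 is too early, 2009 is below 10%,
    # 2010 is above 50%, and only 2008 falls inside the bounds.
    turnout = {"2002": 3, "2008": 3, "2009": 1, "2010": 12}
    rows = [(vid, year, "GE") for year, n in turnout.items() for vid in voter_ids[:n]]
    conn.executemany("INSERT INTO votes VALUES (?, ?, ?)", rows)
    return conn


conn = build_toy_db()
# The toy table is far too small for the 10,000-per-year decay, so it is set to 0.
rows = valid_ge_votes(conn, current_year=2012, decay_per_year=0)
print(rows)
```

With this toy data only the three voters who voted in 2008 should come back. On the real tables the same shape of query still has to scan votes to build the per-year counts, but it does so once rather than once per outer row.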