Trying to optimize a query that lists voters by voting activity

Time: 2013-01-27 19:56:06

Tags: mysql optimization join subquery

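I am building a query that lists voters from the voters table (1 million records) based on their activity in the votes table (7 million records). The criteria are as follows: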

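  • A general election (GE) happens only once a year, and only GEs from 2004 onward count.

  • Of those GEs, only the ones in which between 10% and 50% of voters voted should count.

Some less important information:

  • The schema cannot be changed. It is delivered to us as fixed-width text files, loaded by a script, and used for other purposes.

  • Only the list of currently active voters and their voting history is available. In my query below I have included an equation that lowers the upper threshold by 10,000 for every year further back. It is not perfect, but it seems to filter out the unwanted GEs while keeping the valid ones.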

For example, if the GEs of 2005, 2006, 2007, 2009, 2010 and 2011 each had between 100,000 and 500,000 voters voting, then I want only the voters who voted in those years.
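
The criteria above can be checked one year at a time before writing the final query. The following is only a sketch: it reads the votes and voters tables from the schema in this question, and the aliases ge, total, ge_votes, num_voters, lower_bound and upper_bound are names made up here for illustration. It has not been run against the real data.

```sql
-- Per-year general-election turnout next to the two thresholds.
-- All aliases (ge, total, ge_votes, ...) are illustrative names only.
SELECT ge.ElectionDateY,
       ge.ge_votes,
       -- lower bound: 10% of the current voter list
       0.1 * total.num_voters AS lower_bound,
       -- upper bound: 50% of the voter list, lowered by 10,000 per year back
       0.5 * total.num_voters
         - (YEAR(CURRENT_TIMESTAMP()) - ge.ElectionDateY) * 10000 AS upper_bound
FROM (SELECT ElectionDateY, COUNT(*) AS ge_votes
      FROM votes
      WHERE ElectionType = 'GE' AND ElectionDateY >= 2004
      GROUP BY ElectionDateY) AS ge
CROSS JOIN (SELECT COUNT(*) AS num_voters FROM voters) AS total
ORDER BY ge.ElectionDateY;
```

A year qualifies when ge_votes lies strictly between lower_bound and upper_bound, so this output should make it easy to confirm by eye which GEs the threshold equation keeps.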

mysqlfiddle is here

The schema is as follows:

CREATE TABLE IF NOT EXISTS `voters` (
  `CountyEMSID` varchar(9) COLLATE utf8_unicode_ci NOT NULL,
  `LastName` varchar(30) COLLATE utf8_unicode_ci NOT NULL,
  `FirstName` varchar(30) COLLATE utf8_unicode_ci NOT NULL,
  `MiddleInitial` varchar(1) COLLATE utf8_unicode_ci NOT NULL,
  `NameSuffix` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
  `HouseNumber` varchar(10) COLLATE utf8_unicode_ci NOT NULL,
  `HouseNumberSuffix` varchar(10) COLLATE utf8_unicode_ci NOT NULL,
  `ApartmentNumber` varchar(15) COLLATE utf8_unicode_ci NOT NULL,
  `StreetName` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
  `City` varchar(40) COLLATE utf8_unicode_ci NOT NULL,
  `Zip` varchar(5) COLLATE utf8_unicode_ci NOT NULL,
  `ZipCode4` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
  `MailingAddress1` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
  `MailingAddress2` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
  `MailingAddress3` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
  `MailingAddress4` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
  `DOBY` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
  `DOBM` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `DOBD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `Gender` varchar(1) COLLATE utf8_unicode_ci NOT NULL,
  `Party` varchar(3) COLLATE utf8_unicode_ci NOT NULL,
  `Other` varchar(30) COLLATE utf8_unicode_ci NOT NULL,
  `ED` varchar(3) COLLATE utf8_unicode_ci NOT NULL,
  `AD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `CD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `CO` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `SD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `CC` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `JD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `RegY` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
  `RegM` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `RegD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `Status` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `VoterType` varchar(1) COLLATE utf8_unicode_ci NOT NULL,
  `StatusChangeY` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
  `StatusChangeM` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `StatusChangeD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `LastVoted` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
  `Telephone` varchar(12) COLLATE utf8_unicode_ci NOT NULL,
  `County` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  KEY `districts` (`CountyEMSID`,`ED`,`AD`,`CD`,`CO`,`SD`,`CC`,`JD`),
  KEY `vsn` (`CountyEMSID`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE IF NOT EXISTS `votes` (
  `CountyEMSID` varchar(9) COLLATE utf8_unicode_ci NOT NULL,
  `County` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `AD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `ED` varchar(3) COLLATE utf8_unicode_ci NOT NULL,
  `Party` varchar(3) COLLATE utf8_unicode_ci NOT NULL,
  `ElectionDateY` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
  `ElectionDateM` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `ElectionDateD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `ElectionType` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `VoterType` varchar(1) COLLATE utf8_unicode_ci NOT NULL,
  KEY `CountyEMSID` (`CountyEMSID`),
  KEY `perfect` (`CountyEMSID`,`ElectionDateY`,`ElectionType`),
  KEY `CountyEMSID_2` (`CountyEMSID`,`ElectionDateY`,`ElectionType`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

So far I have the following query, which should list only the unique IDs (CountyEMSID) of voters from the votes table. It works on mysqlfiddle but hangs in phpMyAdmin.

SELECT DISTINCT CountyEMSID
FROM `votes` 
WHERE ElectionDateY IN 
(
SELECT ElectionDateY
FROM `votes`
WHERE ElectionType = 'GE' 
AND ElectionDateY >= 2004 
GROUP BY ElectionDateY 
HAVING COUNT(*) < ((0.5 * (SELECT COUNT(*) FROM `voters`)) - ((YEAR(CURRENT_TIMESTAMP()) - ElectionDateY) * 10000)) 
AND COUNT(*) > (0.1 * (SELECT COUNT(*) FROM `voters`))
)

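I would greatly appreciate help optimizing this query and modifying it so that it returns all the corresponding voter information from the votes table.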

1 answer:

Answer 0: (score: 2)

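MySQL optimizes IN clauses with subqueries very poorly. Essentially, it reruns the subquery for every row it processes. You should move the computation into the FROM clause. Here is my attempt: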

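select distinct v.*
from votes v join
     -- per-year turnout, limited to general elections from 2004 onward
     (select ElectionDateY, count(*) as NumYVotes
      from votes
      where ElectionType = 'GE' and ElectionDateY >= 2004
      group by ElectionDateY
     ) ey
     on v.ElectionDateY = ey.ElectionDateY cross join
     -- one-row derived table holding the total number of voters
     (select count(*) as numvoters from voters) as const
where (ey.NumYVotes < 0.5 * const.numvoters - (year(now()) - ey.ElectionDateY) * 10000) and
      (ey.NumYVotes > 0.1 * const.numvoters)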

Note: I have not tested this, so it may contain syntax errors.
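
If what is wanted is the full voter record rather than rows from votes, the same derived-table idea can be wrapped in a join against voters. This is a sketch under assumptions, not a tested solution: it assumes CountyEMSID identifies exactly one row in voters, and the aliases vr, active, ey and const are illustrative names.

```sql
-- Return voter rows for everyone who voted in a qualifying year.
-- Assumes CountyEMSID is unique per voter; aliases are illustrative.
SELECT vr.*
FROM voters vr
JOIN (
    -- distinct IDs of voters with a vote in any qualifying year
    SELECT DISTINCT v.CountyEMSID
    FROM votes v
    JOIN (SELECT ElectionDateY, COUNT(*) AS NumYVotes
          FROM votes
          WHERE ElectionType = 'GE' AND ElectionDateY >= 2004
          GROUP BY ElectionDateY) ey
      ON v.ElectionDateY = ey.ElectionDateY
    CROSS JOIN (SELECT COUNT(*) AS numvoters FROM voters) const
    WHERE ey.NumYVotes < 0.5 * const.numvoters
                         - (YEAR(NOW()) - ey.ElectionDateY) * 10000
      AND ey.NumYVotes > 0.1 * const.numvoters
) active ON active.CountyEMSID = vr.CountyEMSID;
```

As in the question's query, the inner select matches any vote cast in a qualifying year. Adding AND v.ElectionType = 'GE' to the inner WHERE would restrict it to general-election votes, if that is the intent. Prefixing the statement with EXPLAIN shows whether the existing CountyEMSID indexes are used for the final join.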