我必须从我们的mysql数据库中导出554k条记录。以目前的速度,导出数据将需要5天,并且速度缓慢主要是由以下查询引起的。数据结构由
组成Companies
--Contacts
----(Contact)Activities
对于联系人,我们在company_id上有一个索引。在活动表上,我们有一个contact_id和company_id的索引,它们分别映射回相应的contact和company表。
我需要获取每个联系人及其最新的活动日期。这是我正在运行的查询,执行大约需要0.5秒。
Select *
from contacts
left outer join (select occurred_at
,contact_id
from activities
where occurred_at is not null
group by contact_id
order by occurred_at desc) activities
on contacts.id = activities.contact_id
where company_id = 20
如果我删除联接并仅从company_id = 20的联系人中选择*,则查询将在.016秒内执行。
如果我使用Explain来获取有关联接查询的信息,则会得到此信息
关于如何加快速度的任何想法?
编辑: 这是表的定义。
CREATE TABLE `companies` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`street_address` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`city` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`state` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`county` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`website` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`external_id` int(11) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`user_id` int(11) DEFAULT NULL,
`falloff_date` date DEFAULT NULL,
`zipcode` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`phone` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`company_id` int(11) DEFAULT NULL,
`order_count` int(11) NOT NULL DEFAULT '0',
`active_job_count` int(11) NOT NULL DEFAULT '0',
`duplicate_of` int(11) DEFAULT NULL,
`warm_date` datetime DEFAULT NULL,
`employee_size` int(11) DEFAULT NULL,
`dup_checked` tinyint(1) DEFAULT '0',
`rating` int(11) DEFAULT NULL,
`delinquent` tinyint(1) DEFAULT '0',
`cconly` tinyint(1) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `index_companies_on_name` (`name`),
KEY `index_companies_on_user_id` (`user_id`),
KEY `index_companies_on_company_id` (`company_id`),
KEY `index_companies_on_external_id` (`external_id`),
KEY `index_companies_on_state_and_dup_checked` (`id`,`state`,`dup_checked`,`duplicate_of`),
KEY `index_companies_on_dup_checked` (`id`,`dup_checked`),
KEY `index_companies_on_dup_checked_name` (`dup_checked`,`name`),
KEY `index_companies_on_county` (`county`,`state`)
) ENGINE=InnoDB AUTO_INCREMENT=15190300 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `contacts` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`first_name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`last_name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`title` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`phone` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`extension` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`fax` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`email` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`active` tinyint(1) DEFAULT NULL,
`main` tinyint(1) DEFAULT NULL,
`company_id` int(11) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`external_id` int(11) DEFAULT NULL,
`second_phone` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_contacts_on_company_id` (`company_id`),
KEY `index_contacts_on_first_name` (`first_name`),
KEY `index_contacts_on_last_name` (`last_name`),
KEY `index_contacts_on_phone` (`phone`),
KEY `index_contacts_on_email` (`email`)
) ENGINE=InnoDB AUTO_INCREMENT=11241088 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `activities` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`kind` int(11) DEFAULT NULL,
`contact_id` int(11) DEFAULT NULL,
`call_status` int(11) DEFAULT NULL,
`occurred_at` datetime DEFAULT NULL,
`notes` text COLLATE utf8_unicode_ci,
`user_id` int(11) DEFAULT NULL,
`scheduled_for` datetime DEFAULT NULL,
`priority` tinyint(1) DEFAULT NULL,
`company_id` int(11) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`from_user_id` int(11) DEFAULT NULL,
`to_user_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_activities_on_contact_id` (`contact_id`),
KEY `index_activities_on_user_id` (`user_id`),
KEY `index_activities_on_company_id` (`company_id`)
) ENGINE=InnoDB AUTO_INCREMENT=515340 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
答案 0 :(得分:3)
这是一个greatest-n-per-group查询,经常在堆栈溢出中出现。
以下是使用MySQL 8.0窗口函数的解决方案:
contact_id
以下是适用于8.0之前版本的解决方案:
WITH latest_activities AS (
SELECT contact_id, occurred_at,
ROW_NUMBER() OVER (PARTITION BY contact_id ORDER BY occurred_at DESC) AS rn
FROM activities
)
SELECT *
FROM contacts AS c
LEFT OUTER JOIN latest_activities
ON c.id = latest_activities.contact_id AND latest_activities.rn = 1
WHERE c.company_id = 20
另一种解决方案:
SELECT c.*, a.*
FROM contacts AS c
LEFT OUTER JOIN activities AS a ON a.contact_id = c.id
LEFT OUTER JOIN activities AS a2 ON a2.contact_id = c.id
AND a2.occurred_at > a.occurred_at
WHERE c.company_id = 20
AND a2.contact_id IS NULL;
在活动(contact_id,发生的事件)上创建新索引将很有帮助。
答案 1 :(得分:0)
如果可以帮助,请勿在{{1}}子句中使用子查询。它们阻碍了MySQL优化器。因此,如果您要一行:
FROM
如果您想每Select c.*, a.occurred_at
from contacts c left outer join
from activities a
on c.id = a.contact_id and
a.occurred_at is not null
where c.company_id = 20
order by a.occurred_at desc
limit 1;
行:
contact_id
这可以利用Select c.*, a.occurred_at
from contacts c left outer join
from activities a
on c.id = a.contact_id and
a.occurred_at is not null and
a.occurred_at = (select max(a2.occurred_at)
from activities a2
where a2.contact_id = a.contact_id
)
where c.company_id = 20
order by a.occurred_at desc
limit 1;
上的索引。和activities(contact_id, occured_at)
。
您的查询正在做一件很明显的事情,即不做不做-并且最新版本的MySQL中的默认设置不再支持该操作。您在contact(company_id, contact_id)
中有未聚合的列,但不在select
中。 group by
应该会产生错误。
答案 2 :(得分:0)
我觉得我正在忽略其他答案的复杂性,但是我认为这就是您所需要的。
SELECT c.*
, MAX(a.occurred_at) AS occurred_at
FROM contacts AS c
LEFT JOIN activities AS a
ON c.id = a.contact_id AND a.occurred_at IS NOT NULL
WHERE c.company_id = 20
GROUP BY c.id;
注意:(1)假设您实际上并不希望原始子查询中的重复contact_id出现在最终结果中。 (2)这还假设您的服务器未配置为要求完整的分组依据;如果是这样,您将需要手动将c.*
展开到完整的列列表中,并将该列表也复制到GROUP BY
子句中。
扩大dnoeth对您问题的评论;如果您不是出于特定原因而不是分别查询每个公司(分担负载,则代码结构处理也可以逐个处理其他公司) ,您可以对上述查询进行调整,以获取全部信息您的结果只需一次查询。
SELECT con.*
, MAX(a.occurred_at) AS occurred_at
FROM companies AS com
INNER JOIN contacts AS con ON com.id = con.company_id
LEFT JOIN activities AS a
ON con.id = a.contact_id AND a.occurred_at IS NOT NULL
WHERE [criteria for companies chosen to be queried]
GROUP BY con.id
ORDER BY con.company_id, con.id
;