Question

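I have to export 554k records from our MySQL database. At the current rate the export will take 5 days, and the slowness is mainly caused by the query below. The data structure consists of: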

Companies
--Contacts
----(Contact)Activities

For contacts, we have an index on company_id. On the activities table, we have indexes on contact_id and company_id, which map back to the contacts and companies tables respectively.

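I need to get every contact along with the date of its most recent activity. This is the query I'm running; it takes about 0.5 seconds to execute.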

Select * 
from contacts 
left outer join (select  occurred_at
                        ,contact_id 
                 from activities 
                 where occurred_at is not null 
                 group by contact_id 
                 order by occurred_at desc) activities 
on contacts.id = activities.contact_id 
where company_id = 20

If I remove the join and just select * from contacts where company_id = 20, the query executes in 0.016 seconds.

If I run EXPLAIN on the join query, I get this:

Any ideas on how to speed this up?

Edit: here are the table definitions.

CREATE TABLE `companies` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `street_address` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `city` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `state` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `county` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `website` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `external_id` int(11) DEFAULT NULL,
  `created_at` datetime DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  `user_id` int(11) DEFAULT NULL,
  `falloff_date` date DEFAULT NULL,
  `zipcode` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `phone` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `company_id` int(11) DEFAULT NULL,
  `order_count` int(11) NOT NULL DEFAULT '0',
  `active_job_count` int(11) NOT NULL DEFAULT '0',
  `duplicate_of` int(11) DEFAULT NULL,
  `warm_date` datetime DEFAULT NULL,
  `employee_size` int(11) DEFAULT NULL,
  `dup_checked` tinyint(1) DEFAULT '0',
  `rating` int(11) DEFAULT NULL,
  `delinquent` tinyint(1) DEFAULT '0',
  `cconly` tinyint(1) DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `index_companies_on_name` (`name`),
  KEY `index_companies_on_user_id` (`user_id`),
  KEY `index_companies_on_company_id` (`company_id`),
  KEY `index_companies_on_external_id` (`external_id`),
  KEY `index_companies_on_state_and_dup_checked` (`id`,`state`,`dup_checked`,`duplicate_of`),
  KEY `index_companies_on_dup_checked` (`id`,`dup_checked`),
  KEY `index_companies_on_dup_checked_name` (`dup_checked`,`name`),
  KEY `index_companies_on_county` (`county`,`state`)
) ENGINE=InnoDB AUTO_INCREMENT=15190300 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;


CREATE TABLE `contacts` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `first_name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `last_name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `title` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `phone` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `extension` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `fax` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `email` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `active` tinyint(1) DEFAULT NULL,
  `main` tinyint(1) DEFAULT NULL,
  `company_id` int(11) DEFAULT NULL,
  `created_at` datetime DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  `external_id` int(11) DEFAULT NULL,
  `second_phone` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `index_contacts_on_company_id` (`company_id`),
  KEY `index_contacts_on_first_name` (`first_name`),
  KEY `index_contacts_on_last_name` (`last_name`),
  KEY `index_contacts_on_phone` (`phone`),
  KEY `index_contacts_on_email` (`email`)
) ENGINE=InnoDB AUTO_INCREMENT=11241088 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;


CREATE TABLE `activities` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `kind` int(11) DEFAULT NULL,
  `contact_id` int(11) DEFAULT NULL,
  `call_status` int(11) DEFAULT NULL,
  `occurred_at` datetime DEFAULT NULL,
  `notes` text COLLATE utf8_unicode_ci,
  `user_id` int(11) DEFAULT NULL,
  `scheduled_for` datetime DEFAULT NULL,
  `priority` tinyint(1) DEFAULT NULL,
  `company_id` int(11) DEFAULT NULL,
  `created_at` datetime DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  `from_user_id` int(11) DEFAULT NULL,
  `to_user_id` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `index_activities_on_contact_id` (`contact_id`),
  KEY `index_activities_on_user_id` (`user_id`),
  KEY `index_activities_on_company_id` (`company_id`)
) ENGINE=InnoDB AUTO_INCREMENT=515340 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Answer 1

This is a greatest-n-per-group query, a kind of question that comes up frequently on Stack Overflow.

Here's a solution using a MySQL 8.0 window function:

WITH latest_activities AS (
  SELECT contact_id, occurred_at,
    ROW_NUMBER() OVER (PARTITION BY contact_id ORDER BY occurred_at DESC) AS rn
  FROM activities
)
SELECT *
FROM contacts AS c
LEFT OUTER JOIN latest_activities 
  ON c.id = latest_activities.contact_id AND latest_activities.rn = 1
WHERE c.company_id = 20
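A minimal sketch to check what the window-function version returns, assuming a hypothetical toy dataset that has only the columns the query needs. It uses Python's built-in sqlite3 module, with SQLite standing in for MySQL 8.0 (SQLite 3.25 and later accept the same CTE and ROW_NUMBER() syntax); the table contents and the function name latest_per_contact are made up for illustration, and ORDER BY c.id is added only to make the output order deterministic.

```python
import sqlite3

# Hypothetical toy data: Ann (id 1) has three activities, one with a NULL date;
# Bob (id 2) has none; Cy (id 3) belongs to another company.
SCHEMA = """
CREATE TABLE contacts (id INTEGER PRIMARY KEY, first_name TEXT, company_id INTEGER);
CREATE TABLE activities (id INTEGER PRIMARY KEY, contact_id INTEGER, occurred_at TEXT);
INSERT INTO contacts VALUES (1, 'Ann', 20), (2, 'Bob', 20), (3, 'Cy', 21);
INSERT INTO activities VALUES
  (1, 1, '2020-01-01 10:00:00'),
  (2, 1, '2020-03-01 09:00:00'),
  (3, 1, NULL),
  (4, 3, '2020-02-01 08:00:00');
"""

# Same shape as the query above: number each contact's activities newest-first,
# then keep only row number 1 inside the LEFT JOIN condition.
QUERY = """
WITH latest_activities AS (
  SELECT contact_id, occurred_at,
    ROW_NUMBER() OVER (PARTITION BY contact_id ORDER BY occurred_at DESC) AS rn
  FROM activities
)
SELECT c.id, c.first_name, latest_activities.occurred_at
FROM contacts AS c
LEFT OUTER JOIN latest_activities
  ON c.id = latest_activities.contact_id AND latest_activities.rn = 1
WHERE c.company_id = 20
ORDER BY c.id
"""


def latest_per_contact():
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # NULL dates sort last under DESC, so Ann's NULL activity never gets rn = 1;
    # Bob has no activities and comes back with NULL (None).
    print(latest_per_contact())
```

Putting rn = 1 in the ON clause rather than in WHERE is what keeps contacts without any activity in the result.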

Another solution, which uses only joins and so also works on MySQL versions before 8.0:

SELECT c.*, a.*
FROM contacts AS c
LEFT OUTER JOIN activities AS a ON a.contact_id = c.id
LEFT OUTER JOIN activities AS a2 ON a2.contact_id = c.id 
  AND a2.occurred_at > a.occurred_at
WHERE c.company_id = 20
  AND a2.contact_id IS NULL;
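The self-join acts as an anti-join: an activity a is kept only if no other activity a2 of the same contact has a later occurred_at. A minimal sketch on hypothetical toy data (Python's sqlite3, SQLite standing in for MySQL) shows one caveat worth knowing: because a2.occurred_at > NULL is never true, an activity with a NULL date also survives the anti-join. The sketch therefore compares the query as written with a variant that adds a.occurred_at IS NOT NULL to the first join, mirroring the filter in the question's subquery. The helper names are made up; ties on occurred_at would likewise return more than one row per contact.

```python
import sqlite3

# Hypothetical toy data: Ann (id 1) has a NULL-dated activity, Bob (id 2) has none.
SCHEMA = """
CREATE TABLE contacts (id INTEGER PRIMARY KEY, first_name TEXT, company_id INTEGER);
CREATE TABLE activities (id INTEGER PRIMARY KEY, contact_id INTEGER, occurred_at TEXT);
INSERT INTO contacts VALUES (1, 'Ann', 20), (2, 'Bob', 20), (3, 'Cy', 21);
INSERT INTO activities VALUES
  (1, 1, '2020-01-01 10:00:00'),
  (2, 1, '2020-03-01 09:00:00'),
  (3, 1, NULL),
  (4, 3, '2020-02-01 08:00:00');
"""


def anti_join_query(skip_null_dates):
    # Optionally mirror the question's "occurred_at is not null" filter.
    extra = " AND a.occurred_at IS NOT NULL" if skip_null_dates else ""
    return f"""
SELECT c.id, a.occurred_at
FROM contacts AS c
LEFT OUTER JOIN activities AS a ON a.contact_id = c.id{extra}
LEFT OUTER JOIN activities AS a2 ON a2.contact_id = c.id
  AND a2.occurred_at > a.occurred_at
WHERE c.company_id = 20
  AND a2.contact_id IS NULL
ORDER BY c.id, a.occurred_at
"""


def run(query):
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # As written: Ann appears twice (her latest date and her NULL-dated activity).
    print(run(anti_join_query(skip_null_dates=False)))
    # With the NULL filter: one row per contact.
    print(run(anti_join_query(skip_null_dates=True)))
```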

Creating a new index on activities(contact_id, occurred_at) would help.

Answer 2

If you can help it, don't use subqueries in the FROM clause. They impede the MySQL optimizer. So, if you want one row:

select c.*, a.occurred_at
from contacts c
left outer join activities a
  on c.id = a.contact_id
  and a.occurred_at is not null
where c.company_id = 20
order by a.occurred_at desc
limit 1;

If you want one row per contact_id:

select c.*, a.occurred_at
from contacts c
left outer join activities a
  on c.id = a.contact_id
  and a.occurred_at is not null
  and a.occurred_at = (select max(a2.occurred_at)
                       from activities a2
                       where a2.contact_id = a.contact_id
                      )
where c.company_id = 20
order by a.occurred_at desc;

This can take advantage of indexes on contacts(company_id, id) and activities(contact_id, occurred_at).

Your query is also doing something you should not do, and that the default settings of recent MySQL versions no longer allow: the select in your subquery has an unaggregated column (occurred_at) that is not in the group by. That should produce an error (ONLY_FULL_GROUP_BY, which is enabled by default since MySQL 5.7.5).
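A minimal sketch that runs both of this answer's queries against a hypothetical toy dataset, using Python's sqlite3 module as a stand-in for MySQL; the data and helper names are illustrative only, and the select lists are narrowed to a few columns for readability. It shows that the first query returns a single row for the whole company, while the correlated max() subquery in the join condition yields one row per contact, with NULL (None) for a contact that has no dated activity.

```python
import sqlite3

# Hypothetical toy data: Ann (id 1) has three activities, one with a NULL date;
# Bob (id 2) has none; Cy (id 3) belongs to another company.
SCHEMA = """
CREATE TABLE contacts (id INTEGER PRIMARY KEY, first_name TEXT, company_id INTEGER);
CREATE TABLE activities (id INTEGER PRIMARY KEY, contact_id INTEGER, occurred_at TEXT);
INSERT INTO contacts VALUES (1, 'Ann', 20), (2, 'Bob', 20), (3, 'Cy', 21);
INSERT INTO activities VALUES
  (1, 1, '2020-01-01 10:00:00'),
  (2, 1, '2020-03-01 09:00:00'),
  (3, 1, NULL),
  (4, 3, '2020-02-01 08:00:00');
"""

# Latest dated activity across the whole company: a single row.
ONE_ROW = """
SELECT c.id, c.first_name, a.occurred_at
FROM contacts c
LEFT OUTER JOIN activities a
  ON c.id = a.contact_id
  AND a.occurred_at IS NOT NULL
WHERE c.company_id = 20
ORDER BY a.occurred_at DESC
LIMIT 1
"""

# Only the activity whose date equals that contact's MAX joins, so each
# contact gets at most one matching row (barring ties on occurred_at).
ONE_ROW_PER_CONTACT = """
SELECT c.id, c.first_name, a.occurred_at
FROM contacts c
LEFT OUTER JOIN activities a
  ON c.id = a.contact_id
  AND a.occurred_at IS NOT NULL
  AND a.occurred_at = (SELECT MAX(a2.occurred_at)
                       FROM activities a2
                       WHERE a2.contact_id = a.contact_id)
WHERE c.company_id = 20
ORDER BY a.occurred_at DESC
"""


def run(query):
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run(ONE_ROW))
    print(run(ONE_ROW_PER_CONTACT))
```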

Answer 3

I feel like I'm overlooking something, given the complexity of the other answers, but I think this is all you need:

SELECT c.*
   , MAX(a.occurred_at) AS occurred_at
FROM contacts AS c
LEFT JOIN activities AS a
   ON c.id = a.contact_id AND a.occurred_at IS NOT NULL
WHERE c.company_id = 20
GROUP BY c.id;

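Notes: (1) This assumes you don't actually want the duplicate contact_id rows from your original subquery to appear in the final result. (2) It also assumes your server is not configured to require a full GROUP BY (ONLY_FULL_GROUP_BY); if it is, you will need to manually expand c.* into the full column list and copy that list into the GROUP BY clause as well. (MySQL 5.7.5 and later detect that the other columns of c are functionally dependent on its primary key c.id, so on those versions GROUP BY c.id alone is normally accepted even with ONLY_FULL_GROUP_BY enabled.)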
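A minimal sketch of this GROUP BY approach on hypothetical toy data, again using Python's sqlite3 module as a stand-in for MySQL. SQLite permits the non-aggregated c.* columns alongside GROUP BY c.id, much as MySQL does when grouping by a primary key; the data and the function name latest_by_group are made up, and ORDER BY c.id is added only so the result order is deterministic.

```python
import sqlite3

# Hypothetical toy data: Ann (id 1) has three activities, one with a NULL date;
# Bob (id 2) has none; Cy (id 3) belongs to another company.
SCHEMA = """
CREATE TABLE contacts (id INTEGER PRIMARY KEY, first_name TEXT, company_id INTEGER);
CREATE TABLE activities (id INTEGER PRIMARY KEY, contact_id INTEGER, occurred_at TEXT);
INSERT INTO contacts VALUES (1, 'Ann', 20), (2, 'Bob', 20), (3, 'Cy', 21);
INSERT INTO activities VALUES
  (1, 1, '2020-01-01 10:00:00'),
  (2, 1, '2020-03-01 09:00:00'),
  (3, 1, NULL),
  (4, 3, '2020-02-01 08:00:00');
"""

# One group per contact; MAX() picks the latest non-NULL date, and a contact
# with no matching activity still appears because of the LEFT JOIN.
QUERY = """
SELECT c.*
   , MAX(a.occurred_at) AS occurred_at
FROM contacts AS c
LEFT JOIN activities AS a
   ON c.id = a.contact_id AND a.occurred_at IS NOT NULL
WHERE c.company_id = 20
GROUP BY c.id
ORDER BY c.id
"""


def latest_by_group():
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(latest_by_group())
```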

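Expanding on dnoeth's comment on your question: unless you have a specific reason for querying each company separately (spreading out the load, or code that is structured to process other things company by company), you can adapt the query above to get all of your results in a single query.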

SELECT con.*
   , MAX(a.occurred_at) AS occurred_at
FROM companies AS com 
INNER JOIN contacts AS con ON com.id = con.company_id
LEFT JOIN activities AS a
   ON con.id = a.contact_id AND a.occurred_at IS NOT NULL
WHERE [criteria for companies chosen to be queried]
GROUP BY con.id
ORDER BY con.company_id, con.id
;
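Since the end goal is an export, a minimal sketch of running the all-companies query once and writing its rows straight to CSV might look like this. Python's sqlite3 and csv modules stand in for the MySQL client and the real export target; the filter com.id IN (20, 21) is a hypothetical replacement for the [criteria ...] placeholder, and export_csv and the toy data are made up. The loop iterates the cursor instead of calling fetchall(); with sqlite3 that fetches rows incrementally, though whether a MySQL driver does the same depends on its cursor type.

```python
import csv
import io
import sqlite3

# Hypothetical toy data spanning two companies.
SCHEMA = """
CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE contacts (id INTEGER PRIMARY KEY, first_name TEXT, company_id INTEGER);
CREATE TABLE activities (id INTEGER PRIMARY KEY, contact_id INTEGER, occurred_at TEXT);
INSERT INTO companies VALUES (20, 'Company A'), (21, 'Company B');
INSERT INTO contacts VALUES (1, 'Ann', 20), (2, 'Bob', 20), (3, 'Cy', 21);
INSERT INTO activities VALUES
  (1, 1, '2020-01-01 10:00:00'),
  (2, 1, '2020-03-01 09:00:00'),
  (3, 1, NULL),
  (4, 3, '2020-02-01 08:00:00');
"""

QUERY = """
SELECT con.*
   , MAX(a.occurred_at) AS occurred_at
FROM companies AS com
INNER JOIN contacts AS con ON com.id = con.company_id
LEFT JOIN activities AS a
   ON con.id = a.contact_id AND a.occurred_at IS NOT NULL
WHERE com.id IN (20, 21)  -- hypothetical company criteria
GROUP BY con.id
ORDER BY con.company_id, con.id
"""


def export_csv(out):
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        cur = conn.execute(QUERY)
        writer = csv.writer(out)
        # Header row taken from the result's column names.
        writer.writerow([col[0] for col in cur.description])
        for row in cur:  # rows are fetched as the loop advances
            writer.writerow(row)  # csv writes None (SQL NULL) as an empty field
    finally:
        conn.close()


if __name__ == "__main__":
    buf = io.StringIO()
    export_csv(buf)
    print(buf.getvalue())
```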

MySQL query using joins is slow: how can I speed it up?
