Querying a large dataset (over 15 million rows) on a join

Time: 2016-08-24 16:41:46

Tags: mysql mariadb

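I am trying to join two tables, products and products_markets. While products has under a million records, products_markets is closer to 20 million. The data has been changed, so there may be a typo or two in the CREATE TABLE statements below: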

CREATE TABLE `products_markets` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `product_id` int(10) unsigned NOT NULL,
  `country_code_id` int(10) unsigned NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `unique_index` (`product_id`,`country_code_id`)
) ENGINE=InnoDB AUTO_INCREMENT=21052102 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

CREATE TABLE `products` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `manufacturer_id` int(10) unsigned NOT NULL,
  `department_id` int(10) unsigned NOT NULL,
  `code` varchar(100) COLLATE utf8mb4_unicode_ci NOT NULL,
  `popularity` int(11) DEFAULT NULL,
  `name` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `value` bigint(20) unsigned NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `products_code_unique` (`code`),
  KEY `products_department_id_foreign` (`department_id`),
  KEY `products_manufacturer_id_foreign` (`manufacturer_id`),
  CONSTRAINT `products_department_id_foreign`
       FOREIGN KEY (`department_id`) REFERENCES `departments` (`id`),
  CONSTRAINT `products_manufacturer_id_foreign`
       FOREIGN KEY (`manufacturer_id`) REFERENCES `manufacturers` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=731563 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

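I am trying to return 50 records of the most popular products available in a specific country, and the query takes around 50 seconds, which seems higher than expected.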

I have tried a few different queries without success:

select  `products_markets`.`product_id`
    from  products_markets
    left join  
        ( SELECT  products.id, products.popularity
            from  products
        ) p  ON p.id = products_markets.product_id
    where products_markets.country_code_id = 121
    order by  `popularity` desc, `p`.`id` asc
    limit  50 

select  `products`.*
    from  `products`
    where  products.id in (
        SELECT  product_id
            from  products_markets
            where  products_markets.country_code_id = 121
                          )
    group by  `products`.`name`, `products`.`manufacturer_id`
    order by  `popularity` desc, `products`.`id` asc
    limit  50 

The EXPLAIN output for this query is:

id  select_type  table              type possible_keys key           key_len ref              rows              Extra
1   PRIMARY      products           ALL  PRIMARY       NULL          NULL    NULL             623848            Using temporary; Using filesort
1   PRIMARY      products_markets   ref  unique_index  unique_index  4       main.products.id 14                Using where; Using index; FirstMatch(products)

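One option I am interested in is splitting products_markets into a separate table per country to reduce the amount of data queried. I have tried adding more memory to the server, without much success. Can anyone identify anything glaringly wrong with the database design or the query?

What other options are available to bring this query down to a fraction of its current ~50 seconds?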

2 Answers:

Answer 0: (score: 1)

Get rid of id in products_markets and add

PRIMARY KEY(country_code_id, product_id)

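Then drop the UNIQUE key, unless some other query needs it.

This will shrink the disk footprint of that large table significantly, which may speed up every query that touches it.

It will also help with the reformulation suggested by Hamaza.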
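The change described above could be sketched roughly as follows. This is an untested outline, not the answerer's exact procedure: the table name products_markets_v2 is hypothetical, the copy-and-rename approach is only one way to apply the change, and it assumes nothing else references products_markets.id.

```sql
-- Hypothetical replacement table: no surrogate id, and the rows are
-- clustered by country first, so all products for one country sit together.
CREATE TABLE `products_markets_v2` (
  `country_code_id` int(10) unsigned NOT NULL,
  `product_id` int(10) unsigned NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`country_code_id`, `product_id`)
  -- Assumption: add KEY (`product_id`) here only if other queries
  -- look rows up by product alone.
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

-- Copy the existing rows; the old id values are simply discarded.
INSERT INTO `products_markets_v2`
        (`country_code_id`, `product_id`, `created_at`, `updated_at`)
    SELECT `country_code_id`, `product_id`, `created_at`, `updated_at`
        FROM `products_markets`;

-- Swap the tables once the copy has been verified.
RENAME TABLE `products_markets`    TO `products_markets_old`,
             `products_markets_v2` TO `products_markets`;
```

With country_code_id leading the primary key, a filter such as country_code_id = 121 can read one contiguous range of the clustered index instead of probing the table once per product.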

Answer 1: (score: 0)

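Try the query below. It first selects all products for the specified country from the products_markets table, then picks those products from the products table, orders them by popularity, and limits the result to 50. Also try not to use products.*; select only the fields you actually need.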

-- popularity is a column of products, not products_markets
select  products_markets.product_id, products_markets.country_code_id,
        products.*
    from  products_markets, products
    where  products_markets.country_code_id = 121
      and  products_markets.product_id = products.id
    group by  `products`.`name`, `products`.`manufacturer_id`
    order by  `products`.`popularity` desc, `products`.`id` asc
    limit  50
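As an illustration of the advice about avoiding products.*, here is a minimal sketch of the same lookup with an explicit JOIN and a short column list. The chosen columns (id, name, popularity) are an assumption about what the caller needs, and the GROUP BY is left out on the assumption that one row per product is wanted: because (product_id, country_code_id) is unique in products_markets, the join cannot return the same product twice for a single country.

```sql
-- Sketch only: adjust the column list to what the application really uses.
SELECT  p.id, p.name, p.popularity
    FROM  products_markets AS pm
    JOIN  products AS p  ON p.id = pm.product_id
    WHERE  pm.country_code_id = 121
    ORDER BY  p.popularity DESC, p.id ASC
    LIMIT  50;
```

If products that share a name and manufacturer really must be collapsed into one row, the GROUP BY from the original query would have to be kept, and that changes which rows come back.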