Best practice for MySQL joins on large data

Time: 2012-01-06 08:19:52

Tags: mysql join sharding

table1_shard1 (1,000,000 rows per shard x 120 shards)

 id_user   hash

table2 (100,000 rows)

 value    hash

Expected output:

 id_user  hash    value

I am trying to find the fastest way to associate id_user with value from the tables above.

My current query has been running for 30 hours without returning a result.

SELECT 
    table1_shard1.id_user, table1_shard1.hash, table2.value 
FROM table1_shard1 
LEFT JOIN table2 ON table1_shard1.hash=table2.hash 
GROUP BY id_user
UNION
SELECT 
    table1_shard2.id_user, table1_shard2.hash, table2.value 
FROM table1_shard2 
LEFT JOIN table2 ON table1_shard2.hash=table2.hash 
GROUP BY id_user
UNION 
( ... )
UNION 
SELECT 
    table1_shard120.id_user, table1_shard120.hash, table2.value 
FROM table1_shard120 
LEFT JOIN table2 ON table1_shard120.hash=table2.hash 
GROUP BY id_user

1 Answer:

Answer 0 (score: 0)

First, do you have an index on the hash field?
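If no such index exists yet, the following is a minimal sketch of how one could be added and checked. The index name idx_table2_hash is hypothetical, and the EXPLAIN query only looks at the first shard as a sample.

```sql
-- Hypothetical index name; assumes table2.hash is not indexed yet.
ALTER TABLE table2 ADD INDEX idx_table2_hash (hash);

-- Inspect the join plan for one shard. If the index is used,
-- its name should appear in the "key" column for table2.
EXPLAIN
SELECT s.id_user, s.hash, t.value
FROM table1_shard1 AS s
LEFT JOIN table2 AS t ON s.hash = t.hash;
```

Because the join goes from the shards to table2, the index on table2.hash is the one the lookup would rely on.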

I think you should merge your tables before querying (at least temporarily)

-- Create the temporary table from the first shard
CREATE TEMPORARY TABLE IF NOT EXISTS tmp_shards
SELECT * FROM table1_shard1;

-- Append each remaining shard to the existing temporary table
INSERT INTO tmp_shards
SELECT * FROM table1_shard2;

# ...
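Writing one statement per shard by hand gets tedious with 120 shards. As an alternative, the statements could be generated in a loop. This is only a sketch under assumptions: the shards are named table1_shard1 through table1_shard120 and share the same structure, the procedure name merge_shards is hypothetical, and the script is run from the mysql command-line client (DELIMITER is a client command, not SQL).

```sql
-- Start from an empty temporary table with the same structure as the shards.
DROP TEMPORARY TABLE IF EXISTS tmp_shards;
CREATE TEMPORARY TABLE tmp_shards LIKE table1_shard1;

DELIMITER //
-- Hypothetical helper: copies every shard into tmp_shards.
CREATE PROCEDURE merge_shards()
BEGIN
  DECLARE i INT DEFAULT 1;
  WHILE i <= 120 DO
    -- Table names cannot be bound as parameters, so build the statement text.
    SET @sql = CONCAT('INSERT INTO tmp_shards SELECT * FROM table1_shard', i);
    PREPARE stmt FROM @sql;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
    SET i = i + 1;
  END WHILE;
END //
DELIMITER ;

CALL merge_shards();
```

The temporary table is only visible to the session that created it, so the procedure call and the main query need to run on the same connection.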

Then run the main query

SELECT
  shd.id_user
, shd.hash
, tb2.value
FROM tmp_shards AS shd
LEFT JOIN table2 AS tb2 ON (shd.hash = tb2.hash)
GROUP BY shd.id_user
;

I'm not sure about the performance gain, but it will at least be easier to maintain.