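MySQL query efficiency: pulling distinct records from similar groups

Date: 2018-01-12 21:08:29

Tags: mysql

I am building a table that shows cases where multiple records share the same btc but belong to different customer_names, along with the lowest cost for each customer.

This query works, but it is very inefficient: it takes over a minute to run on an 80,000-row table, so I suspect I am doing something wrong.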

select btc,customer_name,min(cost) from table where table.btc in
 (select btc from table group by 1 having count(distinct customer_name) > 1) 
 group by 1,2

This outputs a table like the following:

+---------+---------------+---------+
|   btc   | customer_name |  cost   |
+---------+---------------+---------+
| asd32   | Sony          | 1.45863 |
| asd32   | Nintendo      | 1.84839 |
| bf33940 | Sony          | 2.49188 |
| bf33940 | Nintendo      | 2.49188 |
| a43c3f  | Sony          | 2.84142 |
| a43c3f  | Nintendo      | 2.45    |
| a43c3f  | Sega          | 2.689   |
+---------+---------------+---------+

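I would like to go one step further and exclude any results where the cost is the same for two customer_names (so, in the table above, remove btc bf33940, where Sony and Nintendo have the same cost).

I would also like to know whether there is a more efficient way to do what I am doing.

Table structure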

+------------------+--------------+------+-----+---------+
|      field       |     type     | null | key | default |
+------------------+--------------+------+-----+---------+
| btc              | varchar(100) | NO   | MUL | NULL    |
| mpn              | varchar(100) | YES  |     | NULL    |
| supplier         | varchar(100) | YES  |     | NULL    |
| invoice          | varchar(100) | YES  |     | NULL    |
| invoice_date     | datetime     | YES  |     | NULL    |
| qtr              | varchar(5)   | YES  |     | NULL    |
| qty              | double(10,0) | YES  |     | NULL    |
| resale           | double(15,5) | YES  |     | NULL    |
| ext_resale       | double(15,5) | YES  |     | NULL    |
| cost             | double(15,5) | YES  |     | NULL    |
| ext_cost         | double(15,5) | YES  |     | NULL    |
| gpp              | double(15,5) | YES  |     | NULL    |
| project          | varchar(100) | YES  |     | NULL    |
| team             | double(15,5) | YES  |     | NULL    |
| build_type       | varchar(50)  | YES  |     | NULL    |
| customer_name    | varchar(100) | YES  |     | NULL    |
| customer_address | varchar(100) | YES  |     | NULL    |
| customer_type    | varchar(100) | YES  |     | NULL    |
| customer_group   | varchar(100) | YES  |     | NULL    |
| sps              | varchar(100) | YES  |     | NULL    |
| fps              | varchar(100) | YES  |     | NULL    |
| gps              | varchar(100) | YES  |     | NULL    |
| hps              | varchar(100) | YES  |     | NULL    |
+------------------+--------------+------+-----+---------+

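Sample CSV file here: https://ufile.io/os0as

1 Answer:

Answer 0 (score: 1)

You could try replacing the where...in with a join, but it is hard to say how much more efficient that will be without testing it.

Something like this: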

select t1.btc, t1.customer_name, min(t1.cost) as min_cost
from xxx t1
join (
  -- btc values that appear under more than one customer
  select btc
  from xxx
  group by btc
  having count(distinct customer_name) > 1
) t2 on t1.btc = t2.btc
group by t1.btc, t1.customer_name
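
Whether the join helps will likely depend on the indexes available. As a minimal sketch, assuming the table is named xxx as in the query above and that adding an index is acceptable, a composite index on (btc, customer_name, cost) covers every column that both the subquery and the outer query read, and EXPLAIN shows whether MySQL actually uses it. The index name is made up for illustration.

```sql
-- Assumption: the table is called xxx, the placeholder used in the answer.
-- The index name idx_btc_customer_cost is hypothetical.
ALTER TABLE xxx
  ADD INDEX idx_btc_customer_cost (btc, customer_name, cost);

-- Inspect the plan; "Using index" in the Extra column means the rows
-- are read from the index alone, without touching the table data.
EXPLAIN
SELECT t1.btc, t1.customer_name, MIN(t1.cost) AS min_cost
FROM xxx t1
JOIN (
  SELECT btc
  FROM xxx
  GROUP BY btc
  HAVING COUNT(DISTINCT customer_name) > 1
) t2 ON t1.btc = t2.btc
GROUP BY t1.btc, t1.customer_name;
```

Running the same EXPLAIN on the original where...in version makes it possible to compare the two plans before measuring actual run times.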

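For your second question, you could additionally group by btc and cost to collapse the duplicates, so that customers sharing the same lowest cost end up in a single row: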

select t3.btc, group_concat(t3.customer_name) as customer_names, t3.min_cost
from (
   -- lowest cost per btc and customer
   select t1.btc, t1.customer_name, min(t1.cost) as min_cost
   from xxx t1
   join (
      select btc
      from xxx
      group by btc
      having count(distinct customer_name) > 1
   ) t2 on t1.btc = t2.btc
   group by t1.btc, t1.customer_name
) t3
group by t3.btc, t3.min_cost
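
The grouping above merges customers with equal costs into one row rather than dropping the btc. If the goal is to drop such a btc entirely, as the question describes for bf33940, one possible reading is "keep a btc only when every customer's lowest cost is different". The sketch below takes that interpretation; the table demo_sales and its rows are a hypothetical stand-in built from the sample output in the question, reduced to the three columns involved.

```sql
-- Hypothetical demo table holding only the three columns the query needs.
CREATE TABLE demo_sales (
  btc           VARCHAR(100) NOT NULL,
  customer_name VARCHAR(100),
  cost          DOUBLE(15,5)
);

INSERT INTO demo_sales (btc, customer_name, cost) VALUES
  ('asd32',   'Sony',     1.45863),
  ('asd32',   'Nintendo', 1.84839),
  ('bf33940', 'Sony',     2.49188),
  ('bf33940', 'Nintendo', 2.49188),
  ('a43c3f',  'Sony',     2.84142),
  ('a43c3f',  'Nintendo', 2.45),
  ('a43c3f',  'Sega',     2.689);

-- Lowest cost per btc and customer, restricted to btc values that have
-- more than one customer and whose per-customer minima are all different.
SELECT m.btc, m.customer_name, m.min_cost
FROM (
  SELECT btc, customer_name, MIN(cost) AS min_cost
  FROM demo_sales
  GROUP BY btc, customer_name
) m
JOIN (
  SELECT btc
  FROM (
    SELECT btc, customer_name, MIN(cost) AS min_cost
    FROM demo_sales
    GROUP BY btc, customer_name
  ) per_customer
  GROUP BY btc
  HAVING COUNT(DISTINCT customer_name) > 1
     AND COUNT(DISTINCT min_cost) = COUNT(*)
) keep ON keep.btc = m.btc
ORDER BY m.btc, m.customer_name;
```

On this sample data bf33940 drops out, because its two customers share one minimum cost. Note that this rule also removes a btc when only two of three customers tie; if those rows should stay, the HAVING condition would need to change.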

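Again, it is hard to say whether this will work without testing it, but hopefully you get the idea.

To speed things up, I would create a separate table with one row per btc and a counter of how many customers have it, so that you don't need to build a temporary table with count() > 1 on every run.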
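
A rough sketch of that idea follows. The table name btc_customer_counts and its columns are hypothetical, and xxx again stands for the main table; how and when the counts get refreshed is left open here.

```sql
-- Hypothetical summary table: one row per btc with its distinct-customer count.
CREATE TABLE btc_customer_counts (
  btc            VARCHAR(100) NOT NULL PRIMARY KEY,
  customer_count INT NOT NULL
);

-- Populate (or refresh) the counts from the main table, here called xxx.
REPLACE INTO btc_customer_counts (btc, customer_count)
SELECT btc, COUNT(DISTINCT customer_name)
FROM xxx
GROUP BY btc;

-- The main query then joins against the precomputed counts
-- instead of recomputing the grouped subquery each time.
SELECT t1.btc, t1.customer_name, MIN(t1.cost) AS min_cost
FROM xxx t1
JOIN btc_customer_counts c ON c.btc = t1.btc
WHERE c.customer_count > 1
GROUP BY t1.btc, t1.customer_name;
```

The trade-off is that the counts go stale whenever rows in xxx change, so the REPLACE step has to be rerun (or the counts maintained some other way) before the results can be trusted.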