MySQL: getting duplicates from SELECT DISTINCT + UNION

Date: 2012-09-12 09:55:01

Tags: mysql

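When I run the following query in MySQL, I get a lot of duplicates. I have asked explicitly for distinct records only, so I can't understand why the rows are being doubled. The duplicates all seem to appear when I include the last union (the importorders table), since most customers have the same address in both customers and orders. Can anyone help me understand why this happens?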

SELECT DISTINCT PostalCode, City, Region, Country
FROM 
(select distinct postalcode, city, region, country
from importemployees
UNION
select distinct postalcode, city, region, country
from importcustomers
UNION
select distinct postalcode, city, region, country
from importproducts
UNION
select distinct shippostalcode as postalcode, shipcity as city, shipregion as region, shipcountry as country
from importorders) T

[Image: query and result]

As you can see, some rows are duplicated.

If I use INSERT IGNORE to insert importcustomers first and then importorders, MySQL does manage to identify the records as duplicates. Why doesn't the SELECT query work?

1 answer:

Answer 0 (score: 2)

A very odd problem. When I dropped Country from the column list, the duplicates seemed to go away.

SELECT DISTINCT PostalCode, City, Region
128 rows total, query took 0.0066 sec

SELECT DISTINCT PostalCode, City, Region, Country
209 rows total, query took 0.0002 sec

Also, the behavior seems to affect only ImportCustomers and ImportOrders:

SELECT postalcode, city, region, country
FROM 
    (SELECT postalcode, city, region, country FROM importcustomers
    UNION
    SELECT shippostalcode, shipcity, shipregion, shipcountry FROM importorders) t
172 rows total, query took 0.0053 sec

SELECT postalcode
FROM 
    (SELECT postalcode FROM importcustomers
    UNION
    SELECT shippostalcode FROM importorders) t
91 rows total, query took 0.0050 sec

I then narrowed it down to the country column on importcustomers and importorders:

SELECT TRIM(country) AS country FROM importcustomers
UNION
SELECT TRIM(shipcountry) AS country FROM importorders
Argentina
Argentina
Austria
Austria
Belgium
Belgium
...

Something interesting happened when I cast the columns to BINARY:

SELECT BINARY country AS country FROM importcustomers
UNION
SELECT BINARY shipcountry AS country FROM importorders
Argentina
417267656e74696e610d
Austria
417573747269610d
Belgium
42656c6769756d0d
...

ImportOrders is the table causing the duplicates:

 SELECT BINARY shipcountry AS country FROM importorders
4765726d616e790d
5553410d
5553410d
4765726d616e790d
...

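Looking at the dump you provided, there is an extra \r (shown as 0d in the values above) appended to the end of each country in importorders: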

--
-- Dumping data for table `importorders`
--
INSERT INTO `importorders` VALUES 
...'Germany\r'),
...'USA\r'),
...'USA\r'),
...'Germany\r'),
...'Mexico\r'),

The country values in importcustomers look fine:

--
-- Dumping data for table `importcustomers`
--
INSERT INTO `importcustomers` VALUES 
...'Germany', ... ,
...'Mexico', ... ,
...'Mexico', ... ,
...'UK', ... ,
...'Sweden', ... ,
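
The same check can be run against the table itself rather than the dump. The query below is a minimal sketch that assumes the table and column names shown above and the default SQL mode, in which '\r' inside a string literal is a carriage return. It lists the ShipCountry values that end in that byte. Note that TRIM() without a remove-string strips only spaces, which is why the earlier TRIM query still returned duplicates.

```sql
-- List the distinct ShipCountry values that end in a carriage return (0x0D).
SELECT DISTINCT
    ShipCountry,
    HEX(ShipCountry)         AS hex_value,   -- ends in 0D when a trailing \r is present
    CHAR_LENGTH(ShipCountry) AS char_count   -- one more than the number of visible characters
FROM importorders
WHERE ShipCountry LIKE '%\r';
```

An empty result would mean the column no longer contains trailing carriage returns.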

You can remove these \r characters (carriage returns) by running this query:

UPDATE importorders SET ShipCountry = REPLACE(ShipCountry, '\r', '')
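
If updating the table is not an option, a possible alternative is to strip the character inside the query. This is only a sketch, again assuming the table and column names from the question and the default SQL mode. It uses TRIM(TRAILING ... FROM ...), which removes the given string from the end of the value only.

```sql
-- Deduplicate without modifying importorders: strip a trailing \r on the fly.
SELECT postalcode, city, region, country
FROM importcustomers
UNION
SELECT shippostalcode, shipcity, shipregion,
       TRIM(TRAILING '\r' FROM shipcountry)
FROM importorders;
```

Cleaning the data once with the UPDATE is usually simpler, because otherwise every query that compares ShipCountry has to repeat the TRIM.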

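After that update, the original query returns the desired result set. FYI, you don't need DISTINCT if you're using UNION, because UNION already removes duplicate rows: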

SELECT PostalCode, City, Region, Country
FROM 
    (SELECT postalcode, city, region, country FROM importemployees
    UNION
    SELECT postalcode, city, region, country FROM importcustomers
    UNION
    SELECT postalcode, city, region, country FROM importproducts
    UNION
    SELECT shippostalcode as postalcode, shipcity as city, 
        shipregion as region, shipcountry as country FROM importorders) T
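
To see both points in isolation, the literal-only statements below need no tables. They are an illustrative sketch, assuming the default SQL mode so that '\r' is read as a carriage return. They show that UNION collapses identical rows, that UNION ALL does not, and that an invisible trailing \r is enough to make two values count as different.

```sql
-- UNION (shorthand for UNION DISTINCT) removes duplicate rows: one row.
SELECT 'USA' AS country
UNION
SELECT 'USA';

-- UNION ALL keeps every row: two rows.
SELECT 'USA' AS country
UNION ALL
SELECT 'USA';

-- A trailing carriage return makes the values differ, so UNION keeps both rows.
SELECT 'USA' AS country
UNION
SELECT 'USA\r';
```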