Question

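I need your help writing a SQL query for the requirement below. I've spent the last 4 days trying to build one, but it just isn't clicking.

I'm using Microsoft SQL Server.

I have data in a table, as shown in the attached image.

What I want is to identify the rows that, when grouped starting at the Country-State-County-Zip level and moving up through Country-State-County, Country-State, and finally the top level, Country, still have a count of fewer than 3 rows.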

So House_Number 0001, 0002, 0003, 0004 would belong to the

 US-CA-A-B group (count = 4)

House_Number 0005, 0006, 0007 would belong to the

  US-CA-A group (count = 3)

House_Number 0009, 0010, 0011 would belong to the

 DE-QW-Q-Y group (count = 3)

House_Number 0013, 0014, 0015 would belong to the

 AU group (count = 3)

So the two rows I want to select are House_Number 0008 and 0012.

Table Data
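
The attached image is not reproduced here, so the following is only a minimal sketch of the rule as described, written in Python for easy checking. The row values are assumptions chosen to match the groups listed above (in particular, the State, County and Zip values for 0008 and 0012 to 0015 are made up), and `leftover_rows` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical rows (house_number, country, state, county, zip).
# Only the groupings are taken from the question; the exact values are assumed.
ROWS = [
    ("0001", "US", "CA", "A", "B"), ("0002", "US", "CA", "A", "B"),
    ("0003", "US", "CA", "A", "B"), ("0004", "US", "CA", "A", "B"),
    ("0005", "US", "CA", "A", "C"), ("0006", "US", "CA", "A", "D"),
    ("0007", "US", "CA", "A", "E"), ("0008", "US", "NY", "F", "G"),
    ("0009", "DE", "QW", "Q", "Y"), ("0010", "DE", "QW", "Q", "Y"),
    ("0011", "DE", "QW", "Q", "Y"), ("0012", "DE", "RT", "R", "Z"),
    ("0013", "AU", "S1", "C1", "Z1"), ("0014", "AU", "S2", "C2", "Z2"),
    ("0015", "AU", "S3", "C3", "Z3"),
]


def leftover_rows(rows, threshold=3):
    """Remove groups of >= threshold rows, from the most specific level up."""
    remaining = list(rows)
    # Key width 4 = Country-State-County-Zip, down to width 1 = Country.
    for width in (4, 3, 2, 1):
        counts = Counter(r[1:1 + width] for r in remaining)
        # Keep only rows whose group is still too small at this level.
        remaining = [r for r in remaining if counts[r[1:1 + width]] < threshold]
    return remaining


leftover = [r[0] for r in leftover_rows(ROWS)]
print(leftover)  # ['0008', '0012']
```

The important detail is that each level counts only the rows that survived the previous level, which is why 0008 is left over even though US has many rows in total.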

Answer 1

Try something like this:

WITH Step1 AS (
    SELECT * FROM dbo.TheTable t1
    WHERE NOT EXISTS (
        SELECT * FROM (
            SELECT Country, State, County, Zip FROM dbo.TheTable 
            GROUP BY Country, State, County, Zip HAVING COUNT(*)>=3
        ) x1 WHERE t1.Country=x1.Country AND t1.State=x1.State AND t1.County=x1.County AND t1.Zip=x1.Zip
    )
), Step2 AS (
    SELECT * FROM Step1 t2
    WHERE NOT EXISTS (
        SELECT * FROM (
            SELECT Country, State, County FROM Step1 
            GROUP BY Country, State, County HAVING COUNT(*)>=3
        ) x2 WHERE t2.Country=x2.Country AND t2.State=x2.State AND t2.County=x2.County
    )
), Step3 AS (
    SELECT * FROM Step2 t3
    WHERE NOT EXISTS (
        SELECT * FROM (
            SELECT Country, State FROM Step2 
            GROUP BY Country, State HAVING COUNT(*)>=3
        ) x3 WHERE t3.Country=x3.Country AND t3.State=x3.State
    )
), Step4 AS (
    SELECT * FROM Step3 t4
    WHERE NOT EXISTS (
        SELECT * FROM (
            SELECT Country FROM Step3
            GROUP BY Country HAVING COUNT(*)>=3
        ) x4 WHERE t4.Country=x4.Country
    )
) 
SELECT * FROM Step4
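
As a hedged variation on the same step-by-step idea, each step can also filter on a windowed count instead of NOT EXISTS. The sketch below runs it in SQLite through Python's sqlite3 module purely so it can be executed without a server (window functions need SQLite 3.25 or later); the SELECT uses only constructs that SQL Server also has, but it has not been run there. The rows are hypothetical values matching the groups in the question, and the table is named TheTable without the dbo schema.

```python
import sqlite3

# Hypothetical rows (House_Number, Country, State, County, Zip); assumed values.
ROWS = [
    ("0001", "US", "CA", "A", "B"), ("0002", "US", "CA", "A", "B"),
    ("0003", "US", "CA", "A", "B"), ("0004", "US", "CA", "A", "B"),
    ("0005", "US", "CA", "A", "C"), ("0006", "US", "CA", "A", "D"),
    ("0007", "US", "CA", "A", "E"), ("0008", "US", "NY", "F", "G"),
    ("0009", "DE", "QW", "Q", "Y"), ("0010", "DE", "QW", "Q", "Y"),
    ("0011", "DE", "QW", "Q", "Y"), ("0012", "DE", "RT", "R", "Z"),
    ("0013", "AU", "S1", "C1", "Z1"), ("0014", "AU", "S2", "C2", "Z2"),
    ("0015", "AU", "S3", "C3", "Z3"),
]

# Each step keeps only rows whose group, counted over the previous step, is < 3.
QUERY = """
WITH Step1 AS (
    SELECT House_Number, Country, State, County, Zip
    FROM (SELECT s.*, COUNT(*) OVER (PARTITION BY Country, State, County, Zip) AS cnt
          FROM TheTable s) x
    WHERE cnt < 3
), Step2 AS (
    SELECT House_Number, Country, State, County, Zip
    FROM (SELECT s.*, COUNT(*) OVER (PARTITION BY Country, State, County) AS cnt
          FROM Step1 s) x
    WHERE cnt < 3
), Step3 AS (
    SELECT House_Number, Country, State, County, Zip
    FROM (SELECT s.*, COUNT(*) OVER (PARTITION BY Country, State) AS cnt
          FROM Step2 s) x
    WHERE cnt < 3
), Step4 AS (
    SELECT House_Number, Country, State, County, Zip
    FROM (SELECT s.*, COUNT(*) OVER (PARTITION BY Country) AS cnt
          FROM Step3 s) x
    WHERE cnt < 3
)
SELECT House_Number FROM Step4 ORDER BY House_Number
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TheTable "
    "(House_Number TEXT, Country TEXT, State TEXT, County TEXT, Zip TEXT)"
)
conn.executemany("INSERT INTO TheTable VALUES (?, ?, ?, ?, ?)", ROWS)

remaining = [row[0] for row in conn.execute(QUERY)]
print(remaining)  # ['0008', '0012']
```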

Answer 2

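You seem to want countries that occur no more than twice. Based on your rules, I don't understand why the extra "DE" row wouldn't be combined with the first three.

So: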

select t.*
from (select t.*, count(*) over (partition by country) as cnt
      from t
     ) t
where cnt <= 2;

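EDIT:

Given the clarification provided in the comments (which is consistent with the question but does clarify it), you can use a series of not exists: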

select t.*
from t
where not exists (select 1
                  from t t2
                  where t2.country = t.country and
                        t2.state = t.state and
                        t2.county = t.county and
                        t2.zip = t.zip
                  group by t2.country, t2.state, t2.county, t2.zip
                  having count(*) > 2
                 ) and
       not exists (select 1
                  from t t2
                  where t2.country = t.country and
                        t2.state = t.state and
                        t2.county = t.county
                  group by t2.country, t2.state, t2.county
                  having count(*) > 2
                 ) and
       not exists (select 1
                  from t t2
                  where t2.country = t.country and
                        t2.state = t.state
                  group by t2.country, t2.state
                  having count(*) > 2
                 ) and
       not exists (select 1
                  from t t2
                  where t2.country = t.country
                  group by t2.country
                  having count(*) > 2
                 );
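
One thing worth checking with this form is that every not exists runs against the full table, not against the rows left after the more specific groups are taken out. The sketch below is a rough check of that, run in SQLite through Python's sqlite3 module only so it is executable; the rows are hypothetical values matching the groups in the question, and the country comparison is written as t2.country = t.country. On this data the query keeps nothing, because US, DE and AU each have 3 or more rows overall, which may be why a step-by-step version is needed.

```python
import sqlite3

# Hypothetical rows (house_number, country, state, county, zip); assumed values.
ROWS = [
    ("0001", "US", "CA", "A", "B"), ("0002", "US", "CA", "A", "B"),
    ("0003", "US", "CA", "A", "B"), ("0004", "US", "CA", "A", "B"),
    ("0005", "US", "CA", "A", "C"), ("0006", "US", "CA", "A", "D"),
    ("0007", "US", "CA", "A", "E"), ("0008", "US", "NY", "F", "G"),
    ("0009", "DE", "QW", "Q", "Y"), ("0010", "DE", "QW", "Q", "Y"),
    ("0011", "DE", "QW", "Q", "Y"), ("0012", "DE", "RT", "R", "Z"),
    ("0013", "AU", "S1", "C1", "Z1"), ("0014", "AU", "S2", "C2", "Z2"),
    ("0015", "AU", "S3", "C3", "Z3"),
]

# The chain of NOT EXISTS checks, each evaluated against the whole table.
CHAIN = """
select t.house_number
from t
where not exists (select 1 from t t2
                  where t2.country = t.country and t2.state = t.state and
                        t2.county = t.county and t2.zip = t.zip
                  group by t2.country, t2.state, t2.county, t2.zip
                  having count(*) > 2) and
      not exists (select 1 from t t2
                  where t2.country = t.country and t2.state = t.state and
                        t2.county = t.county
                  group by t2.country, t2.state, t2.county
                  having count(*) > 2) and
      not exists (select 1 from t t2
                  where t2.country = t.country and t2.state = t.state
                  group by t2.country, t2.state
                  having count(*) > 2) and
      not exists (select 1 from t t2
                  where t2.country = t.country
                  group by t2.country
                  having count(*) > 2)
order by t.house_number
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (house_number text, country text, state text, county text, zip text)"
)
conn.executemany("insert into t values (?, ?, ?, ?, ?)", ROWS)

kept = [row[0] for row in conn.execute(CHAIN)]
country_counts = conn.execute(
    "select country, count(*) from t group by country order by country"
).fetchall()

print(kept)            # []
print(country_counts)  # [('AU', 3), ('DE', 4), ('US', 8)]
```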

EDIT 2:

This is a tricky problem. I'm assuming you have an id on each row that uniquely identifies it. Then:

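with t1 as (
      -- ids not in any country/state/county/zip group of 3 or more
      select id
      from t
      except
      select id
      from (select id, count(*) over (partition by country, state, county, zip) as cnt
            from t
           ) x
      where cnt > 2
     ),
     t2 as (
      -- of those, ids not in any country/state/county group of 3 or more
      select id
      from t1
      except
      select id
      from (select t.id, count(*) over (partition by t.country, t.state, t.county) as cnt
            from t join
                 t1
                 on t1.id = t.id
           ) x
      where cnt > 2
     ),
     t3 as (
      -- of those, ids not in any country/state group of 3 or more
      select id
      from t2
      except
      select id
      from (select t.id, count(*) over (partition by t.country, t.state) as cnt
            from t join
                 t2
                 on t2.id = t.id
           ) x
      where cnt > 2
     ),
     t4 as (
      -- of those, ids not in any country group of 3 or more
      select id
      from t3
      except
      select id
      from (select t.id, count(*) over (partition by t.country) as cnt
            from t join
                 t3
                 on t3.id = t.id
           ) x
      where cnt > 2
     )
select *
from t4;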

If you don't have a unique id, you can try concatenating all the columns together to generate one, or use row_number() with a stable ordering.
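
A minimal sketch of the row_number() option, assuming a table with no key at all and possibly duplicate rows. It runs in SQLite through Python's sqlite3 module only so it can be executed (SQLite 3.25 or later); the three rows are made up. In SQL Server the same SELECT could be stored once, for example in a temp table, so that tied rows keep the same id every time it is referenced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (country text, state text, county text, zip text)")
# Made-up rows, including an exact duplicate, to show that ids stay distinct.
conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [("US", "CA", "A", "B"), ("US", "CA", "A", "B"), ("DE", "QW", "Q", "Y")],
)

# Number the rows; the order by makes the numbering follow the grouping columns.
numbered = conn.execute(
    """
    select row_number() over (order by country, state, county, zip) as id,
           country, state, county, zip
    from t
    order by id
    """
).fetchall()

for row in numbered:
    print(row)
```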

If you want the other columns, join them back in at the final stage.
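
A rough sketch of that last join, run in SQLite through Python's sqlite3 module only to make it executable (SQLite 3.25 or later). The rows and the integer id values are hypothetical, chosen to match the groups in the question. Each stage is one possible way to write the id-based cascade (window counts over the ids still remaining), and the final select joins the surviving ids back to t to recover house_number.

```python
import sqlite3

# Hypothetical rows (house_number, country, state, county, zip); assumed values.
ROWS = [
    ("0001", "US", "CA", "A", "B"), ("0002", "US", "CA", "A", "B"),
    ("0003", "US", "CA", "A", "B"), ("0004", "US", "CA", "A", "B"),
    ("0005", "US", "CA", "A", "C"), ("0006", "US", "CA", "A", "D"),
    ("0007", "US", "CA", "A", "E"), ("0008", "US", "NY", "F", "G"),
    ("0009", "DE", "QW", "Q", "Y"), ("0010", "DE", "QW", "Q", "Y"),
    ("0011", "DE", "QW", "Q", "Y"), ("0012", "DE", "RT", "R", "Z"),
    ("0013", "AU", "S1", "C1", "Z1"), ("0014", "AU", "S2", "C2", "Z2"),
    ("0015", "AU", "S3", "C3", "Z3"),
]

QUERY = """
with t1 as (
      select id from t
      except
      select id
      from (select id, count(*) over (partition by country, state, county, zip) as cnt
            from t) x
      where cnt > 2
     ),
     t2 as (
      select id from t1
      except
      select id
      from (select t.id, count(*) over (partition by t.country, t.state, t.county) as cnt
            from t join t1 on t1.id = t.id) x
      where cnt > 2
     ),
     t3 as (
      select id from t2
      except
      select id
      from (select t.id, count(*) over (partition by t.country, t.state) as cnt
            from t join t2 on t2.id = t.id) x
      where cnt > 2
     ),
     t4 as (
      select id from t3
      except
      select id
      from (select t.id, count(*) over (partition by t.country) as cnt
            from t join t3 on t3.id = t.id) x
      where cnt > 2
     )
-- Final stage: join the surviving ids back to t for the other columns.
select t.house_number, t.country
from t4 join t on t.id = t4.id
order by t.house_number
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (id integer primary key, house_number text, "
    "country text, state text, county text, zip text)"
)
# Assign ids 1..15 in row order (an assumption; any unique id would do).
conn.executemany(
    "insert into t values (?, ?, ?, ?, ?, ?)",
    [(i, *row) for i, row in enumerate(ROWS, start=1)],
)

selected = conn.execute(QUERY).fetchall()
print(selected)  # [('0008', 'US'), ('0012', 'DE')]
```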

Unable to define the correct SQL query to achieve the explained result

2 answers: