Question

Suppose I have a table with two columns and the following values:

C1 | C2
------- 
a1   b1 
a1   b2 
a1   b3
a2   b1 
a2   b2
a2   b3
a3   b1
a3   b2
a3   b3

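I want to delete all rows that have a duplicate value in C1, but in such a way that the remaining rows keep all the distinct values of C2. So in this case the result must be: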

C1 | C2
------- 
a1   b1 
a2   b2
a3   b3

and not something like:

C1 | C2
------- 
a1   b1 
a2   b1
a3   b1

Answer 1

This is the approach I use in this situation, written in T-SQL.


Since you don't care which combinations are kept, you will get the distinct values.

Answer 2

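I don't think there is a fully reliable way to do what you want in SQL. I suspect the actual problem may be equivalent to a graph problem that is in NP or NP-complete.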

One approximation is to pick a random row for each C1 value:

select t.*
from (select t.*, 
             row_number() over (partition by c1 order by dbms_random.random) as seqnum
      from t
     ) t
where seqnum = 1;

This, of course, comes with no guarantees. But it can at least produce the rows you want.
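To make the lack of a guarantee concrete, here is a minimal Python sketch of the same idea as the query above: group the rows by C1 and keep one randomly chosen row per group. The function name `pick_random_row_per_c1` and the in-memory list of tuples are assumptions made for illustration; they are not part of the original answer.

```python
import random
from collections import defaultdict

def pick_random_row_per_c1(rows, rng=None):
    """Keep one randomly chosen (c1, c2) row for each distinct c1 value."""
    rng = rng or random.Random()
    groups = defaultdict(list)
    for c1, c2 in rows:
        groups[c1].append(c2)
    # One row per c1; nothing forces the chosen c2 values to differ.
    return [(c1, rng.choice(c2_values)) for c1, c2_values in sorted(groups.items())]

# The table from the question: every combination of a1..a3 and b1..b3.
rows = [(c1, c2) for c1 in ("a1", "a2", "a3") for c2 in ("b1", "b2", "b3")]
picked = pick_random_row_per_c1(rows, random.Random(0))
print(picked)
print("all C2 values kept:", {c2 for _, c2 in picked} == {"b1", "b2", "b3"})
```

Each run keeps exactly one row per C1 value, but whether all C2 values survive depends on the random draw.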

A second approach works if you have all the combinations (as in your example). If so, you can construct the rows from the values:

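select t1.c1, t2.c2
from (select least(count(distinct c1), count(distinct c2)) as cd from t) n cross join
     -- number the distinct values only; rownum next to "distinct" would number every row
     (select c1, rownum as rn from (select distinct c1 from t)) t1 join
     (select c2, rownum as rn from (select distinct c2 from t)) t2
     -- pair the i-th distinct c1 with the i-th distinct c2, wrapping around at cd
     on mod(t1.rn, n.cd) = mod(t2.rn, n.cd);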

However, this assumes that the resulting pairs are actually consecutive.
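The pairing idea can be sketched in Python as follows. This is an illustration under the stated assumption that every (C1, C2) combination exists in the table, so any constructed pair is a real row. The function name `pair_distinct_values` and the use of sorted order to number the distinct values are choices made for this sketch.

```python
def pair_distinct_values(rows):
    """Pair the i-th distinct c1 with the i-th distinct c2, wrapping modulo cd."""
    c1_values = sorted({c1 for c1, _ in rows})
    c2_values = sorted({c2 for _, c2 in rows})
    # cd is the smaller of the two distinct counts.
    cd = min(len(c1_values), len(c2_values))
    pairs = []
    for i, c1 in enumerate(c1_values):
        for j, c2 in enumerate(c2_values):
            if i % cd == j % cd:
                pairs.append((c1, c2))
    return pairs

rows = [(c1, c2) for c1 in ("a1", "a2", "a3") for c2 in ("b1", "b2", "b3")]
pairs = pair_distinct_values(rows)
print(pairs)
```

When the two columns have different numbers of distinct values, the modulo makes the shorter list wrap around, so some values on the shorter side are reused.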

Answer 3

This answer is pretty complicated, but I believe it solves the problem! It may be quite slow on large data sets...

with selector as
( select rownum-1 as setnum
  from dual
  connect by level <= power(2,(select count(*) from my_table))
), /* This generates the integers 0..(2^n)-1 where n is number of rows in table */
data as
( select c1, c2, row_number() over (order by c1, c2) as rn
  from my_table
), /* This assigns each row in the table a row number 1..n */
cj as
( select setnum, c1, c2
  from selector cross join data
  where bitand(setnum, power(2,rn-1)) = power(2,rn-1)
 ), /* This generates all the possible sets of 1-n rows. 
       The rows in the set are determined by the bits of the setnum value
       e.g. setnum 5 (101 in binary) contains rows 1 and 3 */
set_sizes as
 ( select setnum, count(*) cnt from cj
   group by setnum
   having count(distinct c1) = (select count(distinct c1) from my_table)
   and count(distinct c2) = (select count(distinct c2) from my_table)
), /* This determines the number of rows in each set AND excludes sets that
      don't include all the c1 and c2 values */
one_set as
( select min(setnum) minsetnum from set_sizes
  where cnt = (select min(cnt) from set_sizes)
) /* This selects one of the sets that has the smallest number of rows */
select c1, c2 from cj
where setnum = (select minsetnum from one_set)
order by 1

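What this does:

Generates every possible set of rows from the table
Filters out the sets that do not contain all the c1 values and all the c2 values
From these, finds the smallest set size
Arbitrarily picks one of the smallest sets and returns its data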
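The same steps can be sketched in Python, which may make the bitmask trick easier to follow. This is only an illustration of the idea, not a translation of the query: the function name `smallest_covering_set` and the in-memory list of tuples are assumptions. Like the query, it examines 2^n subsets for n rows, so it is only practical for very small tables.

```python
def smallest_covering_set(rows):
    """Return a smallest subset of rows containing every c1 and every c2 value."""
    data = sorted(set(rows))  # row i corresponds to bit i of setnum
    all_c1 = {c1 for c1, _ in data}
    all_c2 = {c2 for _, c2 in data}
    best = None
    for setnum in range(2 ** len(data)):
        # The bits of setnum decide which rows belong to this candidate set.
        chosen = [row for i, row in enumerate(data) if setnum & (1 << i)]
        if {c1 for c1, _ in chosen} != all_c1:
            continue
        if {c2 for _, c2 in chosen} != all_c2:
            continue
        # Strict "<" keeps the lowest setnum among the smallest sets.
        if best is None or len(chosen) < len(best):
            best = chosen
    return best

rows = [(c1, c2) for c1 in ("a1", "a2", "a3") for c2 in ("b1", "b2", "b3")]
result = smallest_covering_set(rows)
print(result)
```

For the table in the question, the result has three rows, one per C1 value, and together they contain b1, b2 and b3.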

If anyone can suggest better (more meaningful) names for my WITH clause subqueries, please do!

Delete duplicate rows but keep all possible values of the second column
