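Problem: I need to find every instance where a user has duplicated data. Each time the user clicks the button, it creates a unique set of identical data. I need to provide a group with a result set that includes every instance of a batch that the end user has "duplicated".
Sample data: Using Microsoft SQL on Microsoft SQL Server.
Data types: batch int, date date, reference int, from_state varchar(2), to_state varchar(2), item int, qty int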
---------------------------------------------------------------------------------
| batch | date | reference | from_state | to_state | item | qty |
---------------------------------------------------------------------------------
| 1234567 | 2016-03-01 | 8213 | MT | CA | 11122334455 | 2 |
---------------------------------------------------------------------------------
| 1234567 | 2016-03-01 | 8213 | MT | CA | 66622334455 | 1 |
---------------------------------------------------------------------------------
| 1234567 | 2016-03-01 | 8213 | MT | CA | 77722334455 | 5 |
---------------------------------------------------------------------------------
| 1239764 | 2016-03-01 | 8597 | MT | CA | 11122334455 | 2 |
---------------------------------------------------------------------------------
| 1239764 | 2016-03-01 | 8597 | MT | CA | 66622334455 | 1 |
---------------------------------------------------------------------------------
| 1239764 | 2016-03-01 | 8597 | MT | CA | 77722334455 | 5 |
---------------------------------------------------------------------------------
| 3456781 | 2016-03-01 | 6854 | MT | CA | 00112233111 | 2 |
---------------------------------------------------------------------------------
| 3456781 | 2016-03-01 | 6854 | MT | CA | 00112255111 | 1 |
---------------------------------------------------------------------------------
| 3456781 | 2016-03-01 | 6854 | MT | CA | 33322334455 | 5 |
---------------------------------------------------------------------------------
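Desired results: I need all of the information displayed in order to resolve the issue. I can find the duplicate records by from_state, to_state, item, and qty, but I am lost on how to tie that back to the batch and reference numbers.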
---------------------------------------------------------------------------------
| batch | date | reference | from_state | to_state | item | qty |
---------------------------------------------------------------------------------
| 1234567 | 2016-03-01 | 8213 | MT | CA | 11122334455 | 2 |
---------------------------------------------------------------------------------
| 1234567 | 2016-03-01 | 8213 | MT | CA | 66622334455 | 1 |
---------------------------------------------------------------------------------
| 1234567 | 2016-03-01 | 8213 | MT | CA | 77722334455 | 5 |
---------------------------------------------------------------------------------
| 1239764 | 2016-03-01 | 8597 | MT | CA | 11122334455 | 2 |
---------------------------------------------------------------------------------
| 1239764 | 2016-03-01 | 8597 | MT | CA | 66622334455 | 1 |
---------------------------------------------------------------------------------
| 1239764 | 2016-03-01 | 8597 | MT | CA | 77722334455 | 5 |
Attempted code:
SELECT from_state
,to_state
,item
,qty
,COUNT(*)
FROM #TEMP_duplicates
GROUP BY from_state
,to_state
,item
,qty
HAVING COUNT(*) > 1
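Answer 0: (score: 0)
Based on the temp table reference in your attempted query, it looks like you are using SQL Server, so I'll run with that.
This will handle a single duplicate. I would have to know more about the data to argue how reliable it is. It is probably good enough for something that gets verified manually. I'll see if I can think of something for multiple duplicates.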
with T as (
select
batch,
min("date") as dt,
min(reference) as reference,
min(from_state) as from_state,
min(to_state) as to_state,
min(item) as item_min, max(item) as item_max, sum(item) as item_sum,
min(qty) as qty_min, max(qty) as qty_max, sum(qty) as qty_sum,
count(*) as cnt
from <yourdata>
group by batch
)
select t1.batch
from T t1 inner join T t2
on t2.batch > t1.batch and t2.reference <> t1.reference
and t2.dt = t1.dt
and t2.from_state = t1.from_state and t2.to_state = t1.to_state
and t2.item_min = t1.item_min and t2.qty_min = t1.qty_min
and t2.item_max = t1.item_max and t2.qty_max = t1.qty_max
and t2.item_sum = t1.item_sum and t2.qty_sum = t1.qty_sum
and t2.cnt = t1.cnt
group by t1.batch
I'm not sure what type you are using for item. You may need a cast to get sum() to work.
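A minimal sketch of such a cast, assuming item is stored as a character column (an assumption, since the question lists it as int even though sample values such as 11122334455 exceed the int maximum of 2,147,483,647). The temp table #item_demo and its rows are hypothetical stand-ins for the real data:

```sql
-- Hypothetical temp table: item is held as text here, so sum(item) would fail.
create table #item_demo (batch int, item varchar(11), qty int);

insert into #item_demo (batch, item, qty) values
    (1234567, '11122334455', 2),
    (1234567, '66622334455', 1),
    (1234567, '77722334455', 5);

-- Cast to bigint before aggregating so sum() accepts the column
-- and the total has room beyond the int range.
select
    batch,
    sum(cast(item as bigint)) as item_sum,
    sum(cast(qty as bigint)) as qty_sum
from #item_demo
group by batch;

drop table #item_demo;
```

If item is already a numeric type, the cast to bigint still helps, because it keeps the sum from overflowing.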
Edit: I think this one is more robust when dealing with multiple sets of duplicates. I can't speak to the performance, though.
with T as (
select
batch,
min("date") as dt,
min(reference) as reference,
min(from_state) as from_state,
min(to_state) as to_state,
min(item) as item_min, max(item) as item_max, sum(item) as item_sum,
min(qty) as qty_min, max(qty) as qty_max, sum(qty) as qty_sum,
count(*) as cnt
from <yourdata>
group by batch
),
pairs as (
select t1.*, t2.batch as batch2
from T t1 inner join T t2
on t2.batch > t1.batch and t2.reference <> t1.reference
and t2.dt = t1.dt
and t2.from_state = t1.from_state and t2.to_state = t1.to_state
and t2.item_min = t1.item_min and t2.qty_min = t1.qty_min
and t2.item_max = t1.item_max and t2.qty_max = t1.qty_max
and t2.item_sum = t1.item_sum and t2.qty_sum = t1.qty_sum
and t2.cnt = t1.cnt
)
select distinct
min(batch) over (
partition by
dt, from_state, to_state,
item_min, item_max, item_sum, qty_min, qty_max, qty_sum, cnt
) as orig_batch,
batch2 as dup_batch
from pairs
The "original" batch is the one with the lowest ID.
Perhaps you'd like to take this idea further by matching up the detail rows as they were before aggregation. Append this to the CTEs above:
...
, matches as (
-- compare the detail rows of each candidate pair; <yourdata> names the column "date", not dt
select p.batch, p.batch2
from
pairs p inner join
<yourdata> d1 on d1.batch = p.batch full outer join
<yourdata> d2 on d2.batch = p.batch2
and d2."date" = d1."date"
and d2.from_state = d1.from_state and d2.to_state = d1.to_state
and d2.item = d1.item and d2.qty = d1.qty
group by p.batch, p.batch2
having
-- keep a pair only when no joined row has a NULL on either side
count(d1."date") = count(*) and count(d2."date") = count(*)
and count(d1.from_state) = count(*) and count(d2.from_state) = count(*)
and count(d1.to_state) = count(*) and count(d2.to_state) = count(*)
and count(d1.item) = count(*) and count(d2.item) = count(*)
and count(d1.qty) = count(*) and count(d2.qty) = count(*)
)
select distinct
-- batch and batch2 exist in both pairs and matches, so they are qualified with p
min(p.batch) over (
partition by
dt, from_state, to_state,
item_min, item_max, item_sum, qty_min, qty_max, qty_sum, cnt
) as orig_batch,
p.batch2 as dup_batch
from pairs p inner join matches m on m.batch = p.batch and m.batch2 = p.batch2
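The queries above return batch IDs only, while the question asks for the full detail rows. One possible way to get from one to the other is sketched below, assuming the orig_batch/dup_batch pairs have been saved somewhere first. The temp tables #pair_demo and #detail_demo and their contents are hypothetical, for illustration only:

```sql
-- Hypothetical detail rows, trimmed to the columns needed for the illustration.
create table #detail_demo (batch int, reference int, item bigint, qty int);

insert into #detail_demo (batch, reference, item, qty) values
    (1234567, 8213, 11122334455, 2),
    (1239764, 8597, 11122334455, 2),
    (3456781, 6854, 112233111, 2);

-- Hypothetical output of the pairing query: one row per original/duplicate pair.
create table #pair_demo (orig_batch int, dup_batch int);

insert into #pair_demo (orig_batch, dup_batch) values (1234567, 1239764);

-- Return every detail row whose batch appears on either side of a pair.
select d.batch, d.reference, d.item, d.qty
from #detail_demo d
where exists (
    select 1
    from #pair_demo p
    where d.batch in (p.orig_batch, p.dup_batch)
)
order by d.batch, d.item;

drop table #pair_demo;
drop table #detail_demo;
```

Using exists rather than a join means a batch that takes part in several pairs is still listed only once per detail row.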
Answer 1: (score: 0)
I appreciate everyone who helped me work through this.
-- Put the result set into the temp table #TEMP_baseresults
SELECT batch
,reference
,from_state
,to_state
,item
,qty
INTO #TEMP_baseresults
FROM datasource
-- Find all duplicates with the SAME from_state, to_state, item, and qty
SELECT from_state
,to_state
,item
,qty
,count(*) as 'count'
INTO #TEMP_batchduplicates
FROM #TEMP_baseresults
GROUP BY from_state
,to_state
,item
,qty
HAVING COUNT(*) > 1
ORDER BY from_state
,to_state
,item
,qty
-- Join the duplicates table back to the base table
SELECT *
FROM #TEMP_baseresults base
JOIN #TEMP_batchduplicates dup
ON dup.from_state = base.from_state
AND dup.to_state = base.to_state
AND dup.item = base.item
AND dup.qty = base.qty
ORDER BY base.from_state
,base.to_state
,base.item
Results displayed:
-----------------------------------------------------------------------------------------
| batch | date | reference | from_state | to_state | item | qty | count |
-----------------------------------------------------------------------------------------
| 1234567 | 2016-03-01 | 8213 | MT | CA | 11122334455 | 2 | 2 |
-----------------------------------------------------------------------------------------
| 1234567 | 2016-03-01 | 8213 | MT | CA | 66622334455 | 1 | 2 |
-----------------------------------------------------------------------------------------
| 1234567 | 2016-03-01 | 8213 | MT | CA | 77722334455 | 5 | 2 |
----------------------------------------------------------------------------------------
| 1239764 | 2016-03-01 | 8597 | MT | CA | 11122334455 | 2 | 2 |
----------------------------------------------------------------------------------------
| 1239764 | 2016-03-01 | 8597 | MT | CA | 66622334455 | 1 | 2 |
-----------------------------------------------------------------------------------------
| 1239764 | 2016-03-01 | 8597 | MT | CA | 77722334455 | 5 | 2 |
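This filters my data set down to only the records identified as duplicates, and additionally flags how many times the data may have been duplicated.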
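The same filter can probably be written as a single statement with a windowed count, which avoids the intermediate temp tables. This is only a sketch under stated assumptions: #dup_demo is a hypothetical table standing in for datasource, item is declared bigint so the sample values fit, and dup_count is a made-up column name:

```sql
-- Hypothetical stand-in for the real data source.
create table #dup_demo (
    batch int, reference int,
    from_state varchar(2), to_state varchar(2),
    item bigint, qty int
);

insert into #dup_demo (batch, reference, from_state, to_state, item, qty) values
    (1234567, 8213, 'MT', 'CA', 11122334455, 2),
    (1234567, 8213, 'MT', 'CA', 66622334455, 1),
    (1239764, 8597, 'MT', 'CA', 11122334455, 2),
    (1239764, 8597, 'MT', 'CA', 66622334455, 1),
    (3456781, 6854, 'MT', 'CA', 33322334455, 5);

-- Count rows sharing the same from_state, to_state, item, and qty,
-- then keep only the rows whose combination occurs more than once.
select batch, reference, from_state, to_state, item, qty, dup_count
from (
    select
        batch, reference, from_state, to_state, item, qty,
        count(*) over (
            partition by from_state, to_state, item, qty
        ) as dup_count
    from #dup_demo
) as counted
where dup_count > 1
order by batch, item;

drop table #dup_demo;
```

Like the temp-table version, this matches individual rows rather than whole batches, so two unrelated batches that happen to share one item and quantity would also be flagged.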