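Some posts on Stack Overflow helped, but I still can't remove duplicate rows by count in Hive.
Apple has a row count of 2 on some dates. How can I select only 1 row for Apple on each date?
- What the data looks like: 14 records in total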
customerID date product_type
1234abc 20140105 Orange
1234abc 20140105 Apple
1234abc 20140205 Orange
1234abc 20140205 Apple
1234abc 20140205 Apple
1234abc 20140305 Orange
1234abc 20140305 Apple
1234abc 20140305 Apple
1234abc 20140405 Orange
1234abc 20140405 Apple
1234abc 20140405 Apple
1234abc 20140505 Orange
1234abc 20140505 Apple
1234abc 20140505 Apple
- Final output: 10 records in total
customerID date product_type
1234abc 20140105 Orange
1234abc 20140105 Apple
1234abc 20140205 Orange
1234abc 20140205 Apple
1234abc 20140305 Orange
1234abc 20140305 Apple
1234abc 20140405 Orange
1234abc 20140405 Apple
1234abc 20140505 Orange
1234abc 20140505 Apple
Answer 0 (score: 1)
select distinct customerID, date, product_type from your_table
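As a quick sanity check of this answer, here is a minimal sketch that runs the same SELECT DISTINCT against the 14 sample rows from the question. It uses Python's built-in sqlite3 module as a stand-in, not Hive itself, so it only illustrates the deduplication logic; the helper name `distinct_rows` is made up for this example. In Hive the same statement should behave the same way, although recent Hive versions may require backticks around the `date` column name.

```python
import sqlite3

# The 14 sample rows from the question: the first date has one Apple row,
# the other four dates each have a duplicated Apple row.
ROWS = [("1234abc", "20140105", "Orange"), ("1234abc", "20140105", "Apple")]
for d in ("20140205", "20140305", "20140405", "20140505"):
    ROWS += [("1234abc", d, "Orange"), ("1234abc", d, "Apple"), ("1234abc", d, "Apple")]


def distinct_rows(rows):
    """Load rows into an in-memory SQLite table and return the distinct ones."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE your_table (customerID TEXT, date TEXT, product_type TEXT)"
    )
    conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT DISTINCT customerID, date, product_type "
        "FROM your_table ORDER BY date, product_type DESC"
    ).fetchall()
    conn.close()
    return result


deduped = distinct_rows(ROWS)
print(len(ROWS), len(deduped))  # prints: 14 10
```

Note that SELECT DISTINCT only returns the deduplicated rows; it does not delete anything from the source table. To persist the result you would have to write it to a new table or overwrite the existing one.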
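Answer 1 (score: 0)
I suggest a two-step approach. Step 1: create a temporary table and insert the list of duplicated records into it with INSERT and SELECT, as shown below: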
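-- T-SQL (SQL Server) syntax; #Temp is a session-local temporary table.
-- CustomerID is a varchar because the sample IDs (e.g. 1234abc) are not numeric.
CREATE TABLE #Temp( Product_Name Char( 30 ), [Date] Date, CustomerID varchar( 30 ) );
-- Keep one row per (Product_Name, Date, CustomerID) group that occurs more than once.
INSERT INTO #Temp (Product_Name, [Date], CustomerID)
select x.[Product_Name], x.[Date], x.CustomerID
from (
SELECT count(*) as dup
,[Product_Name]
,[Date]
, CustomerID
FROM dbo.[originaltable]
group by [Date] ,[Product_Name], CustomerID ) x
where x.dup > 1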
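Then delete the duplicates with:
-- Remove every copy of each duplicated group; match on all three key columns.
delete from dbo.[originaltable]
where EXISTS (SELECT 1 from #Temp t WHERE t.Product_Name = dbo.[originaltable].Product_Name and t.[Date] = dbo.[originaltable].[Date] and t.CustomerID = dbo.[originaltable].CustomerID )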
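Step 2: insert the contents of the #Temp table back into the original table, so that exactly one copy of each duplicated row remains.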
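The two steps above can be sketched end to end as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than SQL Server or Hive, a SQLite TEMP table stands in for #Temp, and the names `dedupe_in_place`, `dup`, and `sample` are hypothetical. Hive in particular may not support this kind of row-level DELETE without transactional tables, so treat it as a check of the logic, not a Hive recipe.

```python
import sqlite3


def dedupe_in_place(rows):
    """Apply the two-step approach to rows of (CustomerID, Date, Product_Name)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE originaltable (CustomerID TEXT, Date TEXT, Product_Name TEXT)"
    )
    conn.executemany("INSERT INTO originaltable VALUES (?, ?, ?)", rows)

    # Step 1a: store one copy of every group that occurs more than once.
    conn.execute(
        """
        CREATE TEMP TABLE dup AS
        SELECT CustomerID, Date, Product_Name
        FROM originaltable
        GROUP BY CustomerID, Date, Product_Name
        HAVING COUNT(*) > 1
        """
    )

    # Step 1b: delete all copies of those groups from the original table.
    conn.execute(
        """
        DELETE FROM originaltable
        WHERE EXISTS (
            SELECT 1 FROM dup d
            WHERE d.CustomerID = originaltable.CustomerID
              AND d.Date = originaltable.Date
              AND d.Product_Name = originaltable.Product_Name
        )
        """
    )

    # Step 2: put a single copy of each duplicated group back.
    conn.execute(
        "INSERT INTO originaltable SELECT CustomerID, Date, Product_Name FROM dup"
    )

    result = conn.execute(
        "SELECT CustomerID, Date, Product_Name "
        "FROM originaltable ORDER BY Date, Product_Name DESC"
    ).fetchall()
    conn.close()
    return result


# Sample data shaped like the question: 14 rows, Apple duplicated on four dates.
sample = [("1234abc", "20140105", "Orange"), ("1234abc", "20140105", "Apple")]
for d in ("20140205", "20140305", "20140405", "20140505"):
    sample += [("1234abc", d, "Orange"), ("1234abc", d, "Apple"), ("1234abc", d, "Apple")]

cleaned = dedupe_in_place(sample)
print(len(sample), len(cleaned))  # prints: 14 10
```

Matching on all three key columns in the DELETE matters here: if the customer ID were left out of the match, rows belonging to other customers with the same product and date would also be removed.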