Store number allocwgt item date day
88006 0.14 40000349094 1/6/2013 Sunday
10374 0.14 40000349094 1/6/2013 Sunday
88010 0.14 40000349094 3/19/2017 Sunday
9388 0 40000349094 1/7/2013 Monday
9300 0 40000349094 3/20/2017 Monday
9300 0 40000349094 3/27/2017 Monday
1139 0 40000349094 3/16/2015 Monday
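For an item, I need to select only one record per day of the week (for example, one for Sunday), because the value of allocwgt is the same on all dates.
An item can have multiple records for each day of the week across different dates, but I need only 7 records, one per day,
i.e. Sunday, Monday, Tuesday and so on.
Note: ideally the selected record should be the most recent one.
Can someone help me write this as a Hive query?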
Expected output should be:
Store number allocwgt item date day
88006 0.14 40000349094 2017-03-19 Sunday
09300 0.00 40000349094 2017-03-27 Monday
Answer 0 (score: 0)
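Use row_number(). The query below selects one record for each item and store_date, the one with the lowest store_number. Writing a suitable order by in over() changes this behavior; if any single record per partition will do, just remove the order by. I have replaced the date column with store_date because date is a reserved word in Hive.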
select Store_number, allocwgt, item, store_date, day
from
(
select Store_number, allocwgt, item, store_date, day,
row_number() over(partition by item, store_date order by store_number) rn
from table_name
) s
where rn=1
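To illustrate the point about order by, here is a minimal sketch that keeps the most recent record for each item and weekday, which is what the question describes. It assumes store_date sorts chronologically (a DATE column or a 'yyyy-MM-dd' string); a 'M/d/yyyy' string would need converting first. Store_number is added to the ordering only as a tie-breaker.

```sql
-- Sketch: one row per item and weekday, keeping the most recent store_date.
-- Assumption: store_date is a DATE or a 'yyyy-MM-dd' string, so it sorts chronologically.
select Store_number, allocwgt, item, store_date, day
from
(
select Store_number, allocwgt, item, store_date, day,
       row_number() over(partition by item, day
                         order by store_date desc, Store_number) rn
from table_name
) s
where rn=1
```

Partitioning by day instead of store_date is what limits the result to at most 7 rows per item.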
Answer 1 (score: 0)
Thanks. The query gives the expected result, but I had assumed that the allocwgt value would always be the same, and I have since found that it can differ.
Now, when I run the query below:
create table temp_cso_2 as
select *
from
(
select b.loc,
a.allocwgt,
b.item,
date_add('1970-01-01',cast ((a.Eff/1440)as int)) as date_from_minutes,
date_format(date_add('1970-01-01',cast ((a.Eff/1440)as int)),'EEEE') as day_of_date,
row_number() over(partition by item, date_add('1970-01-01',cast ((a.Eff/1440)as int)) order by b.loc) rn
from scm.CALDATA a left outer join scm.SKUDEMANDPARAM b
on a.cal = b.alloccal
where a.repeat = 0 and b.run_date= to_date('2017-03-02' ) and b.item between 40000000000 and 40000999999
) s
where rn=1
this query gives me the result below:
+-----------------+----------------------+------------------+-------------------------------+-------------------------+----------------+--+
| temp_cso_2.loc | temp_cso_2.allocwgt | temp_cso_2.item | temp_cso_2.date_from_minutes | temp_cso_2.day_of_date | temp_cso_2.rn |
+-----------------+----------------------+------------------+-------------------------------+-------------------------+----------------+--+
| 00074 | 0.15 | 40000110552 | 2013-01-10 | Thursday | 1 |
| 00074 | 0.17 | 40000110552 | 2013-01-11 | Friday | 1 |
| 00074 | 0.17 | 40000110552 | 2013-01-12 | Saturday | 1 |
| 00074 | 0.12 | 40000110552 | 2013-01-06 | Sunday | 1 |
| 00074 | 0.12 | 40000110552 | 2013-01-07 | Monday | 1 |
| 00074 | 0.13 | 40000110552 | 2013-01-08 | Tuesday | 1 |
| 00074 | 0.14 | 40000110552 | 2013-01-09 | Wednesday | 1 |
| 00074 | 0.0 | 40000110552 | 2018-04-24 | Tuesday | 1 |
+-----------------+----------------------+------------------+-------------------------------+-------------------------+----------------+--+
So the problem is the Tuesday record: I get two records because the allocwgt values are different. What should I do to get only the record with the latest date? Also, is there anything I can do to improve the performance of this query? Please help.
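One possible approach, as a sketch rather than a tested solution: the two Tuesday rows fall on different dates, and the window is partitioned by date, so each gets rn=1. Partitioning by item and weekday name, and ordering by date descending, should leave only the latest date per weekday. The table and column names below come from the query above; the aliases d and s are only illustrative. The date is computed once in an inner subquery instead of three times. The join is written as a plain join because the where clause already filters on b columns, so rows without a match in b are dropped either way.

```sql
-- Sketch: keep only the latest date per item and weekday.
select loc, allocwgt, item, date_from_minutes, day_of_date
from
(
  select d.loc, d.allocwgt, d.item, d.date_from_minutes,
         date_format(d.date_from_minutes, 'EEEE') as day_of_date,
         -- partition by weekday name, not by date; newest date gets rn = 1
         row_number() over(partition by d.item, date_format(d.date_from_minutes, 'EEEE')
                           order by d.date_from_minutes desc, d.loc) rn
  from
  (
    -- compute the date once instead of repeating the expression
    select b.loc,
           a.allocwgt,
           b.item,
           date_add('1970-01-01', cast((a.Eff/1440) as int)) as date_from_minutes
    from scm.CALDATA a
    join scm.SKUDEMANDPARAM b
      on a.cal = b.alloccal
    where a.repeat = 0
      and b.run_date = to_date('2017-03-02')
      and b.item between 40000000000 and 40000999999
  ) d
) s
where rn = 1
```

Whether this is noticeably faster depends on the data; if scm.SKUDEMANDPARAM happens to be partitioned by run_date (an assumption, not something stated above), the run_date filter would let Hive read only that partition.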