Need to remove duplicates, keeping the highest date and the lowest env

Time: 2017-02-02 22:57:52

Tags: sql oracle duplicates

I have a table, SO_RPT_BASEOFFER_LVL1, that looks like this:

DEPLOYMENT_DATE, STATUS, FROM_ENV, SO_ID,   SO_NAME,    SO_DESCRIPTION,     OFFER_ID,   SO_CATEGORY,    SO_TYPE, DISPOSITION,   SECTION, MAIN_PERMUTATION,  SO_LOB,     PRIORITY,   ASSOCIATED_GROUP,   ADE_PRIORITIZED,    ADE_NAME

01-JAN-01   ID  0   CVOIP_BASE_PROMO_VS0011 VOIP Unlimited Secondary Line $7.50 flat rate ongoing   Non Management Discount 88341523    Telephone   BASE_PROMO  Y           CVOIP   40      Y   VS0011-Non Management Discount
01-JAN-01   ID  3   CVOIP_BASE_PROMO_VS0011 VOIP Unlimited Secondary Line $7.50 flat rate ongoing   Non Management Discount 88341523    Telephone   BASE_PROMO  Y           CVOIP   40      Y   VS0011-Non Management Discount
03-MAR-17   ID  2   CVOIP_BASE_PROMO_VS0011 VOIP Unlimited Secondary Line $7.50 flat rate ongoing   Non Management Discount 88341523    Telephone   BASE_PROMO  Y           CVOIP   20      Y   VS0011-Non Management Discount
04-FEB-17   ID  1   CVOIP_BASE_PROMO_VS0011 VOIP Unlimited Secondary Line $7.50 flat rate ongoing   Non Management Discount 88341523    Telephone   BASE_PROMO  Y           CVOIP   20      N   VS0011-Non Management Discount
01-JAN-01   P   0   CVOIP_BASE_PROMO_VS0029 Voice 200 Install NRC 100% off 0 mo VS0029  100% off Installation Fee   88427443    Telephone   BASE_PROMO  Y           CVOIP   20      Y   VS0029-100% off Installation Fee
01-JAN-01   P   1   CVOIP_BASE_PROMO_VS0029 Voice 200 Install NRC 100% off 0 mo VS0029  100% off Installation Fee   88427443    Telephone   BASE_PROMO  Y           CVOIP   20      Y   VS0029-100% off Installation Fee
01-JAN-01   P   2   CVOIP_BASE_PROMO_VS0029 Voice 200 Install NRC 100% off 0 mo VS0029  100% off Installation Fee   88427443    Telephone   BASE_PROMO  Y           CVOIP   20      Y   VS0029-100% off Installation Fee
01-JAN-01   P   3   CVOIP_BASE_PROMO_VS0029 Voice 200 Install NRC 100% off 0 mo VS0029  100% off Installation Fee   88427443    Telephone   BASE_PROMO  Y           CVOIP   20      Y   VS0029-100% off Installation Fee
01-JAN-01   P   0   HSIA_ADITIONAL_PROMO_IS0236 SAVE Promotional Offer  SAVE - $15 off IPBB for 12 mo (1.5 - 75M)   88464673    Telephone   ADD_PROMO   Y           HSIA    6145        Y   IS0236-STACKABLE - SAVE - $15 off IPBB for 12 mo (1.5 - 75M)
10-JAN-16   P   0   HSIA_ADITIONAL_PROMO_IS0236 SAVE Promotional Offer  SAVE - $15 off IPBB for 12 mo (1.5 - 75M)   88464673    Telephone   ADD_PROMO   Y           HSIA    6100        Y   IS0236-STACKABLE - SAVE - $15 off IPBB for 12 mo (1.5 - 75M)
12-JUL-16   P   0   HSIA_ADITIONAL_PROMO_IS0236 SAVE Promotional Offer  SAVE - $5 off/mo ongoing w/HSI Upg (3-18M)  88464673    Telephone   ADD_PROMO   Y           HSIA    6148        Y   IS0236-STACKABLE - SAVE - $15 off IPBB for 12 mo (1.5 - 75M)
12-FEB-17   ID  1   HSIA_ADITIONAL_PROMO_IS0236 SAVE Promotional Offer  SAVE - $15 off IPBB for 12 mo (1.5 - 75M)   88464673    Telephone   ADD_PROMO   Y           HSIA    6145        Y   IS0236-STACKABLE - SAVE - $15 off IPBB for 12 mo (1.5 - 75M)
12-FEB-17   ID  2   HSIA_ADITIONAL_PROMO_IS0236 SAVE Promotional Offer  SAVE - $15 off IPBB for 12 mo (1.5 - 75M)   88464673    Telephone   ADD_PROMO   Y           HSIA    6145        Y   IS0236-STACKABLE - SAVE - $15 off IPBB for 12 mo (1.5 - 75M)
12-FEB-17   ID  3   HSIA_ADITIONAL_PROMO_IS0236 SAVE Promotional Offer  SAVE - $15 off IPBB for 12 mo (1.5 - 75M)   88464673    Telephone   ADD_PROMO   Y           HSIA    6145        Y   IS0236-STACKABLE - SAVE - $15 off IPBB for 12 mo (1.5 - 75M)
01-JAN-01   P   0   DTSTB_DTV4KGenie_L  DTSTB_DTV4KGenie    4K Genie Mini   88834924    Television  RACK_RATE_RC    N       DTV4KGenie  HSIA    6145    DTVSTB_LEASED   Y   DTSTB_DTV4KGenie_L
01-JAN-01   ID  1   DTSTB_DTV4KGenie_L  DTSTB_DTV4KGenie    4K Genie Mini   88834924    Television  RACK_RATE_RC    N       DTV4KGenie  HSIA    6145    DTVSTB_LEASED   Y   DTSTB_DTV4KGenie_L
01-JAN-01   ID  2   DTSTB_DTV4KGenie_L  DTSTB_DTV4KGenie    4K Genie Mini   88834924    Television  RACK_RATE_RC    N       DTV4KGenie  HSIA    6145    DTVSTB_LEASED   Y   DTSTB_DTV4KGenie_L
25-FEB-17   ID  3   DTSTB_DTV4KGenie_L  DTSTB_DTV4KGenie    4K Genie Mini   88834924    Television  RACK_RATE_RC    N       DTV4KGenie  HSIA    6145    DTVSTB_LEASED   N   DTSTB_DTV4KGenie_L
12-FEB-17   P   0   HSIA_ADITIONAL_PROMO_IS0236 SAVE Promotional Offer  SAVE - $15 off IPBB for 12 mo (1.5 - 75M)   88464673    Telephone   ADD_PROMO   Y           HSIA    6145        Y   IS0236-STACKABLE - SAVE - $15 off IPBB for 12 mo (1.5 - 75M)

Now I need to remove all duplicate rows (rows that match on the fields SO_ID, SO_NAME, SO_DESCRIPTION, PRIORITY and ADE_PRIORITIZED), keeping the row with the highest deployment date; if the deployment dates are the same, I want to keep the row with the lowest FROM_ENV value.

I tried this:

create table SO_RPT_BASEOFFER_LVL1_nodups 
as
with dups as 
( select DEPLOYMENT_DATE, STATUS, FROM_ENV, SO_ID,  SO_NAME,    SO_DESCRIPTION,     OFFER_ID,   SO_CATEGORY,    SO_TYPE, DISPOSITION,   SECTION, MAIN_PERMUTATION,  SO_LOB,     PRIORITY,   ASSOCIATED_GROUP,   ADE_PRIORITIZED,    ADE_NAME,
        row_number() over ( partition by SO_NAME,SO_DESCRIPTION, PRIORITY, ADE_PRIORITIZED order by deployment_date desc, from_env asc ) rn 
  from SO_RPT_BASEOFFER_LVL1 
) 
select  DEPLOYMENT_DATE, STATUS, FROM_ENV, SO_ID,   SO_NAME,    SO_DESCRIPTION,     OFFER_ID,   SO_CATEGORY,    SO_TYPE, DISPOSITION,   SECTION, MAIN_PERMUTATION,  SO_LOB,     PRIORITY,   ASSOCIATED_GROUP,   ADE_PRIORITIZED,    ADE_NAME 
from dups 
where rn=1;

But my results are not correct. Somehow one of the rows I expect to be kept is not being picked up.

Can anyone advise?

2 answers:

Answer 0: (score: 0)

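The last row, dated 12-FEB-17, appears to have the same values in the partition by columns as the 01-JAN-01 rows, and a later date, so it beats all of the 01-JAN-01 rows, doesn't it?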
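To make that point concrete, here is a small sketch that replays the same row_number() logic on five of the HSIA_ADITIONAL_PROMO_IS0236 rows (PRIORITY 6145, ADE_PRIORITIZED Y). It is an illustration only: it uses Python's sqlite3 module as a stand-in for Oracle (this assumes a bundled SQLite of 3.25 or later, which is when window functions arrived), a reduced table with the hypothetical name offers, and ISO date strings instead of DD-MON-YY so that the text sorts chronologically.

```python
import sqlite3

# Reduced, hypothetical copy of SO_RPT_BASEOFFER_LVL1: only the columns used
# by the PARTITION BY and ORDER BY clauses. Dates are ISO strings (assumption)
# so that plain text ordering matches chronological ordering in SQLite.
NAME = "SAVE Promotional Offer"
DESC = "SAVE - $15 off IPBB for 12 mo (1.5 - 75M)"
rows = [
    ("2001-01-01", 0, NAME, DESC, 6145, "Y"),
    ("2017-02-12", 1, NAME, DESC, 6145, "Y"),
    ("2017-02-12", 2, NAME, DESC, 6145, "Y"),
    ("2017-02-12", 3, NAME, DESC, 6145, "Y"),
    ("2017-02-12", 0, NAME, DESC, 6145, "Y"),  # the last row of the sample
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE offers (deployment_date TEXT, from_env INTEGER, "
    "so_name TEXT, so_description TEXT, priority INTEGER, ade_prioritized TEXT)"
)
conn.executemany("INSERT INTO offers VALUES (?, ?, ?, ?, ?, ?)", rows)

# Same partitioning and ordering as the query in the question.
kept = conn.execute(
    """
    SELECT deployment_date, from_env
    FROM (
        SELECT deployment_date, from_env,
               ROW_NUMBER() OVER (
                   PARTITION BY so_name, so_description, priority, ade_prioritized
                   ORDER BY deployment_date DESC, from_env ASC
               ) AS rn
        FROM offers
    )
    WHERE rn = 1
    """
).fetchall()

print(kept)  # [('2017-02-12', 0)]
```

All five rows fall into one partition, so only the 12-FEB-17 row with FROM_ENV 0 gets rn = 1 and the 01-JAN-01 row is dropped.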

Answer 1: (score: 0)

select SO_ID, SO_NAME, SO_DESCRIPTION, PRIORITY, ADE_PRIORITIZED,
       DEPLOYMENT_DATE, min(FROM_ENV) as FROM_ENV
from
(
    select SO_ID,
           SO_NAME,
           SO_DESCRIPTION,
           PRIORITY,
           ADE_PRIORITIZED,
           FROM_ENV,
           max(DEPLOYMENT_DATE) as DEPLOYMENT_DATE
    from SO_RPT_BASEOFFER_LVL1
    group by SO_ID, SO_NAME, SO_DESCRIPTION, PRIORITY, ADE_PRIORITIZED, FROM_ENV
)
group by SO_ID, SO_NAME, SO_DESCRIPTION, PRIORITY, ADE_PRIORITIZED, DEPLOYMENT_DATE

I have split the problem into two parts. First, I select the highest (most recent) date. If two rows share that date, I select the lowest FROM_ENV.
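The two steps described here can be sketched as follows. This is one possible reading, not the answerer's exact query: it again uses Python's sqlite3 as a stand-in for Oracle, with a hypothetical reduced table named offers, made-up sample rows, and ISO date strings. Step 1 takes the maximum date per duplicate key; step 2 takes the lowest FROM_ENV among the rows that carry that date.

```python
import sqlite3

# Hypothetical sample data (not from the question): two duplicate groups.
# Group A has an old row in env 0 and two newer rows in env 1 and env 2.
rows = [
    ("2001-01-01", 0, "A", "name A", "desc A", 20, "Y"),
    ("2017-02-12", 1, "A", "name A", "desc A", 20, "Y"),
    ("2017-02-12", 2, "A", "name A", "desc A", 20, "Y"),
    ("2017-03-03", 2, "B", "name B", "desc B", 20, "Y"),
    ("2017-03-03", 3, "B", "name B", "desc B", 20, "Y"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE offers (deployment_date TEXT, from_env INTEGER, so_id TEXT, "
    "so_name TEXT, so_description TEXT, priority INTEGER, ade_prioritized TEXT)"
)
conn.executemany("INSERT INTO offers VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

result = conn.execute(
    """
    SELECT t.so_id, t.deployment_date, MIN(t.from_env) AS from_env
    FROM offers t
    JOIN (
        -- Step 1: latest deployment date per duplicate key.
        SELECT so_id, so_name, so_description, priority, ade_prioritized,
               MAX(deployment_date) AS max_date
        FROM offers
        GROUP BY so_id, so_name, so_description, priority, ade_prioritized
    ) m
      ON  t.so_id = m.so_id
      AND t.so_name = m.so_name
      AND t.so_description = m.so_description
      AND t.priority = m.priority
      AND t.ade_prioritized = m.ade_prioritized
      AND t.deployment_date = m.max_date
    -- Step 2: among rows with that date, keep the lowest from_env.
    GROUP BY t.so_id, t.so_name, t.so_description, t.priority,
             t.ade_prioritized, t.deployment_date
    ORDER BY t.so_id
    """
).fetchall()

print(result)  # [('A', '2017-02-12', 1), ('B', '2017-03-03', 2)]
```

Unlike the query above, the first step here leaves FROM_ENV out of the GROUP BY, so the maximum date is taken per duplicate key and not per environment; that is my reading of the description. Note also that the equality join does not match rows whose key columns are NULL, so this sketch assumes the key columns are populated.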