I have a table, SO_RPT_BASEOFFER_LVL1, with data like this:
DEPLOYMENT_DATE, STATUS, FROM_ENV, SO_ID, SO_NAME, SO_DESCRIPTION, OFFER_ID, SO_CATEGORY, SO_TYPE, DISPOSITION, SECTION, MAIN_PERMUTATION, SO_LOB, PRIORITY, ASSOCIATED_GROUP, ADE_PRIORITIZED, ADE_NAME
01-JAN-01 ID 0 CVOIP_BASE_PROMO_VS0011 VOIP Unlimited Secondary Line $7.50 flat rate ongoing Non Management Discount 88341523 Telephone BASE_PROMO Y CVOIP 40 Y VS0011-Non Management Discount
01-JAN-01 ID 3 CVOIP_BASE_PROMO_VS0011 VOIP Unlimited Secondary Line $7.50 flat rate ongoing Non Management Discount 88341523 Telephone BASE_PROMO Y CVOIP 40 Y VS0011-Non Management Discount
03-MAR-17 ID 2 CVOIP_BASE_PROMO_VS0011 VOIP Unlimited Secondary Line $7.50 flat rate ongoing Non Management Discount 88341523 Telephone BASE_PROMO Y CVOIP 20 Y VS0011-Non Management Discount
04-FEB-17 ID 1 CVOIP_BASE_PROMO_VS0011 VOIP Unlimited Secondary Line $7.50 flat rate ongoing Non Management Discount 88341523 Telephone BASE_PROMO Y CVOIP 20 N VS0011-Non Management Discount
01-JAN-01 P 0 CVOIP_BASE_PROMO_VS0029 Voice 200 Install NRC 100% off 0 mo VS0029 100% off Installation Fee 88427443 Telephone BASE_PROMO Y CVOIP 20 Y VS0029-100% off Installation Fee
01-JAN-01 P 1 CVOIP_BASE_PROMO_VS0029 Voice 200 Install NRC 100% off 0 mo VS0029 100% off Installation Fee 88427443 Telephone BASE_PROMO Y CVOIP 20 Y VS0029-100% off Installation Fee
01-JAN-01 P 2 CVOIP_BASE_PROMO_VS0029 Voice 200 Install NRC 100% off 0 mo VS0029 100% off Installation Fee 88427443 Telephone BASE_PROMO Y CVOIP 20 Y VS0029-100% off Installation Fee
01-JAN-01 P 3 CVOIP_BASE_PROMO_VS0029 Voice 200 Install NRC 100% off 0 mo VS0029 100% off Installation Fee 88427443 Telephone BASE_PROMO Y CVOIP 20 Y VS0029-100% off Installation Fee
01-JAN-01 P 0 HSIA_ADITIONAL_PROMO_IS0236 SAVE Promotional Offer SAVE - $15 off IPBB for 12 mo (1.5 - 75M) 88464673 Telephone ADD_PROMO Y HSIA 6145 Y IS0236-STACKABLE - SAVE - $15 off IPBB for 12 mo (1.5 - 75M)
10-JAN-16 P 0 HSIA_ADITIONAL_PROMO_IS0236 SAVE Promotional Offer SAVE - $15 off IPBB for 12 mo (1.5 - 75M) 88464673 Telephone ADD_PROMO Y HSIA 6100 Y IS0236-STACKABLE - SAVE - $15 off IPBB for 12 mo (1.5 - 75M)
12-JUL-16 P 0 HSIA_ADITIONAL_PROMO_IS0236 SAVE Promotional Offer SAVE - $5 off/mo ongoing w/HSI Upg (3-18M) 88464673 Telephone ADD_PROMO Y HSIA 6148 Y IS0236-STACKABLE - SAVE - $15 off IPBB for 12 mo (1.5 - 75M)
12-FEB-17 ID 1 HSIA_ADITIONAL_PROMO_IS0236 SAVE Promotional Offer SAVE - $15 off IPBB for 12 mo (1.5 - 75M) 88464673 Telephone ADD_PROMO Y HSIA 6145 Y IS0236-STACKABLE - SAVE - $15 off IPBB for 12 mo (1.5 - 75M)
12-FEB-17 ID 2 HSIA_ADITIONAL_PROMO_IS0236 SAVE Promotional Offer SAVE - $15 off IPBB for 12 mo (1.5 - 75M) 88464673 Telephone ADD_PROMO Y HSIA 6145 Y IS0236-STACKABLE - SAVE - $15 off IPBB for 12 mo (1.5 - 75M)
12-FEB-17 ID 3 HSIA_ADITIONAL_PROMO_IS0236 SAVE Promotional Offer SAVE - $15 off IPBB for 12 mo (1.5 - 75M) 88464673 Telephone ADD_PROMO Y HSIA 6145 Y IS0236-STACKABLE - SAVE - $15 off IPBB for 12 mo (1.5 - 75M)
01-JAN-01 P 0 DTSTB_DTV4KGenie_L DTSTB_DTV4KGenie 4K Genie Mini 88834924 Television RACK_RATE_RC N DTV4KGenie HSIA 6145 DTVSTB_LEASED Y DTSTB_DTV4KGenie_L
01-JAN-01 ID 1 DTSTB_DTV4KGenie_L DTSTB_DTV4KGenie 4K Genie Mini 88834924 Television RACK_RATE_RC N DTV4KGenie HSIA 6145 DTVSTB_LEASED Y DTSTB_DTV4KGenie_L
01-JAN-01 ID 2 DTSTB_DTV4KGenie_L DTSTB_DTV4KGenie 4K Genie Mini 88834924 Television RACK_RATE_RC N DTV4KGenie HSIA 6145 DTVSTB_LEASED Y DTSTB_DTV4KGenie_L
25-FEB-17 ID 3 DTSTB_DTV4KGenie_L DTSTB_DTV4KGenie 4K Genie Mini 88834924 Television RACK_RATE_RC N DTV4KGenie HSIA 6145 DTVSTB_LEASED N DTSTB_DTV4KGenie_L
12-FEB-17 P 0 HSIA_ADITIONAL_PROMO_IS0236 SAVE Promotional Offer SAVE - $15 off IPBB for 12 mo (1.5 - 75M) 88464673 Telephone ADD_PROMO Y HSIA 6145 Y IS0236-STACKABLE - SAVE - $15 off IPBB for 12 mo (1.5 - 75M)
Now I need to delete all duplicate rows (duplicates are determined by the fields SO_ID, SO_NAME, SO_DESCRIPTION, PRIORITY and ADE_PRIORITIZED), keeping the row with the highest deployment date; if the deployment dates are the same, keep the row with the lowest from_env value.
I tried this:
create table SO_RPT_BASEOFFER_LVL1_nodups
as
with dups as
( select DEPLOYMENT_DATE, STATUS, FROM_ENV, SO_ID, SO_NAME, SO_DESCRIPTION, OFFER_ID, SO_CATEGORY, SO_TYPE, DISPOSITION, SECTION, MAIN_PERMUTATION, SO_LOB, PRIORITY, ASSOCIATED_GROUP, ADE_PRIORITIZED, ADE_NAME,
         row_number() over ( partition by SO_NAME, SO_DESCRIPTION, PRIORITY, ADE_PRIORITIZED order by deployment_date desc, from_env asc ) rn
  from SO_RPT_BASEOFFER_LVL1
)
select DEPLOYMENT_DATE, STATUS, FROM_ENV, SO_ID, SO_NAME, SO_DESCRIPTION, OFFER_ID, SO_CATEGORY, SO_TYPE, DISPOSITION, SECTION, MAIN_PERMUTATION, SO_LOB, PRIORITY, ASSOCIATED_GROUP, ADE_PRIORITIZED, ADE_NAME
from dups
where rn = 1;
But my result is not correct. Somehow, one of the rows I expect is not being picked up.
Can anyone advise?
Answer 0 (score: 0)
The last row, dated 12-FEB-17, appears to have the same values in the partition by columns and a later date, so it beats all of the 01-JAN-01 rows, doesn't it?
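To check this, here is a minimal sketch that runs the same row_number() ordering on a trimmed-down copy of the five HSIA_ADITIONAL_PROMO_IS0236 rows with PRIORITY 6145. The assumptions are mine, not the asker's: SQLite (3.25 or later, for window functions) through Python's sqlite3 module stands in for Oracle, the table name so_rpt_sample is made up, the dates are rewritten as ISO strings (reading 01-JAN-01 as 2001 and 12-FEB-17 as 2017), and SO_NAME and SO_DESCRIPTION are left out because they are identical across these five rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, trimmed-down stand-in for SO_RPT_BASEOFFER_LVL1.
conn.execute("""
    CREATE TABLE so_rpt_sample (
        deployment_date TEXT,    -- ISO strings, so text order matches date order
        status          TEXT,
        from_env        INTEGER,
        so_id           TEXT,
        priority        INTEGER,
        ade_prioritized TEXT
    )
""")

so_id = "HSIA_ADITIONAL_PROMO_IS0236"
rows = [
    ("2001-01-01", "P", 0, so_id, 6145, "Y"),
    ("2017-02-12", "ID", 1, so_id, 6145, "Y"),
    ("2017-02-12", "ID", 2, so_id, 6145, "Y"),
    ("2017-02-12", "ID", 3, so_id, 6145, "Y"),
    ("2017-02-12", "P", 0, so_id, 6145, "Y"),
]
conn.executemany("INSERT INTO so_rpt_sample VALUES (?, ?, ?, ?, ?, ?)", rows)

# Same ordering as the question: latest date first, then lowest from_env.
ranked = conn.execute("""
    SELECT deployment_date, status, from_env,
           ROW_NUMBER() OVER (
               PARTITION BY so_id, priority, ade_prioritized
               ORDER BY deployment_date DESC, from_env ASC
           ) AS rn
    FROM so_rpt_sample
    ORDER BY rn
""").fetchall()

for row in ranked:
    print(row)
```

Under those assumptions the 12-FEB-17 / P / 0 row gets rn = 1 and the 01-JAN-01 row is ranked last, so the where rn = 1 filter keeps the former and drops the latter.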
Answer 1 (score: 0)
select SO_ID, SO_NAME, SO_DESCRIPTION, PRIORITY, ADE_PRIORITIZED,
       DEPLOYMENT_DATE, min(FROM_ENV) as FROM_ENV
from
(
    select
        SO_ID,
        SO_NAME,
        SO_DESCRIPTION,
        PRIORITY,
        ADE_PRIORITIZED,
        FROM_ENV,
        max(DEPLOYMENT_DATE) as DEPLOYMENT_DATE
    from
        SO_RPT_BASEOFFER_LVL1
    group by
        SO_ID, SO_NAME, SO_DESCRIPTION, PRIORITY, ADE_PRIORITIZED, FROM_ENV
)
group by
    SO_ID, SO_NAME, SO_DESCRIPTION, PRIORITY, ADE_PRIORITIZED, DEPLOYMENT_DATE
I split the problem into two parts. First, I select the highest (most recent) date. If two rows share that date, I pick the lowest FROM_ENV.
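One caveat, as far as I can tell: because the inner query also groups by FROM_ENV, it returns the latest date per FROM_ENV value, so the outer query can still return several rows per key (one for each distinct date). The sketch below is one possible way to express the two-step idea so that it yields a single row per key. It is a variation on the query above, not that query itself, and it rests on my own assumptions: SQLite through Python's sqlite3 module stands in for Oracle, the table so_rpt_sample and the CTE name latest are hypothetical, dates are rewritten as ISO strings, and only the columns needed for the illustration are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, trimmed-down stand-in for SO_RPT_BASEOFFER_LVL1.
conn.execute("""
    CREATE TABLE so_rpt_sample (
        deployment_date TEXT,    -- ISO strings, so MAX() picks the latest date
        from_env        INTEGER,
        so_id           TEXT,
        priority        INTEGER,
        ade_prioritized TEXT
    )
""")

voip = "CVOIP_BASE_PROMO_VS0029"
hsia = "HSIA_ADITIONAL_PROMO_IS0236"
rows = [
    ("2001-01-01", 0, voip, 20, "Y"),
    ("2001-01-01", 1, voip, 20, "Y"),
    ("2001-01-01", 2, voip, 20, "Y"),
    ("2001-01-01", 3, voip, 20, "Y"),
    ("2001-01-01", 0, hsia, 6145, "Y"),
    ("2017-02-12", 1, hsia, 6145, "Y"),
    ("2017-02-12", 2, hsia, 6145, "Y"),
    ("2017-02-12", 3, hsia, 6145, "Y"),
    ("2017-02-12", 0, hsia, 6145, "Y"),
]
conn.executemany("INSERT INTO so_rpt_sample VALUES (?, ?, ?, ?, ?)", rows)

result = conn.execute("""
    WITH latest AS (
        -- Step 1: latest deployment date per duplicate key (FROM_ENV not in the key).
        SELECT so_id, priority, ade_prioritized,
               MAX(deployment_date) AS deployment_date
        FROM so_rpt_sample
        GROUP BY so_id, priority, ade_prioritized
    )
    -- Step 2: among the rows on that latest date, take the lowest FROM_ENV.
    SELECT t.so_id, t.priority, t.ade_prioritized,
           t.deployment_date, MIN(t.from_env) AS from_env
    FROM so_rpt_sample t
    JOIN latest l
      ON  t.so_id = l.so_id
      AND t.priority = l.priority
      AND t.ade_prioritized = l.ade_prioritized
      AND t.deployment_date = l.deployment_date
    GROUP BY t.so_id, t.priority, t.ade_prioritized, t.deployment_date
    ORDER BY t.so_id
""").fetchall()

for row in result:
    print(row)
```

With this sample data each key comes back exactly once: the VS0029 group keeps 01-JAN-01 with FROM_ENV 0, and the IS0236 group keeps 12-FEB-17 with FROM_ENV 0. Note that, like the query above, this returns only the grouped columns; getting the full rows back would need a join to the base table or the row_number() approach from the question.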