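I am working with a survey response file in which I need to remove duplicate entries based on the following logic:
For item 1, I did the following: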
# sort by crmid and submission date so the first submitted duplicate is easy to keep
df = df.sort_values(by=["crmid", "Date Submitted"])
# for each crmid, keep only the first submitted entry, and only if its Status is Complete
df.loc[
((~df.duplicated(subset=["crmid"], keep="first")) & (df["Status"] == "Complete"))
]
For item 2, I am thinking of doing the following:
df = df.dropna(subset=['Net_Promoter'])
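because this will drop every entry with a blank Net Promoter score.
I need guidance on how to handle item 3. I have duplicate entries (based on crmid) with two different statuses, and I want to keep the entry whose status is "Complete". Any help would be appreciated.
Sample data: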
    Date Submitted  Status    crmid                Net_Promoter
5   8/5/20 17:51    Complete  A178171150S20191230  5.0
0   8/7/20 7:56     Complete  A178171150S20191230  5.0
1   8/2/20 9:45     Partial   A181007471S20200218  5.0
6   8/3/20 20:12    Partial   A181007471S20200218  5.0
2   8/12/20 22:05   Complete  A182477806S20200310  5.0
7   8/9/20 18:06    Partial   A182477806S20200310  5.0
3   8/17/20 23:19   Complete  A184046243S20200423  5.0
11  8/4/20 12:50    Complete  A184722610S20200722  5.0
9   8/2/20 20:47    Partial   A186529222S20200619  5.0
10  8/24/20 2:05    Complete  A189465160S20200723  5.0
8   8/2/20 13:03    Partial   A189484270S20200721  5.0
12  8/1/20 10:56    Partial   A189680771S20200722  2.0
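One possible way to handle item 3, as a minimal sketch rather than a definitive answer: sort so that Complete rows come before Partial rows within each crmid, then drop duplicates keeping the first row. The helper name `dedupe_prefer_complete` and the temporary columns `_submitted` and `_is_partial` are made up for illustration. The sketch assumes that "Date Submitted" is stored as text in the `m/d/yy H:MM` form shown in the sample, and that a crmid with only Partial entries should keep its earliest one (the question does not say what should happen in that case).

```python
import pandas as pd


def dedupe_prefer_complete(df):
    """Keep one row per crmid, preferring Complete, then the earliest submission.

    Hypothetical helper: the name and the temporary columns are illustrative only.
    """
    out = df.copy()
    # Parse the text dates so sorting is chronological, not alphabetical
    # (as plain strings, "8/12/20" would sort before "8/2/20").
    out["_submitted"] = pd.to_datetime(out["Date Submitted"], format="%m/%d/%y %H:%M")
    # False sorts before True, so Complete rows come first within each crmid.
    out["_is_partial"] = out["Status"].ne("Complete")
    out = out.sort_values(["crmid", "_is_partial", "_submitted"])
    # The first row per crmid is now the earliest Complete one, if any exists.
    out = out.drop_duplicates(subset="crmid", keep="first")
    return out.drop(columns=["_submitted", "_is_partial"])


# A few rows copied from the sample data above.
sample = pd.DataFrame(
    {
        "Date Submitted": [
            "8/5/20 17:51", "8/7/20 7:56", "8/2/20 9:45",
            "8/3/20 20:12", "8/12/20 22:05", "8/9/20 18:06",
        ],
        "Status": ["Complete", "Complete", "Partial", "Partial", "Complete", "Partial"],
        "crmid": [
            "A178171150S20191230", "A178171150S20191230", "A181007471S20200218",
            "A181007471S20200218", "A182477806S20200310", "A182477806S20200310",
        ],
        "Net_Promoter": [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
    }
)

result = dedupe_prefer_complete(sample)
print(result)
```

For A182477806S20200310 this keeps the Complete row even though the Partial row was submitted earlier. If crmids that have only Partial entries should be dropped instead, filtering the result on `Status == "Complete"` afterwards would be one option.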