import pandas as pd
dfa = {'account':['a','b','a','c','a'],
'ret_type':['CTR','WO','T','CTR','T'],
'val':['0.0','0.1','0.2','0.3','0.4'],
'ins_date':['11','12','11','13','14']}
df = pd.DataFrame(dfa)
  account ret_type  val ins_date
0       a      CTR  0.0       11
1       b       WO  0.1       12
2       a        T  0.2       11
3       c      CTR  0.3       13
4       a        T  0.4       14
I have a requirement to eliminate duplicate rows, under these rules:
1. A duplicate row is one that shares its (account, ins_date) combination with another row.
2. If a duplicate is found, I need to keep the row with ret_type CTR and drop the row with ret_type T.
3. I don't want to delete T rows that have no duplicate, such as row 4.
4. In this example, row 2 is the one that ends up deleted in the final output.
How should I do this?
Answer 0 (score: 1)
Please check this; it should give you the answer.
df["duplicated"] = df[["account", "ins_date"]].duplicated(keep=False)
df = df[(df.ret_type == 'CTR') | ~df["duplicated"]]
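For reference, here is a minimal end-to-end sketch of the same idea applied to the sample data from the question. It assumes the mask can be held in a local variable (the name is_dup is only illustrative) instead of a helper column, so the result keeps the original four columns. Like the two lines above, it keeps every CTR row plus every row whose (account, ins_date) pair is unique.

```python
import pandas as pd

df = pd.DataFrame({
    'account':  ['a', 'b', 'a', 'c', 'a'],
    'ret_type': ['CTR', 'WO', 'T', 'CTR', 'T'],
    'val':      ['0.0', '0.1', '0.2', '0.3', '0.4'],
    'ins_date': ['11', '12', '11', '13', '14'],
})

# True for every row whose (account, ins_date) pair occurs more than once.
is_dup = df.duplicated(subset=['account', 'ins_date'], keep=False)

# Keep CTR rows, plus any row that is not part of a duplicated pair.
result = df[(df['ret_type'] == 'CTR') | ~is_dup]
print(result)
```

On the sample data this leaves rows 0, 1, 3 and 4. Note that a duplicated pair with no CTR row would lose all of its rows under this rule, which may or may not be what you want.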
Answer 1 (score: 0)
You can use a loop to check for duplicate account and ret_type combinations,
then delete the matching row by its index.
name = "(((((0, 7), 7), 8), 4), 5)"
# Python 3: build a translation table that deletes both parenthesis characters.
table = str.maketrans('', '', '()')
print(name.translate(table))
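The loop-and-drop-by-index idea described in this answer could be sketched roughly as follows. This is only one possible reading: it assumes duplicates are defined by (account, ins_date), as in the question, rather than by account and ret_type, and the names to_drop and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'account':  ['a', 'b', 'a', 'c', 'a'],
    'ret_type': ['CTR', 'WO', 'T', 'CTR', 'T'],
    'val':      ['0.0', '0.1', '0.2', '0.3', '0.4'],
    'ins_date': ['11', '12', '11', '13', '14'],
})

to_drop = []
# Visit each (account, ins_date) group; sort=False keeps first-seen order.
for _, group in df.groupby(['account', 'ins_date'], sort=False):
    # Only groups with more than one row that contain a CTR row lose their T rows.
    if len(group) > 1 and (group['ret_type'] == 'CTR').any():
        to_drop.extend(group[group['ret_type'] == 'T'].index)

# Drop the collected index labels in one step.
result = df.drop(index=to_drop)
print(result)
```

Collecting the labels first and dropping once avoids modifying the frame while iterating over it.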
Answer 2 (score: 0)
I'm not sure I understand you correctly:
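seen = {}      # account -> first ret_type seen for that account
to_drop = []   # index labels of rows to remove
for index, row in df.iterrows():
    if row['account'] in seen:
        # Same account seen before with the same ret_type: mark for removal.
        if seen[row['account']] == row['ret_type']:
            to_drop.append(index)
    else:
        seen[row['account']] = row['ret_type']
# drop() returns a new frame, so the result has to be assigned back.
df = df.drop(to_drop)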