I have 3 columns in a pandas DataFrame, with the headers screenName, screen_name_retweet and screen_name_mention, as shown below:
screenName  screen_name_retweet  screen_name_mention
User1       User10               User1
User4       User10               User5
User3       User3                User12
User6       User10               User7
What I want is to compare screenName with screen_name_retweet and with screen_name_mention, and if a duplicate is found between screenName and screen_name_retweet or screen_name_mention, replace that cell with an empty string (''). So the columns above should look like this:
screenName  screen_name_retweet  screen_name_mention
User1       User10
User4       User10               User5
User3                            User12
User6       User10               User7
How can I get the desired result?
I have already tried this:
df.loc[(df['screenName'].duplicated() & df['screen_name_mention'].duplicated()), ['screen_name_mention']] = ''
but nothing happens; the table stays unchanged.
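A likely reason the attempt changes nothing: Series.duplicated() flags values that repeat earlier values down the same column, not values that repeat across columns within one row. In the sample data, screenName has no repeated values, so the combined mask is all False and .loc assigns to no rows. The sketch below rebuilds the sample (assuming the values shown in the question) and contrasts that column-wise mask with an element-wise comparison between two columns, which is a per-row check.

```python
import pandas as pd

# Sample data from the question (assumed to be plain strings)
df = pd.DataFrame({
    "screenName": ["User1", "User4", "User3", "User6"],
    "screen_name_retweet": ["User10", "User10", "User3", "User10"],
    "screen_name_mention": ["User1", "User5", "User12", "User7"],
})

# duplicated() compares each value with earlier values in the SAME column
column_mask = df["screenName"].duplicated() & df["screen_name_mention"].duplicated()
print(column_mask.tolist())  # [False, False, False, False]: no row is selected

# == between two columns compares the values row by row
row_mask = df["screenName"] == df["screen_name_mention"]
print(row_mask.tolist())  # [True, False, False, False]
```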
Answer 0 (score: 0)
Use the replace method:
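import pandas as pd

df = pd.read_csv(file_name)  # file_name: path to your file; read it into a DataFrame
for index, row in df.iterrows():
    # Note: Series.replace blanks every matching cell in the column, not only this row
    if row.iloc[0] == row.iloc[1]:
        df['screen_name_retweet'] = df['screen_name_retweet'].replace(row.iloc[1], "")
    if row.iloc[0] == row.iloc[2]:
        df['screen_name_mention'] = df['screen_name_mention'].replace(row.iloc[2], "")
print(df)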
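Because Series.replace rewrites every matching cell in a column, a value that equals screenName in one row would also be blanked in other rows where it does not match. A loop-free sketch that only touches the matching cells, assuming the column names and sample values from the question, builds a boolean row mask per column and assigns '' through .loc:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "screenName": ["User1", "User4", "User3", "User6"],
    "screen_name_retweet": ["User10", "User10", "User3", "User10"],
    "screen_name_mention": ["User1", "User5", "User12", "User7"],
})

# Blank a cell only when it equals screenName in the same row
for col in ["screen_name_retweet", "screen_name_mention"]:
    df.loc[df[col] == df["screenName"], col] = ""

print(df)
```

The loop here runs once per column (two iterations), while the comparison itself is done on whole columns at once.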
Answer 1 (score: 0)
import pandas as pd

a = pd.DataFrame([["user1", "user10", "user1"],
                  ["user4", "user10", "user5"],
                  ["user3", "user3", "user12"]],
                 columns=["i1", "i2", "i3"])  # simplified input DataFrame
for i in a.index:
    m = a.loc[i].duplicated()  # mask: True for values already seen earlier in this row
    a.loc[i] = a.loc[i].mask(m).fillna("")  # blank the duplicates with an empty string
I think this solution could be improved performance-wise, but it works.
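The loop above calls duplicated() once per row. One possible faster variant, assuming the same rule (blank any cell that equals a cell to its left in the same row), loops over the columns instead and compares whole columns at once with NumPy broadcasting, so the Python-level loop runs once per column rather than once per row. This is an untested-for-speed sketch, not a benchmarked result.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame([["user1", "user10", "user1"],
                  ["user4", "user10", "user5"],
                  ["user3", "user3", "user12"]],
                 columns=["i1", "i2", "i3"])

values = a.to_numpy()
# mask[r, j] is True when values[r, j] equals any earlier value in row r
mask = np.zeros(values.shape, dtype=bool)
for j in range(1, values.shape[1]):
    # values[:, :j] has shape (rows, j); values[:, [j]] has shape (rows, 1)
    mask[:, j] = (values[:, :j] == values[:, [j]]).any(axis=1)

# Replace masked cells with an empty string; a itself is left unchanged
result = a.mask(mask, "")
print(result)
```

Like the per-row version, this blanks a cell that repeats any earlier column, so a mention equal to the retweet (but not to screenName) would also be cleared.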