I know how to get rid of duplicate rows in pandas, but my problem is slightly different. Suppose I have a DataFrame like this:
product  from      stop_1       stop_2  stop_3       stop_4  stop_5  stop_6   stop_7
metal    Portugal  Spain        France  Ukraine      Spain   France  Ukraine  Spain
fruit    Spain     France       Italy
dairy    Italy     Switzerland  Italy   Switzerland
This is what I want to get, with repeated values within each row removed:
product  from      stop_1       stop_2  stop_3   stop_4  stop_5  stop_6  stop_7
metal    Portugal  Spain        France  Ukraine
fruit    Spain     France       Italy
dairy    Italy     Switzerland
How can I get this?
Answer 0 (score: 3)
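Use mask together with duplicated. Applying duplicated along each row flags every value that has already appeared earlier in that row, and mask replaces the flagged cells with NaN:
df.mask(df.apply(lambda x: x.duplicated(), axis=1))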
Out[443]:
  product      from       stop_1  stop_2   stop_3 stop_4 stop_5 stop_6 stop_7
0   metal  Portugal        Spain  France  Ukraine    NaN    NaN    NaN    NaN
1   fruit     Spain       France   Italy      NaN    NaN    NaN    NaN    NaN
2   dairy     Italy  Switzerland     NaN      NaN    NaN    NaN    NaN    NaN
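For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's frame and applies the same mask/duplicated idea. It assumes the empty cells in the question are NaN; the names `dup` and `result` are just illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question; missing stops are assumed to be NaN.
df = pd.DataFrame(
    [
        ["metal", "Portugal", "Spain", "France", "Ukraine",
         "Spain", "France", "Ukraine", "Spain"],
        ["fruit", "Spain", "France", "Italy"] + [np.nan] * 5,
        ["dairy", "Italy", "Switzerland", "Italy", "Switzerland"] + [np.nan] * 4,
    ],
    columns=["product", "from"] + [f"stop_{i}" for i in range(1, 8)],
)

# True wherever a value already appeared earlier in the same row.
dup = df.apply(lambda row: row.duplicated(), axis=1)

# mask() turns the True cells into NaN and leaves everything else in place.
result = df.mask(dup)
print(result)
```

Note that duplicated also flags the second and later NaN in a row, which is harmless here because those cells are already NaN.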
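Answer 1 (score: 1)
You can use drop_duplicates and reindex. Dropping duplicates row by row can lose columns that end up entirely empty, so reindex restores the original column set and order:
In [417]: df.apply(pd.Series.drop_duplicates, axis=1).reindex(columns=df.columns)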
Out[417]:
  product      from       stop_1  stop_2   stop_3 stop_4 stop_5 stop_6 stop_7
0   metal  Portugal        Spain  France  Ukraine    NaN    NaN    NaN    NaN
1   fruit     Spain       France   Italy      NaN    NaN    NaN    NaN    NaN
2   dairy     Italy  Switzerland     NaN      NaN    NaN    NaN    NaN    NaN
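A self-contained sketch of the same approach, again assuming the blank cells in the question are NaN. The intermediate name `deduped` is only there to show what apply returns before the columns are restored.

```python
import numpy as np
import pandas as pd

# Same data as in the question; missing stops are assumed to be NaN.
df = pd.DataFrame(
    [
        ["metal", "Portugal", "Spain", "France", "Ukraine",
         "Spain", "France", "Ukraine", "Spain"],
        ["fruit", "Spain", "France", "Italy"] + [np.nan] * 5,
        ["dairy", "Italy", "Switzerland", "Italy", "Switzerland"] + [np.nan] * 4,
    ],
    columns=["product", "from"] + [f"stop_{i}" for i in range(1, 8)],
)

# Each row keeps only its first occurrences; the per-row results are then
# aligned on the union of the column labels that survived in any row.
deduped = df.apply(pd.Series.drop_duplicates, axis=1)

# Bring back the columns that disappeared and restore the original order.
result = deduped.reindex(columns=df.columns)
print(result)
```

The reindex step matters because columns in which no row kept a value are absent from `deduped`, and the order of the remaining columns is not guaranteed to match the input.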
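Answer 2 (score: 1)
Here is what I came up with. It collects the unique values of each row, so the result only has as many columns as the longest de-duplicated row: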
df
Out[42]:
  product      from       stop_1  stop_2  ... stop_4  stop_5   stop_6 stop_7
0   metal  Portugal        Spain  France  ...  Spain  France  Ukraine  Spain
1   fruit     Spain       France   Italy  ...    NaN     NaN      NaN    NaN
2   dairy     Italy  Switzerland   Italy  ...    NaN     NaN      NaN    NaN
# save column names first
colnames = list(df.columns)
df1 = pd.DataFrame([row.unique() for index, row in df.iterrows()])
# restore the column names (only as many as the new frame has columns)
df1.columns = colnames[0:len(df1.columns)]
df1
Out[46]:
  product      from       stop_1  stop_2   stop_3
0   metal  Portugal        Spain  France  Ukraine
1   fruit     Spain       France   Italy      NaN
2   dairy     Italy  Switzerland     NaN     None
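One difference from the mask approach is worth spelling out: unique() packs the surviving values to the left, while mask leaves them in their original columns. The two only agree when the repeats sit at the end of the row, as in the question. The sketch below uses a hypothetical row (not from the question) with a repeat in the middle, and drops NaN before calling unique() so that a missing value is not counted as a stop.

```python
import numpy as np
import pandas as pd

# Hypothetical row with a repeated value in the middle of the route.
df = pd.DataFrame(
    [["metal", "Portugal", "Spain", "Portugal", "France", np.nan]],
    columns=["product", "from", "stop_1", "stop_2", "stop_3", "stop_4"],
)

# mask: every surviving value stays in its original column.
masked = df.mask(df.apply(lambda row: row.duplicated(), axis=1))

# unique: survivors are packed to the left, so column positions shift.
packed = pd.DataFrame([row.dropna().unique() for _, row in df.iterrows()])
packed.columns = df.columns[: packed.shape[1]]

print(masked)
print(packed)
```

Here `masked` keeps France under stop_3 with a gap at stop_2, whereas `packed` moves France into stop_2. Which one is right depends on whether the column position carries meaning.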