I have the following dataset:
**Fruit Animal Color City**
Apple Dog Yellow Paris
Apple Dog Blue Paris
Orange Dog Green Paris
Grape Dog Pink Paris
Orange Dog Grey NY
Peach Dog Purple Rome
I want to use pandas to remove the duplicate values within each column (rather than dropping entire rows).
Example output:
**Fruit Animal Color City**
Apple Dog Yellow Paris
Grape Paris NY
Orange Green Rome
Peach Pink
Grey
Purple
Regards
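Answer 0 (score: 1)
We can use unique:
s = df.T.apply(pd.Series.unique, axis=1)  # unique values of each original column
newdf = pd.DataFrame(s.tolist(), index=s.index).T  # shorter columns are padded with None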
newdf
Out[57]:
**Fruit Animal Color City**
0 Apple Dog Yellow Paris
1 Orange None Blue NY
2 Grape None Green Rome
3 Peach None Pink None
4 None None Grey None
5 None None Purple None
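For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same idea. It is a variation rather than the answer's exact code: it builds one Series of unique values per column and lets the DataFrame constructor pad the shorter ones. The sample frame is retyped from the question, and the names uniques and blank are only illustrative.

```python
import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame({
    "Fruit": ["Apple", "Apple", "Orange", "Grape", "Orange", "Peach"],
    "Animal": ["Dog"] * 6,
    "Color": ["Yellow", "Blue", "Green", "Pink", "Grey", "Purple"],
    "City": ["Paris", "Paris", "Paris", "Paris", "NY", "Rome"],
})

# Unique values per column, in order of first appearance.
uniques = {col: pd.Series(df[col].unique()) for col in df.columns}

# Series of unequal length are aligned on their index, so the
# shorter columns are padded with missing values.
newdf = pd.DataFrame(uniques)

# Optional: show empty strings instead of missing values,
# which is closer to the layout the question asks for.
blank = newdf.fillna("")
print(blank)
```

The fillna("") step is purely cosmetic; keep the missing values if the result feeds further processing.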
Answer 1 (score: 0)
You can use drop_duplicates.
Try it column by column:
for x in df.columns:
    df[x] = df[x].drop_duplicates().reset_index(drop=True)
#output:
Fruit Animal Color City
0 Apple Dog Yellow Paris
1 Orange NaN Blue NY
2 Grape NaN Green Rome
3 Peach NaN Pink NaN
4 NaN NaN Grey NaN
5 NaN NaN Purple NaN
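Note that the loop above overwrites df in place, and it appears to rely on df having the default 0..n-1 index, because the assignment aligns the shortened column back onto the frame's index. A hedged sketch that works on a copy instead (the sample frame is retyped from the question, and the name out is only illustrative):

```python
import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame({
    "Fruit": ["Apple", "Apple", "Orange", "Grape", "Orange", "Peach"],
    "Animal": ["Dog"] * 6,
    "Color": ["Yellow", "Blue", "Green", "Pink", "Grey", "Purple"],
    "City": ["Paris", "Paris", "Paris", "Paris", "NY", "Rome"],
})

# reset_index(drop=True) returns a new frame with a clean 0..n-1 index,
# so the original df is left untouched.
out = df.reset_index(drop=True)

for col in out.columns:
    # Drop repeats, then renumber from 0 so the survivors move to the top.
    # Assignment aligns on the index; rows with no match become NaN.
    out[col] = out[col].drop_duplicates().reset_index(drop=True)

print(out)
```

If the frame had a custom index (dates or string labels, for example), skipping the initial reset_index would make the alignment step match nothing useful, so that line is worth keeping.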