Python Pandas - How to remove duplicates based on column values

Time: 2020-09-22 12:24:06

Tags: python pandas dataframe filter duplicates

So I want to transform a table like this: [image: Input data]

into a table like this: [image: Output data]

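The goal is to remove the duplicate rows while keeping the information about which values appeared in the "Value_c" column, expressed as True/False.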

1 answer:

Answer 0 (score: 1)

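You can get the desired output by one-hot encoding the value columns with get_dummies and then collapsing the duplicate rows with groupby. Aggregating with sum counts how often each value occurs in a group; aggregating with max gives a 1/0 flag for whether it occurs at all.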

>>> import pandas as pd
>>> df = pd.DataFrame({"A":[1,1,1,2,2,2], "B":[1,1,1,2,2,2], "C":["Q","R","QR","R","QR","Q"], "D":[1,1,1,2,2,2], "E":["X","X","X","Y","Y","Y"]})
>>> df
   A  B   C  D  E
0  1  1   Q  1  X
1  1  1   R  1  X
2  1  1  QR  1  X
3  2  2   R  2  Y
4  2  2  QR  2  Y
5  2  2   Q  2  Y
>>> # dtype=int keeps 0/1 indicators; since pandas 2.0 the default dtype is bool
>>> df = pd.get_dummies(df, columns=["C","E"], dtype=int)
>>> df.groupby(["A","B","D"]).agg("sum").reset_index()
   A  B  D  C_Q  C_QR  C_R  E_X  E_Y
0  1  1  1    1     1    1    3    0
1  2  2  2    1     1    1    0    3
>>> df.groupby(["A","B","D"]).agg("max").reset_index()
   A  B  D  C_Q  C_QR  C_R  E_X  E_Y
0  1  1  1    1     1    1    1    0
1  2  2  2    1     1    1    0    1
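Since the question asks for True/False rather than 1/0, a possible variant is to request boolean indicator columns directly and reduce each group with any(). This is a minimal sketch, assuming the answer's sample data stands in for the question's table (which was only posted as an image), with column C playing the role of "Value_c"; the names dummies and result are illustrative.

```python
import pandas as pd

# Sample data from the answer; C stands in for the question's "Value_c" column
df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 2],
    "B": [1, 1, 1, 2, 2, 2],
    "C": ["Q", "R", "QR", "R", "QR", "Q"],
    "D": [1, 1, 1, 2, 2, 2],
    "E": ["X", "X", "X", "Y", "Y", "Y"],
})

# One boolean indicator column per distinct value of C and E
dummies = pd.get_dummies(df, columns=["C", "E"], dtype=bool)

# Collapse duplicate (A, B, D) rows; any() is True if the value occurred in the group.
# as_index=False keeps A, B, D as ordinary columns instead of the index.
result = dummies.groupby(["A", "B", "D"], as_index=False).any()
print(result)
```

any() on boolean columns is the True/False counterpart of agg("max") on 0/1 columns, so it avoids a separate astype(bool) step.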