Pandas: remove duplicate rows based on the values of other columns

Time: 2019-10-08 14:10:03

Tags: python pandas

Sample data:

import pandas as pd

df1 = pd.DataFrame({
    'file': ['file1','file1','file1','file2','file2','file2','file3','file3','file3'],
    'prop1': ['True','False','True','False','False','False','True','False','False'],
    'prop2': ['False','False','False','False','True','False','False','True','False'],
    'prop3': ['False','True','False','True','False','True','False','False','True']
})

    file    prop1   prop2   prop3
0   file1   True    False   False
1   file1   False   False   True
2   file1   True    False   False
3   file2   False   False   True
4   file2   False   True    False
5   file2   False   False   True
6   file3   True    False   False
7   file3   False   True    False
8   file3   False   False   True

file1 has two True values in prop1, file2 has two True values in prop3, and file3 has one True value in each prop. So I need to build another DataFrame like this:

    file    prop
0   file1   prop1
1   file2   prop3
2   file3   diff (file3 props are different)

2 answers:

Answer 0 (score: 2)

We can use idxmax together with sum to find the column with the highest count of True values:

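# The props are strings, so compare with 'True' first; sum(level=0) was removed in pandas 2.0.
s = df1.set_index('file').eq('True').groupby(level=0).sum()

# Take the column with the most True values; a three-way tie becomes 'diff'.
s.idxmax(axis=1).mask(s.eq(s.max(axis=1), axis=0).sum(axis=1) == 3, 'diff')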
file
file1    prop1
file2    prop3
file3     diff
dtype: object
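
The `== 3` check only flags a file when all three prop columns tie. As a minimal sketch, assuming that any tie for the maximum should also count as 'diff' (the question does not say so) and that the props are stored as the strings 'True'/'False', the same idea could be wrapped like this. The helper name `dominant_prop` and its parameters are hypothetical:

```python
import pandas as pd

def dominant_prop(df, key="file", tie_label="diff"):
    """Hypothetical helper: per key, name the prop column with the most 'True' values."""
    # Turn the 'True'/'False' strings into booleans, then count the Trues per file.
    counts = df.set_index(key).eq("True").groupby(level=0).sum()
    # How many columns reach the row maximum; more than one means a tie.
    n_at_max = counts.eq(counts.max(axis=1), axis=0).sum(axis=1)
    # Assumption: every tie (not only a three-way tie) is labelled tie_label.
    return counts.idxmax(axis=1).mask(n_at_max > 1, tie_label)

df1 = pd.DataFrame({
    'file': ['file1','file1','file1','file2','file2','file2','file3','file3','file3'],
    'prop1': ['True','False','True','False','False','False','True','False','False'],
    'prop2': ['False','False','False','False','True','False','False','True','False'],
    'prop3': ['False','True','False','True','False','True','False','False','True']
})

result = dominant_prop(df1)
print(result)
```

For the sample data this gives the same labels as above, since file3 is the only file with a tie.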

Answer 1 (score: 2)

Since your data is string rather than bool, we need a small trick:

(df1.iloc[:,1:].eq('True')  # props are string
    .groupby(df1['file'])   # groupby each file
    .sum()                  # count the True's in each group
    .gt(1)                  # mask the column with more than 1 True
    .dot(df1.columns[1:])   # get the column name
    .replace('','diff')     # fill those files with no double True
)
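
To end up with the two-column frame the question asks for, the chain can be finished with `rename` and `reset_index`. This is a sketch under the assumption that each file has at most one prop with more than one True; otherwise `dot` would concatenate several column names into one string. The variable names `props`, `counts` and `result` are illustrative only:

```python
import pandas as pd

df1 = pd.DataFrame({
    'file': ['file1','file1','file1','file2','file2','file2','file3','file3','file3'],
    'prop1': ['True','False','True','False','False','False','True','False','False'],
    'prop2': ['False','False','False','False','True','False','False','True','False'],
    'prop3': ['False','True','False','True','False','True','False','False','True']
})

props = df1.columns[1:]                      # every column except 'file'

# Count the 'True' strings per file.
counts = df1[props].eq('True').groupby(df1['file']).sum()

result = (counts.gt(1)                       # True where a prop occurs more than once
                .dot(props)                  # boolean row times names gives the matching name
                .replace('', 'diff')         # no prop repeated, so label the file 'diff'
                .rename('prop')              # name the value column
                .reset_index())              # move 'file' from the index into a column
print(result)
```

The `dot` step works because multiplying a string by True keeps it and by False yields an empty string, so each row collapses to the name of its flagged column.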