Sample data:
import pandas as pd

df1 = pd.DataFrame({
    'file': ['file1','file1','file1','file2','file2','file2','file3','file3','file3'],
    'prop1': ['True','False','True','False','False','False','True','False','False'],
    'prop2': ['False','False','False','False','True','False','False','True','False'],
    'prop3': ['False','True','False','True','False','True','False','False','True']
})
    file  prop1  prop2  prop3
0  file1   True  False  False
1  file1  False  False   True
2  file1   True  False  False
3  file2  False  False   True
4  file2  False   True  False
5  file2  False  False   True
6  file3   True  False  False
7  file3  False   True  False
8  file3  False  False   True
file1 has two True values in prop1, file2 has two True values in prop3, and file3 has one True value in each prop. So I need to build another DataFrame like this:
    file   prop
0  file1  prop1
1  file2  prop3
2  file3   diff  (file3 props are different)
Answer 0 (score: 2)
We can combine idxmax with sum to find the column holding the max count per file, and use max to detect ties:
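# the props are strings, so compare with 'True' first; sum(level=0) was removed in pandas 2.0
s = df1.set_index('file').eq('True').groupby(level=0).sum()
# idxmax picks the top column; rows where all 3 props tie at the max become 'diff'
s.idxmax(axis=1).mask(s.eq(s.max(axis=1), axis=0).sum(axis=1) == 3, 'diff')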
file
file1 prop1
file2 prop3
file3 diff
dtype: object
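For anyone who wants to run the idxmax idea end to end, here is a minimal sketch under a few assumptions: a recent pandas, the string-valued sample data from the question, and a tie test of "more than one column reaches the row maximum" (a generalization of the three-way check, not something this answer states). The names counts, is_tied and result are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({
    'file':  ['file1'] * 3 + ['file2'] * 3 + ['file3'] * 3,
    'prop1': ['True', 'False', 'True', 'False', 'False', 'False', 'True', 'False', 'False'],
    'prop2': ['False', 'False', 'False', 'False', 'True', 'False', 'False', 'True', 'False'],
    'prop3': ['False', 'True', 'False', 'True', 'False', 'True', 'False', 'False', 'True'],
})

# Turn the 'True'/'False' strings into booleans, then count the Trues per file.
counts = df1.set_index('file').eq('True').groupby(level=0).sum()

# Assumption: a file is "tied" when more than one prop reaches the row maximum.
is_tied = counts.eq(counts.max(axis=1), axis=0).sum(axis=1) > 1

# idxmax names the column with the highest count; tied files are overwritten.
result = counts.idxmax(axis=1).mask(is_tied, 'diff')
print(result)
```

With the sample data this labels file1 as prop1, file2 as prop3 and file3 as diff. Using > 1 instead of == 3 also marks two-way ties as diff, which may or may not be what you want.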
Answer 1 (score: 2)
Since your data are strings rather than bool, we need a small trick:
(df1.iloc[:,1:].eq('True') # props are string
.groupby(df1['file']) # groupby each file
.sum() # count the True's in each group
.gt(1) # mask the column with more than 1 True
.dot(df1.columns[1:]) # get the column name
.replace('','diff') # fill those files with no double True
)
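The chain above returns a Series indexed by file. A possible way to get the two-column frame asked for in the question is sketched below; it assumes the same sample data, and the names props, has_majority, labels and out are hypothetical, introduced only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'file':  ['file1'] * 3 + ['file2'] * 3 + ['file3'] * 3,
    'prop1': ['True', 'False', 'True', 'False', 'False', 'False', 'True', 'False', 'False'],
    'prop2': ['False', 'False', 'False', 'False', 'True', 'False', 'False', 'True', 'False'],
    'prop3': ['False', 'True', 'False', 'True', 'False', 'True', 'False', 'False', 'True'],
})

props = df1.columns[1:]  # every column except 'file'

# Boolean frame: True where a prop has more than one 'True' within a file.
has_majority = df1[props].eq('True').groupby(df1['file']).sum().gt(1)

# Boolean-times-string dot product keeps the name of each True column
# and yields '' for files where no column qualifies.
labels = has_majority.dot(props).replace('', 'diff')

# Name the values and move 'file' out of the index into a regular column.
out = labels.rename('prop').reset_index()
print(out)
```

One caveat worth keeping in mind: if a file could have several props with more than one True each, the dot product would concatenate their names (for example prop1prop3). That cannot happen with three rows per file, but it could with larger groups.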