I have a dataframe like the following -
import pandas as pd
df1_data = {'sym' :{0:'AAA',1:'BBB',2:'CCC',3:'DDD',4:'DDD',5:'CCC'},
'id' :{0:'101',1:'102',2:'103',3:'104',4:'105',5:'106'},
'sal':{0:'1000',1:'1000',2:'1000',3:'1000',4:'1000',5:'1000'},
'loc':{0:'zzz',1:'zzz',2:'zzz',3:'zzz',4:'zzz',5:'zzz'},
'name':{0:'abc',1:'abc',2:'abc',3:'pqr',4:'pqr',5:'pqr'}}
df = pd.DataFrame(df1_data)
print(df)
    id  loc name   sal  sym
0  101  zzz  abc  1000  AAA
1  102  zzz  abc  1000  BBB
2  103  zzz  abc  1000  CCC
3  104  zzz  pqr  1000  DDD
4  105  zzz  pqr  1000  DDD
5  106  zzz  pqr  1000  CCC
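I want to check which columns of the dataframe above contain the same value in every row. Based on that, I want the columns whose values are all identical in one dataframe and the remaining (non-matching) columns in another dataframe.
Expected output -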
matched_df -
   loc   sal
0  zzz  1000
1  zzz  1000
2  zzz  1000
3  zzz  1000
4  zzz  1000
5  zzz  1000
unmatched_df -
    id name  sym
0  101  abc  AAA
1  102  abc  BBB
2  103  abc  CCC
3  104  pqr  DDD
4  105  pqr  DDD
5  106  pqr  CCC
Answer 0 (score: 3)
You can compare df with its first row using eq, then use all to check whether every value in each column is True:
print (df.eq(df.iloc[0]))
      id   loc   name   sal    sym
0   True  True   True  True   True
1  False  True   True  True  False
2  False  True   True  True  False
3  False  True  False  True  False
4  False  True  False  True  False
5  False  True  False  True  False
mask = df.eq(df.iloc[0]).all()
print (mask)
id      False
loc      True
name    False
sal      True
sym     False
dtype: bool
print (df.loc[:, mask])
   loc   sal
0  zzz  1000
1  zzz  1000
2  zzz  1000
3  zzz  1000
4  zzz  1000
5  zzz  1000
print (df.loc[:, ~mask])
    id name  sym
0  101  abc  AAA
1  102  abc  BBB
2  103  abc  CCC
3  104  pqr  DDD
4  105  pqr  DDD
5  106  pqr  CCC
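For reuse, the eq/all steps above can be wrapped in a small function. This is only a sketch: the helper name split_constant_columns is hypothetical (not part of pandas), and the sample data is rebuilt from lists rather than the nested dicts in the question. Note that recent pandas versions keep the dict insertion order of the columns (sym, id, sal, loc, name), so the column order may differ from the output shown above; the check below therefore compares sorted column names.

```python
import pandas as pd


def split_constant_columns(df):
    """Hypothetical helper: return (constant columns, varying columns) of df."""
    # Compare every row with the first row, then require True in all rows
    # of a column for that column to count as constant.
    mask = df.eq(df.iloc[0]).all()
    return df.loc[:, mask], df.loc[:, ~mask]


# Same values as the dataframe in the question.
df = pd.DataFrame({
    'sym': ['AAA', 'BBB', 'CCC', 'DDD', 'DDD', 'CCC'],
    'id': ['101', '102', '103', '104', '105', '106'],
    'sal': ['1000'] * 6,
    'loc': ['zzz'] * 6,
    'name': ['abc', 'abc', 'abc', 'pqr', 'pqr', 'pqr'],
})

matched_df, unmatched_df = split_constant_columns(df)
print(sorted(matched_df.columns))    # ['loc', 'sal']
print(sorted(unmatched_df.columns))  # ['id', 'name', 'sym']
```

Two caveats with this approach: df.iloc[0] raises an IndexError on an empty dataframe, and because NaN never compares equal to NaN, a column consisting only of NaN ends up in the varying group.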
Another way to build mask is to compare the numpy arrays:
arr = df.values
mask = (arr == arr[0]).all(axis=0)
print (mask)
[False  True False  True False]
print (df.loc[:, mask])
   loc   sal
0  zzz  1000
1  zzz  1000
2  zzz  1000
3  zzz  1000
4  zzz  1000
5  zzz  1000
print (df.loc[:, ~mask])
    id name  sym
0  101  abc  AAA
1  102  abc  BBB
2  103  abc  CCC
3  104  pqr  DDD
4  105  pqr  DDD
5  106  pqr  CCC
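As a sanity check, the numpy mask can be compared against the pandas one. The sketch below is self-contained and assumes a pandas version that provides DataFrame.to_numpy() (it returns the same data as df.values); the variable names np_mask, pd_mask and constant_cols are illustrative only.

```python
import numpy as np
import pandas as pd

# Same values as the dataframe in the question.
df = pd.DataFrame({
    'sym': ['AAA', 'BBB', 'CCC', 'DDD', 'DDD', 'CCC'],
    'id': ['101', '102', '103', '104', '105', '106'],
    'sal': ['1000'] * 6,
    'loc': ['zzz'] * 6,
    'name': ['abc', 'abc', 'abc', 'pqr', 'pqr', 'pqr'],
})

arr = df.to_numpy()                    # 2-D object array, one row per record
np_mask = (arr == arr[0]).all(axis=0)  # first row is broadcast against all rows
pd_mask = df.eq(df.iloc[0]).all()      # pandas version of the same check

# Both approaches should flag exactly the same columns.
assert np.array_equal(np_mask, pd_mask.to_numpy())

constant_cols = list(df.columns[np_mask])
print(sorted(constant_cols))  # ['loc', 'sal']
```

The numpy mask is a plain boolean array in column order, so it depends on position rather than on column labels; the pandas mask is a Series indexed by column name.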