I have a pandas DataFrame like the one below:
id  name  Base  field1  field2          field3
1   AA    Y     Yes     Consumer        Not Applicable
1   BB    N     Yes     Consumer        Not Applicable
2   CC    Y     Yes     Consumer        Not Applicable
2   DD    N     Yes     Not Applicable  Not Applicable
2   EE    N     No      Not Applicable  Modified
3   FF    Y     Yes     Not Applicable  Applicable
3   GG    N     Yes     Not Applicable  Not Applicable
3   HH    N     Yes     Not Applicable  Not Applicable
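The goal is to group this DataFrame by the id column, check whether the data in each of the other columns is identical within every group, and finally write out the result.
I tried the following to validate the data in each group, but it always shows True.
Code: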
result_list = []
for col in df.columns:
    result = df.groupby(level=0)[col].apply(lambda x: len(set(x)) == 1)
    result_list.append(result)
final = pd.concat(result_list, axis=1)
The expected result is:
id  name  field1  field2          field3          Error
1   AA    Yes     Consumer        Not Applicable  Pass
1   BB    Yes     Consumer        Not Applicable  Pass
2   CC    Yes     Consumer        Not Applicable  field1, field2, field3 mismatch for ID: 2
2   DD    Yes     Not Applicable  Not Applicable  field1, field2, field3 mismatch for ID: 2
2   EE    No      Not Applicable  Modified        field1, field2, field3 mismatch for ID: 2
3   FF    Yes     Not Applicable  Applicable      field3 mismatch for ID: 3
3   GG    Yes     Not Applicable  Not Applicable  field3 mismatch for ID: 3
3   HH    Yes     Not Applicable  Not Applicable  field3 mismatch for ID: 3
Any help with this?
Answer 0: (score: 0)
You can get what you need with the following code (assuming the index of df is named id):
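def handler(df):
    # Report the first field that has more than one distinct value in this group.
    for col in ['field1', 'field2', 'field3']:
        if df.loc[:, col].nunique() > 1:
            return 'error in {} for id {}'.format(col, df.index[0])
    # No field differed within the group.
    return 'pass'

# One message per id, then attach it to every row of that id.
result = df.groupby(level=0).apply(handler)
result = df.reset_index().merge(result.to_frame().reset_index(), on='id')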
The result is:
   id  name  field1  field2          field3          0
0  1   AA    Yes     Consumer        Not Applicable  pass
1  1   BB    Yes     Consumer        Not Applicable  pass
2  2   CC    Yes     Consumer        Not Applicable  error in field1 for id 2
3  2   DD    Yes     Not Applicable  Not Applicable  error in field1 for id 2
4  2   EE    No      Not Applicable  Modified        error in field1 for id 2
5  3   FF    Yes     Not Applicable  Applicable      error in field3 for id 3
6  3   GG    Yes     Not Applicable  Not Applicable  error in field3 for id 3
7  3   HH    Yes     Not Applicable  Not Applicable  error in field3 for id 3
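The assumption about the index matters here. The sketch below uses a cut-down copy of the sample data (id, name and field3 only) to show what groupby(level=0) does in each case. If id is an ordinary column and df has the default RangeIndex, every row ends up in its own group, so any uniqueness check passes trivially. That may be why the code in the question always showed True, but this is only a guess, since the question does not show how df was built.

```python
import pandas as pd

# Cut-down copy of the sample data; id is a regular column here (an assumption).
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3, 3, 3],
    "name": ["AA", "BB", "CC", "DD", "EE", "FF", "GG", "HH"],
    "field3": ["Not Applicable", "Not Applicable", "Not Applicable",
               "Not Applicable", "Modified", "Applicable",
               "Not Applicable", "Not Applicable"],
})

# Default RangeIndex: level=0 is the row label, so each group holds one row.
by_row = df.groupby(level=0)["field3"].nunique()

# Grouping by the id column gives one group per id.
by_id = df.groupby("id")["field3"].nunique()

# After set_index("id"), level=0 refers to id and matches the line above.
by_index = df.set_index("id").groupby(level=0)["field3"].nunique()

print(by_row.tolist())
print(by_id.tolist())
print(by_index.tolist())
```

So either call df.set_index('id') before using the handler, or group with groupby('id') instead of groupby(level=0).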
Edit: a minor revision to the handler, so that it reports every mismatched column:
def handler(df):
    cols = list()
    for col in ['field1', 'field2', 'field3']:
        if df.loc[:, col].nunique() > 1:
            cols.append(col)
    if cols:
        return 'error in {} for id {}'.format(', '.join(cols), df.index[0])
    else:
        return 'pass'
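For reference, here is a minimal end-to-end sketch of the revised handler run on the sample data from the question. A few things in it are my own choices rather than part of the answer above: the DataFrame is built by hand with id as a regular column and then moved into the index, the message wording follows the expected output in the question, and the result column is named Error instead of 0.

```python
import pandas as pd

# Sample data from the question, built by hand (id starts as a regular column).
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3, 3, 3],
    "name": ["AA", "BB", "CC", "DD", "EE", "FF", "GG", "HH"],
    "Base": ["Y", "N", "Y", "N", "N", "Y", "N", "N"],
    "field1": ["Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "Yes"],
    "field2": ["Consumer", "Consumer", "Consumer", "Not Applicable",
               "Not Applicable", "Not Applicable", "Not Applicable",
               "Not Applicable"],
    "field3": ["Not Applicable", "Not Applicable", "Not Applicable",
               "Not Applicable", "Modified", "Applicable",
               "Not Applicable", "Not Applicable"],
})

FIELDS = ["field1", "field2", "field3"]


def handler(group):
    # Collect every field with more than one distinct value in this group.
    cols = [col for col in FIELDS if group[col].nunique() > 1]
    if cols:
        # group.index[0] is the id, because id is the index at this point.
        return "{} mismatch for ID: {}".format(", ".join(cols), group.index[0])
    return "Pass"


indexed = df.set_index("id")
# One message per id, named so the joined column is called Error.
errors = indexed.groupby(level=0).apply(handler).rename("Error")
# Left join on the id index attaches the message to every row of that id.
result = indexed.join(errors).reset_index()
print(result[["id", "name", "Error"]])
```

Joining on the index keeps the original row order and avoids the column named 0 that to_frame() produces.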
Answer 1: (score: 0)
You can groupby id, then agg each column to count the number of unique values in each group; a count greater than 1 indicates a mismatch:
df[df.columns.drop('name')].groupby('id').agg(lambda x: len(x.unique()))>1
With this output, you can build the message strings from it.
field1 field2 field3
id
1 False False False
2 True True True
3 False False True
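One possible way to build those strings is sketched below. It assumes id is a regular column, rebuilds the sample data by hand, and uses the built-in nunique() in place of the lambda (note that nunique() ignores NaN by default, whereas len(x.unique()) counts it). The helper to_message and the column name Error are names I made up for illustration, and the wording follows the expected output in the question.

```python
import pandas as pd

# Sample data from the question, built by hand with id as a regular column.
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3, 3, 3],
    "name": ["AA", "BB", "CC", "DD", "EE", "FF", "GG", "HH"],
    "field1": ["Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "Yes"],
    "field2": ["Consumer", "Consumer", "Consumer", "Not Applicable",
               "Not Applicable", "Not Applicable", "Not Applicable",
               "Not Applicable"],
    "field3": ["Not Applicable", "Not Applicable", "Not Applicable",
               "Not Applicable", "Modified", "Applicable",
               "Not Applicable", "Not Applicable"],
})

fields = ["field1", "field2", "field3"]

# One row per id; True where a field has more than one distinct value.
flags = df.groupby("id")[fields].nunique() > 1


def to_message(row):
    # row.name is the id, because flags is indexed by id.
    bad = [col for col in fields if row[col]]
    if not bad:
        return "Pass"
    return "{} mismatch for ID: {}".format(", ".join(bad), row.name)


# One message per id, then broadcast back to the rows through the id column.
messages = flags.apply(to_message, axis=1)
df["Error"] = df["id"].map(messages)
print(df[["id", "name", "Error"]])
```

Because the messages are computed once per id and then mapped back, the per-row work is just a lookup.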