How to compare multiple DataFrames and count how often column values occur

Asked: 2019-04-22 07:28:36

Tags: python pandas

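Is it possible to compare 4 DataFrames based on 2 columns and get a result containing the rows that are repeated (that is, present in 2 or more DataFrames)? The result should also include the number of occurrences. My DataFrames look like this: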

>>> df1
  Circle Division Power
0 AAAA   AA       25
1 BBBB   BB       5
>>> df2
  Circle Division Power
0 CCCC   CC       25
1 BBBB   BB       66
>>> df3
  Circle Division Power
0 DDDD   DD       55
1 FFFF   FF       68
2 AAAA   AA       87
>>> df4
  Circle Division Power
0 AAAA   AA       45
1 CCCC   CC       56

Expected result:

>>> result_df
  Circle Division Power1 Power2 Power3 Power4 Repeated
0 AAAA   AA       25     -      87     45     3
1 BBBB   BB       5      66     -      -      2
2 CCCC   CC       -      25     -      56     2

I tried merging them one by one, but then got stuck.
1 Answer:

Answer 0 (score: 2)

Use concat together with DataFrame.set_index and the keys parameter to join all the DataFrames, then flatten the resulting MultiIndex in the columns.
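
As a small illustration of what `keys` contributes, here is a sketch with two made-up frames `a` and `b` (hypothetical data, not the frames from the question). `concat` puts each key on the outer level of a column MultiIndex, and a list comprehension then joins the two levels into flat names.

```python
import pandas as pd

# Two hypothetical frames, used only for illustration.
a = pd.DataFrame({"Circle": ["AAAA"], "Division": ["AA"], "Power": [25]})
b = pd.DataFrame({"Circle": ["BBBB"], "Division": ["BB"], "Power": [66]})

# Index by the key columns so concat aligns rows on them.
parts = [x.set_index(["Circle", "Division"]) for x in (a, b)]
wide = pd.concat(parts, axis=1, keys=[1, 2])

# Columns are now (key, original column name) pairs.
multi_cols = list(wide.columns)

# Flatten to single-level names by joining the column name and the key.
wide.columns = [f"{name}{key}" for key, name in wide.columns]
flat_cols = list(wide.columns)

print(multi_cols)
print(flat_cols)
```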

Then create a new column with DataFrame.count to get the number of non-NaN values per row, and filter with boolean indexing:

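import pandas as pd

dfs = [df1, df2, df3, df4]

# Index every frame by the two key columns so concat aligns rows on them
comp = [x.set_index(['Circle', 'Division']) for x in dfs]
# keys=1..4 labels each frame's columns: (1, 'Power'), (2, 'Power'), ...
df = pd.concat(comp, axis=1, keys=range(1, len(dfs) + 1))
# Flatten the MultiIndex columns into Power1, Power2, ...
df.columns = [f'{b}{a}' for a, b in df.columns]
# Number of non-NaN Power values per row = number of frames containing the key
df['Repeat'] = df.count(axis=1)

# Keep only keys present in 2 or more frames
df = df[df['Repeat'] > 1]
df = df.reset_index()
print(df)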
  Circle Division  Power1  Power2  Power3  Power4  Repeat
0   AAAA       AA    25.0     NaN    87.0    45.0       3
1   BBBB       BB     5.0    66.0     NaN     NaN       2
2   CCCC       CC     NaN    25.0     NaN    56.0       2
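
For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the four sample frames from the question and wraps the same steps in a helper. The function name `count_repeats` and its parameters `keys`, `value` and `min_count` are illustrative names of my own, not part of pandas. It assumes each (Circle, Division) pair appears at most once per frame and that `Power` itself is never NaN, since a missing value would not be counted.

```python
import pandas as pd

# Sample frames copied from the question.
df1 = pd.DataFrame({"Circle": ["AAAA", "BBBB"],
                    "Division": ["AA", "BB"],
                    "Power": [25, 5]})
df2 = pd.DataFrame({"Circle": ["CCCC", "BBBB"],
                    "Division": ["CC", "BB"],
                    "Power": [25, 66]})
df3 = pd.DataFrame({"Circle": ["DDDD", "FFFF", "AAAA"],
                    "Division": ["DD", "FF", "AA"],
                    "Power": [55, 68, 87]})
df4 = pd.DataFrame({"Circle": ["AAAA", "CCCC"],
                    "Division": ["AA", "CC"],
                    "Power": [45, 56]})


def count_repeats(dfs, keys=("Circle", "Division"), value="Power", min_count=2):
    """Hypothetical helper: align frames on `keys` and count the frames containing each key."""
    # One Series per frame, indexed by the key columns and named Power1, Power2, ...
    cols = [
        d.set_index(list(keys))[value].rename(f"{value}{i}")
        for i, d in enumerate(dfs, start=1)
    ]
    # Outer-align the Series on their shared index; missing keys become NaN.
    wide = pd.concat(cols, axis=1)
    # Non-NaN cells per row = number of frames that contain the key.
    wide["Repeat"] = wide.count(axis=1)
    # Keep keys seen in at least `min_count` frames, in a stable order.
    wide = wide[wide["Repeat"] >= min_count].sort_index()
    return wide.reset_index()


result = count_repeats([df1, df2, df3, df4])
print(result)
```

Naming each Series before concatenating avoids the intermediate MultiIndex, at the cost of handling only one value column; the `keys` approach in the answer generalizes more easily when the frames carry several value columns.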