Question

I have 3 dataframes: energy, GDP and ScimEn. All of them have a 'Country' column, and I merged all 3 using inner joins:

a = pd.merge(energy,GDP,left_on='Country',right_on='Country',how='inner')
b = pd.merge(a,ScimEn,left_on='Country',right_on='Country',how='inner')

Now I want to work out how many countries were excluded by this merge.

I tried the following, but it gives me the error "ValueError: Cannot use name of an existing column for indicator column":

z = pd.merge(energy,GDP,left_on='Country',right_on='Country',how='outer', indicator=True)
f = pd.merge(z,ScimEn,left_on='Country',right_on='Country',how='inner',indicator=True)
g = f.query('_merge != "both"').shape[0]

Can anyone suggest a solution?

Answer 1

The ValueError comes from merging with indicator=True twice. By default, when indicator is set to True, a column named _merge is added to the resulting dataframe.

>>> z.columns[z.columns.str.contains('_merge')]
Index(['_merge'], dtype='object')
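
The clash can be reproduced on a small scale. This is only a sketch: the three frames below are made-up stand-ins for the real energy, GDP and ScimEn data, and the column values are hypothetical.

```python
import pandas as pd

# Hypothetical toy frames standing in for the real data.
energy = pd.DataFrame({"Country": ["A", "B", "C"], "Energy": [1, 2, 3]})
GDP = pd.DataFrame({"Country": ["B", "C", "D"], "GDP": [10, 20, 30]})
ScimEn = pd.DataFrame({"Country": ["C", "D", "E"], "Rank": [1, 2, 3]})

# The first merge adds a column named "_merge" to z.
z = pd.merge(energy, GDP, on="Country", how="outer", indicator=True)
has_merge_column = "_merge" in z.columns

# A second merge with indicator=True tries to add "_merge" again and fails.
try:
    pd.merge(z, ScimEn, on="Country", how="inner", indicator=True)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(has_merge_column)
print(error_message)
```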

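Because z already contains a _merge column, creating the next dataframe f with indicator=True raises the ValueError. Passing a string to indicator instead sets the name of the new column, so each later merge can use its own name: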

z = pd.merge(energy,GDP,left_on='Country',right_on='Country',how='outer', indicator=True)
f = pd.merge(z,ScimEn,left_on='Country',right_on='Country',how='outer',indicator='merge1')
j = pd.merge(f,energy,left_on='Country',right_on='Country',how='outer',indicator='merge2')

j[(j['_merge'] != 'both') | (j['merge1'] != 'both') | (j['merge2'] != 'both')].shape[0]

or

j.shape[0] - b.shape[0]
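
As a rough end-to-end check, the sketch below runs the indicator approach on made-up frames and compares the count against plain set arithmetic on the 'Country' values. It is not the code from the answer: it uses only two outer merges, the toy data is hypothetical, and it assumes each country appears at most once per frame.

```python
import pandas as pd

# Hypothetical toy frames; each country is assumed to appear once per frame.
energy = pd.DataFrame({"Country": ["A", "B", "C"], "Energy": [1, 2, 3]})
GDP = pd.DataFrame({"Country": ["B", "C", "D"], "GDP": [10, 20, 30]})
ScimEn = pd.DataFrame({"Country": ["C", "D", "E"], "Rank": [1, 2, 3]})

# Two outer merges, each with its own indicator column name.
z = pd.merge(energy, GDP, on="Country", how="outer", indicator=True)
f = pd.merge(z, ScimEn, on="Country", how="outer", indicator="merge1")

# A country survives the inner joins only if both indicators say "both".
# Rows that exist only in ScimEn have a missing "_merge" value, which
# compares unequal to "both" and is therefore counted as excluded.
in_all = (f["_merge"] == "both") & (f["merge1"] == "both")
excluded = int((~in_all).sum())

# Cross-check with sets: all countries minus those present in every frame.
sets = [set(df["Country"]) for df in (energy, GDP, ScimEn)]
union = set.union(*sets)
common = set.intersection(*sets)

print(excluded, len(union - common))
```

With this toy data only country C is present in all three frames, so the other four countries are counted as excluded by both methods.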
