pandas: KeyError when trying to merge two DataFrames

Asked: 2020-11-09 15:46:19

Tags: python pandas dataframe

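I have two Excel worksheets (master data and input data) that share the same index column but have different numbers of columns (see below). I want to merge the input DF into the master DF when new rows have been added (see IDs 103-105) or when an entry in the input DF has been updated (see ID 102). Other columns can be ignored.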

DataFrame 1 (master):

Master DF

DataFrame 2 (input):

Input DF

Goal (updated cells are highlighted in yellow):

[image of the expected result, not included]

I am using the following script:

inputDf = pd.read_excel(inputFileName).set_index("ID")
masterDf = pd.read_excel(masterFileName).set_index("ID")

# Update existing rows
masterDf.update(inputDf)

# find out which ids are new
ids_of_new_rows = set(inputDf.index) - set(masterDf.index)

# get new rows that should be added to master
rows_to_add = masterDf.loc[ids_of_new_rows, inputDf.columns & masterDf.columns]

I am able to update the master DF and obtain ids_of_new_rows. Output: {'CR103', 'CR104', 'CR105'}

However, when I try to get rows_to_add, I always receive the following error:

KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index(['CR103', 'CR104', 'CR105'], dtype='object', name='ID')] are in the [index]"

Any ideas?

2 Answers:

Answer 0: (score: 1)

About the error

The error occurs because masterDf has no rows whose ID is in ['CR103', 'CR104', 'CR105'], whereas inputDf does. What you are probably trying to do is select those rows from inputDf instead:

rows_to_add = inputDf.loc[sorted(ids_of_new_rows), inputDf.columns.intersection(masterDf.columns)]

(A list is passed because pandas 2.0 and later no longer accept a set in .loc, and Index.intersection replaces the & operator, which no longer performs a set intersection on an Index.)

What you might want to do

inputDf = pd.read_excel(inputFileName).set_index("ID")
masterDf = pd.read_excel(masterFileName).set_index("ID")

# Update existing rows
masterDf.update(inputDf)
# Add new rows, keeping only the columns both frames share
new_ids = inputDf.index.difference(masterDf.index)
shared_columns = inputDf.columns.intersection(masterDf.columns)
masterDf = pd.concat((masterDf, inputDf.loc[new_ids, shared_columns]))

Here Index.difference is used to get the index values of inputDf that are not present in masterDf.
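
To make the cause of the KeyError concrete, here is a minimal sketch that reproduces it on two small in-memory frames. The column names (Status, Owner, Note) and the values are hypothetical stand-ins for the Excel sheets, which are not shown in the question.

```python
import pandas as pd

# Hypothetical stand-ins for the two Excel sheets.
masterDf = pd.DataFrame(
    {"ID": ["CR101", "CR102"], "Status": ["open", "open"], "Owner": ["ann", "bob"]}
).set_index("ID")
inputDf = pd.DataFrame(
    {"ID": ["CR102", "CR103", "CR104"],
     "Status": ["closed", "open", "open"],
     "Note": ["x", "y", "z"]}
).set_index("ID")

# IDs present in the input but missing from the master.
new_ids = inputDf.index.difference(masterDf.index)

# Looking those IDs up in masterDf fails, because none of them exist there.
try:
    masterDf.loc[new_ids]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Looking them up in inputDf works; keep only the columns both frames share.
shared_columns = inputDf.columns.intersection(masterDf.columns)
rows_to_add = inputDf.loc[new_ids, shared_columns]
```

The lookup against masterDf raises because .loc requires the requested labels to exist in the index, while the same labels resolve without trouble against inputDf.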

Answer 1: (score: 0)

Here is the corrected script that produces the result below. The simple fix was to swap masterDf for inputDf when selecting the new rows ...

# Define DataFrame
inputDf = pd.read_excel(inputFileName).set_index("ID")
masterDf = pd.read_excel(masterFileName).set_index("ID")

# Update existing rows
masterDf.update(inputDf)

# find out which ids are new
ids_of_new_rows = set(inputDf.index) - set(masterDf.index)

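# get new rows that should be added to master
# (a sorted list and Index.intersection are used because pandas 2.0+ rejects a set in .loc and `&` is no longer a set intersection)
rows_to_add = inputDf.loc[sorted(ids_of_new_rows), inputDf.columns.intersection(masterDf.columns)]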

# add new rows to existing master
df_result = pd.concat([masterDf, rows_to_add])
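
The steps in the script can also be wrapped in a small helper so they can be tried without Excel files. This is only a sketch: the function name upsert_rows and the sample columns are hypothetical, and it assumes both frames are already indexed by ID.

```python
import pandas as pd


def upsert_rows(master: pd.DataFrame, incoming: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of master with matching rows updated and new rows appended."""
    result = master.copy()
    # Overwrite cells in shared rows and columns with non-NaN values from incoming.
    result.update(incoming)
    # Rows that exist only in incoming, restricted to columns master already has.
    new_ids = incoming.index.difference(master.index)
    shared_columns = incoming.columns.intersection(master.columns)
    return pd.concat([result, incoming.loc[new_ids, shared_columns]])


# Hypothetical sample data in place of the Excel sheets.
master = pd.DataFrame(
    {"ID": ["CR101", "CR102"], "Status": ["open", "open"], "Owner": ["ann", "bob"]}
).set_index("ID")
incoming = pd.DataFrame(
    {"ID": ["CR102", "CR103"], "Status": ["closed", "open"], "Note": ["x", "y"]}
).set_index("ID")

merged = upsert_rows(master, incoming)
print(merged)
```

Working on a copy leaves the original master frame untouched, which makes it easier to compare before and after. Columns that exist only in the input (Note here) are dropped, and master columns missing from the input (Owner here) are left as NaN for the new rows.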