Question

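I have two Excel worksheets (master data and input data) that share the same index column but have a different number of columns (see below). I want to merge the input DF into the master DF whenever new rows have been added (see IDs 103-105) or an entry in the input DF has been updated (see ID 102). The other columns can be ignored.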

Dataframe 1 (master):

Dataframe 2 (input):

Goal (updated cells are marked in yellow):

I am using the following script:

inputDf = pd.read_excel(inputFileName).set_index("ID")
masterDf = pd.read_excel(masterFileName).set_index("ID")

# Update existing rows
masterDf.update(inputDf)

# find out which ids are new
ids_of_new_rows = set(inputDf.index) - set(masterDf.index)

# get new rows that should be added to master
rows_to_add = masterDf.loc[ids_of_new_rows, inputDf.columns & masterDf.columns]

I am able to update the master DF and obtain ids_of_new_rows. Output: {'CR103', 'CR104', 'CR105'}

However, when trying to get rows_to_add, I always receive the following error:

KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index(['CR103', 'CR104', 'CR105'], dtype='object', name='ID')] are in the [index]"

Any ideas?

Answer 1

About the error

The error comes from the fact that masterDf has no rows whose ID is in ['CR103', 'CR104', 'CR105'], whereas inputDf does. What you are probably trying to do is rows_to_add = inputDf.loc[ids_of_new_rows, inputDf.columns & masterDf.columns].

What you might want to do

import pandas as pd

inputDf = pd.read_excel(inputFileName).set_index("ID")
masterDf = pd.read_excel(masterFileName).set_index("ID")

# Update existing rows in place
masterDf.update(inputDf)
# Add new rows: IDs found in inputDf but not in masterDf, limited to the shared columns.
# Index.intersection stands in for the older `inputDf.columns & masterDf.columns`
# spelling, which recent pandas versions no longer treat as a set operation.
new_ids = inputDf.index.difference(masterDf.index)
shared_cols = inputDf.columns.intersection(masterDf.columns)
masterDf = pd.concat((masterDf, inputDf.loc[new_ids, shared_cols]))

Here Index.difference is used to get the index values of inputDf that are not present in masterDf.
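To illustrate why looking up the new IDs in masterDf fails while the same lookup in inputDf works, here is a minimal sketch. The two small frames are made-up stand-ins for the Excel sheets; the column names Status and Owner and all cell values are assumptions, not taken from the original data.

```python
import pandas as pd

# Hypothetical stand-ins for the master and input sheets
master_df = pd.DataFrame(
    {"Status": ["open", "open"], "Owner": ["alice", "bob"]},
    index=pd.Index(["CR101", "CR102"], name="ID"),
)
input_df = pd.DataFrame(
    {"Status": ["open", "closed", "open", "open", "open"]},
    index=pd.Index(["CR101", "CR102", "CR103", "CR104", "CR105"], name="ID"),
)

# IDs that exist in the input but not in the master
new_ids = input_df.index.difference(master_df.index)

# Selecting those IDs from the master raises KeyError,
# because the master does not contain them yet
try:
    master_df.loc[new_ids]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Selecting them from the input works, since that is where the new rows live
rows_to_add = input_df.loc[new_ids]
print(lookup_failed, list(rows_to_add.index))
```

Unlike a plain Python set difference, Index.difference returns a pandas Index, so it can be passed to .loc directly.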

Answer 2

Here is the correct script to achieve the result. The simple solution was to change inputDF and masterDF ...

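import pandas as pd

# Define DataFrame
inputDf = pd.read_excel(inputFileName).set_index("ID")
masterDf = pd.read_excel(masterFileName).set_index("ID")

# Update existing rows
masterDf.update(inputDf)

# find out which ids are new
ids_of_new_rows = set(inputDf.index) - set(masterDf.index)

# get new rows that should be added to master (looked up in inputDf, not masterDf)
# sorted() turns the set into a list, since recent pandas versions reject a set as an indexer;
# Index.intersection stands in for the older `&` spelling for the shared columns
rows_to_add = inputDf.loc[sorted(ids_of_new_rows), inputDf.columns.intersection(masterDf.columns)]

# add new rows to existing master
df_result = pd.concat([masterDf, rows_to_add])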
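The same steps can be sketched as a small reusable function and tried on in-memory data instead of Excel files. The function name merge_into_master, the column names (Status, Owner, Comment) and the sample values are all hypothetical, chosen only to mirror the shape of the problem described in the question.

```python
import pandas as pd


def merge_into_master(master_df, input_df):
    """Return a copy of master_df updated from input_df, with new rows appended."""
    result = master_df.copy()
    # Overwrite cells for IDs and columns that both frames share (in place on the copy)
    result.update(input_df)
    # Columns that exist only in the input are ignored
    shared_cols = input_df.columns.intersection(result.columns)
    # IDs that appear only in the input are the rows to append
    new_ids = input_df.index.difference(result.index)
    return pd.concat([result, input_df.loc[new_ids, shared_cols]])


# Hypothetical sample data standing in for the two worksheets
master_df = pd.DataFrame(
    {"Status": ["open", "open"], "Owner": ["alice", "bob"]},
    index=pd.Index(["CR101", "CR102"], name="ID"),
)
input_df = pd.DataFrame(
    {
        "Status": ["open", "closed", "open", "open", "open"],
        "Comment": ["a", "b", "c", "d", "e"],
    },
    index=pd.Index(["CR101", "CR102", "CR103", "CR104", "CR105"], name="ID"),
)

df_result = merge_into_master(master_df, input_df)
print(df_result)
```

Working on a copy leaves the original master untouched, which makes it easy to compare before and after. Master-only columns (Owner here) come out as missing values for the appended rows.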

pandas: KeyError when trying to merge two dataframes
