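I have two Excel worksheets (master data and input data) that share the same index column but have a different number of columns (see below). I want to merge the input DF into the master DF when new rows have been added (see IDs 103-105) or when an entry in the input DF has been updated (see ID 102). The other columns can be ignored.
DataFrame 1 (master):
DataFrame 2 (input):
Target (updated cells are marked in yellow):
I am using the following script: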
inputDf = pd.read_excel(inputFileName).set_index("ID")
masterDf = pd.read_excel(masterFileName).set_index("ID")
# Update existing rows
masterDf.update(inputDf)
# find out which ids are new
ids_of_new_rows = set(inputDf.index) - set(masterDf.index)
# get new rows that should be added to master
rows_to_add = masterDf.loc[ids_of_new_rows, inputDf.columns & masterDf.columns]
I am able to update the master DF and get ids_of_new_rows. Output:
{'CR103', 'CR104', 'CR105'}
However, when I try to get rows_to_add, I always receive the following error:
KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index(['CR103', 'CR104', 'CR105'], dtype='object', name='ID')] are in the [index]"
Any ideas?
Answer 0 (score: 1)
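The error comes from the fact that there are no rows with an ID in ['CR103', 'CR104', 'CR105'] in masterDf, whereas there are in inputDf. What you are probably trying to do is rows_to_add = inputDf.loc[ids_of_new_rows, inputDf.columns & masterDf.columns], that is: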
inputDf = pd.read_excel(inputFileName).set_index("ID")
masterDf = pd.read_excel(masterFileName).set_index("ID")
# Update existing rows
masterDf.update(inputDf)
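# Add new rows (Index.intersection replaces `&`, which newer pandas no longer treats as a set operation)
new_ids = inputDf.index.difference(masterDf.index)
shared_cols = inputDf.columns.intersection(masterDf.columns)
masterDf = pd.concat((masterDf, inputDf.loc[new_ids, shared_cols]))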
Here Index.difference is used to get the index values in inputDf that are not present in masterDf.
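To make the cause of the KeyError concrete, here is a minimal sketch on made-up data. The two small frames and their Status and Owner columns are hypothetical stand-ins for the Excel sheets, which are not shown in the thread; only the ID index name comes from the question.

```python
import pandas as pd

# Hypothetical stand-ins for the two Excel sheets (all values are made up).
masterDf = pd.DataFrame(
    {"ID": ["CR101", "CR102"], "Status": ["open", "open"], "Owner": ["ann", "bob"]}
).set_index("ID")
inputDf = pd.DataFrame(
    {"ID": ["CR102", "CR103"], "Status": ["closed", "open"]}
).set_index("ID")

# IDs that are present in inputDf but missing from masterDf.
new_ids = inputDf.index.difference(masterDf.index)

# Looking the new IDs up in masterDf fails: none of them exist there yet.
try:
    masterDf.loc[new_ids]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# The same lookup against inputDf succeeds, because that is where the rows live.
rows_to_add = inputDf.loc[new_ids]
```

The new IDs are, by construction, exactly the ones masterDf does not contain, so the rows have to be read from inputDf.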
Answer 1 (score: 0)
Here is the corrected script that achieves the desired result. The simple solution is to change masterDf to inputDf in the lookup ...
# Define DataFrame
inputDf = pd.read_excel(inputFileName).set_index("ID")
masterDf = pd.read_excel(masterFileName).set_index("ID")
# Update existing rows
masterDf.update(inputDf)
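# find out which ids are new (as a sorted list, since recent pandas versions reject a set as a .loc indexer)
ids_of_new_rows = sorted(set(inputDf.index) - set(masterDf.index))
# get new rows that should be added to master (Index.intersection replaces the older `&` set operation)
rows_to_add = inputDf.loc[ids_of_new_rows, inputDf.columns.intersection(masterDf.columns)]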
# add new rows to existing master
df_result = pd.concat([masterDf, rows_to_add])
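The same steps can be wrapped in a small helper and checked without any Excel files. This is only a sketch: the function name merge_input_into_master and the toy Status, Owner, and Comment columns are assumptions for illustration, not part of the original script.

```python
import pandas as pd


def merge_input_into_master(masterDf, inputDf):
    """Return a copy of masterDf updated from inputDf, with new IDs appended."""
    result = masterDf.copy()
    # Overwrite existing cells with the non-NaN values from inputDf,
    # aligned on index and column labels (update works in place).
    result.update(inputDf)
    # Rows whose ID is not yet in the master, limited to the shared columns.
    new_ids = inputDf.index.difference(result.index)
    shared_cols = inputDf.columns.intersection(result.columns)
    return pd.concat([result, inputDf.loc[new_ids, shared_cols]])


# Hypothetical toy data standing in for the two worksheets.
master = pd.DataFrame(
    {"ID": ["CR101", "CR102"], "Status": ["open", "open"], "Owner": ["ann", "bob"]}
).set_index("ID")
incoming = pd.DataFrame(
    {"ID": ["CR102", "CR103"], "Status": ["closed", "open"], "Comment": ["x", "y"]}
).set_index("ID")

merged = merge_input_into_master(master, incoming)
```

In this sketch, columns that exist only in the input (Comment) are dropped, and columns that exist only in the master (Owner) are left as NaN for the newly added rows.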