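My goal is to merge 4 Excel worksheets into 1, matching rows on host name, serial number, and category. I am using the pandas merge function shown below.
The problem is that every worksheet has an "IP Address" column, and most of the IPs are the same across sheets. For some reason, the merged data frame ends up with 4 columns under 2 duplicated names: "IP Address_x", "IP Address_x", "IP Address_y", "IP Address_y".
I want to combine these 4 columns into 1, but I can't, because the names are duplicated. I have not renamed them by hand because there are about 30 data frame columns and it would be tedious.
Is there a statement that can merge them?
Here is a sample of the worksheets. The real ones have more columns, for example: Name, URL, Site Name, City...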
InventoryDf
HardwareDf
SoftwareDf
CoverageDf
import pandas as pd
from functools import partial, reduce

# Load each worksheet of the workbook into its own data frame.
InventoryDf = pd.read_excel("Inventory.xlsx", sheet_name='Inventory')
SoftwareDf = pd.read_excel("Inventory.xlsx", sheet_name='Software')
HardwareDf = pd.read_excel("Inventory.xlsx", sheet_name='Hardware')
CoverageDf = pd.read_excel("Inventory.xlsx", sheet_name='Coverage')
data_frames = [InventoryDf, SoftwareDf, HardwareDf, CoverageDf]
# Outer-join the frames pairwise, left to right, on the shared key columns.
merge = partial(pd.merge, on=['Priority','Category','Product Family','Host Name','Serial Number'], how='outer')
merge = reduce(merge, data_frames)
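As a minimal illustration of where the _x and _y suffixes come from, the sketch below merges two toy frames (made-up values and variable names, not the real sheets). pandas appends its default suffixes to any non-key column name that exists on both sides. When four frames are merged in sequence, the same two suffixes can be handed out a second time, which would explain the duplicated names.

```python
import pandas as pd

# Toy frames standing in for two of the sheets (hypothetical values).
left = pd.DataFrame({"Host Name": ["SwitchA"], "Serial Number": [1230], "IP Address": ["1.1.1.1"]})
right = pd.DataFrame({"Host Name": ["SwitchA"], "Serial Number": [1230], "IP Address": ["1.1.0.1"]})

# "IP Address" is not a key, so it exists on both sides and gets suffixed.
merged = pd.merge(left, right, on=["Host Name", "Serial Number"], how="outer")
print(list(merged.columns))
# ['Host Name', 'Serial Number', 'IP Address_x', 'IP Address_y']
```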
Expected result (the IP addresses are merged even where SwitchA's IP address differs)
+-----------+---------------+------------+----------+----------+
| Host Name | Serial Number | IP Address | Priority | Category |
+-----------+---------------+------------+----------+----------+
| SwitchA | 1230 | 1.1.1.1 | 1 | Switch |
+-----------+---------------+------------+----------+----------+
| SwitchA | 1231 | 1.1.1.1 | 1 | Switch |
+-----------+---------------+------------+----------+----------+
| SwitchB | 1240 | 1.1.1.2 | 2 | Switch |
+-----------+---------------+------------+----------+----------+
Excerpt of the raw result. Note that the redundant IP Address_x columns are left out.
+-----------+---------------+------------+----------+----------+
| Host Name | Serial Number | IP Address | Priority | Category |
+-----------+---------------+------------+----------+----------+
| SwitchA | 1230 | 1.1.0.1 | 1 | Switch |
+-----------+---------------+------------+----------+----------+
| SwitchD | 1250 | 1.2.2.2 | 1 | Switch |
+-----------+---------------+------------+----------+----------+
| SwitchE | 1260 | 1.3.3.3 | 2 | Switch |
+-----------+---------------+------------+----------+----------+
Answer 0 (score: 1)
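Starting from your advanced technique using functools, add inspect to get each data frame's variable name and use it to give every "IP Address" column a unique name before merging. After the merge, combine the columns with fillna() and drop the extras.
import inspect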
import functools
import pandas as pd

def retrieve_name(var):
    # Look up the caller's variable names that are bound to this exact object.
    callers_local_vars = inspect.currentframe().f_back.f_locals.items()
    # Keep only the *Df names so the loop variable `df` is never picked up.
    return [var_name for var_name, var_val in callers_local_vars
            if var_val is var and var_name.endswith("Df")]

data_frames = [InventoryDf, SoftwareDf, HardwareDf, CoverageDf]
names = []
for df in data_frames:
    n = retrieve_name(df)[0].replace("Df", "")
    names.append(n)
    # Prefix each sheet's IP column with its name, e.g. "Inventory IP Address".
    df.columns = [f"{n} {c}" if c == "IP Address" else c for c in df.columns]

# merge = functools.partial(pd.merge, on=['Priority','Category','Product Family','Host Name','Serial Number'], how='outer')
merge = functools.partial(pd.merge, on=['Priority','Category','Host Name','Serial Number'], how='outer')
merge = functools.reduce(merge, data_frames)

# take column LHS IP Address and rename it to "IP Address", fillna() from all subsequent columns
# then drop them
merge.rename(columns={f"{names[0]} IP Address": "IP Address"}, inplace=True)
for n in names[1:]:
    # Assign the result back instead of calling fillna(inplace=True) on a .loc slice.
    merge["IP Address"] = merge["IP Address"].fillna(merge[f"{n} IP Address"])
    merge.drop(columns=f"{n} IP Address", inplace=True)
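If looking up variable names with inspect feels fragile, the same rename, merge, and coalesce idea can be sketched with an explicit mapping from names to frames. This is only a sketch under assumptions: the toy frames, the `frames` dict, and the shortened key list are hypothetical, and it swaps the in-place fillna() loop for Series.combine_first, which fills missing values from the next column in turn.

```python
from functools import reduce

import pandas as pd

keys = ["Host Name", "Serial Number"]  # shortened key list for the toy data

# Hypothetical toy frames; the dict keys play the role of the sheet names.
frames = {
    "Inventory": pd.DataFrame({"Host Name": ["SwitchA", "SwitchB"],
                               "Serial Number": [1230, 1240],
                               "IP Address": ["1.1.1.1", None]}),
    "Software": pd.DataFrame({"Host Name": ["SwitchA", "SwitchB"],
                              "Serial Number": [1230, 1240],
                              "IP Address": ["1.1.0.1", "1.1.1.2"]}),
    "Hardware": pd.DataFrame({"Host Name": ["SwitchC"],
                              "Serial Number": [1250],
                              "IP Address": ["1.2.2.2"]}),
}

# Give each sheet's IP column a unique name so merge never needs suffixes.
renamed = [df.rename(columns={"IP Address": f"{name} IP Address"})
           for name, df in frames.items()]
merged = reduce(lambda left, right: pd.merge(left, right, on=keys, how="outer"), renamed)

# Coalesce: the first non-missing IP wins, in the order the frames were listed.
ip_cols = [f"{name} IP Address" for name in frames]
merged["IP Address"] = reduce(lambda a, b: a.combine_first(b),
                              [merged[c] for c in ip_cols])
merged = merged.drop(columns=ip_cols)
```

With these toy values, SwitchA keeps the Inventory address because it comes first, while SwitchB and SwitchC pick up the first sheet that has a value for them. Reordering the dict changes which sheet wins when addresses disagree.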