Question

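I have two Excel files, which I'll call source.xlsx and output.xlsx.

I need to match the Caller ID column of source.xlsx against the svc_no column of output.xlsx.

If there is no match, or the Caller ID value is NULL, I can instead match the adsl column of source.xlsx against the port column of output.xlsx.

If there is a match, I should ignore the port and write the Caller ID.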

The data in source.xlsx looks like this:

Caller ID  adsl   Comparison Result
NULL       2/12   Not Match
11111111   2/267  Match
22222222   4/243  Match
22222222   2/117  Possible Match

The data in output.xlsx looks like this (the Caller ID and Comparison Result columns are empty):

svc_no     Caller ID  port   Comparison Result
22222222              4/243
11111111              2/267
22222222              2/117
NULL                  2/12

My expected output is output.xlsx filled in with the data from source.xlsx:

svc_no     Caller ID  port   Comparison Result
22222222   22222222   4/243  Match
11111111   11111111   2/267  Match
22222222   22222222   2/117  Possible Match
NULL       NULL       2/12   Not Match

I tried:

from pandas import read_excel

df = read_excel('source.xlsx')
df1 = read_excel('output.xlsx')
df = df[df['Caller ID'].isin(df1['svc_no'])]
df['Caller ID'] = df1['Caller ID']
df1.to_excel('output.xlsx')

But it does not match correctly, and the values are written seemingly at random.
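One plausible reason the values look random, assuming the intent was to copy the matched caller IDs into the output frame: pandas lines up an assigned Series by row label, not by the value of a key column. The minimal sketch below uses made-up three-row data (not the real files) to show the effect.

```python
import pandas as pd

# Hypothetical miniature versions of the two sheets (not the real files).
source = pd.DataFrame({"Caller ID": ["11111111", "22222222", "33333333"]})
output = pd.DataFrame({"svc_no": ["22222222", "11111111"]})

# The boolean filter keeps the ORIGINAL row labels (0 and 1) of the matches.
matched = source[source["Caller ID"].isin(output["svc_no"])]

# Assignment aligns on those row labels, not on svc_no, so each caller ID
# lands in whichever output row happens to share its label.
output["Caller ID"] = matched["Caller ID"]
print(output)
```

In this sketch row 0 of output has svc_no 22222222 but receives caller ID 11111111, because both sit at label 0. A lookup keyed on the values themselves (a merge or a mapping Series) avoids this.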

Answer 1

Here is one way.

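# source and output are the DataFrames read from source.xlsx and output.xlsx

# filter output for the 2 pre-populated columns (copy to avoid SettingWithCopyWarning)
output = output[['svc_no', 'port']].copy()

# add duplicate column
output['Caller ID'] = output['svc_no']

# create series mapping (Caller ID, adsl) -> Comparison Result from source
s = source.set_index(['Caller ID', 'adsl'])['Comparison Result']

# map series to output, looking up each (svc_no, port) pair
output['Comparison Result'] = output.set_index(['svc_no', 'port']).index.map(s.get)

print(output)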

        svc_no   port    Caller ID Comparison Result
0  2.22222e+07  4/243  2.22222e+07             Match
1  1.11111e+07  2/267  1.11111e+07             Match
2  2.22222e+07  2/117  2.22222e+07     PossibleMatch
3         NULL   2/12         NULL          NotMatch
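The approach above looks up each (svc_no, port) pair together, so it does not cover the fallback described in the question, where an unmatched caller ID should be resolved through the port alone. A minimal sketch of that two-pass idea follows. The helper name `fill_from_source` is hypothetical, the data is retyped from the question with "NULL" kept as a plain string, and the fallback assumes each adsl value occurs only once in source.

```python
import pandas as pd

# Sample data retyped from the question; "NULL" is kept as a plain string.
source = pd.DataFrame({
    "Caller ID": ["NULL", "11111111", "22222222", "22222222"],
    "adsl": ["2/12", "2/267", "4/243", "2/117"],
    "Comparison Result": ["Not Match", "Match", "Match", "Possible Match"],
})
output = pd.DataFrame({
    "svc_no": ["22222222", "11111111", "22222222", "NULL"],
    "port": ["4/243", "2/267", "2/117", "2/12"],
})


def fill_from_source(output, source):
    """Hypothetical helper: match on (svc_no, port), then fall back to port."""
    result = output[["svc_no", "port"]].copy()

    # Pass 1: rows where caller ID and port both agree with source.
    pair = source.rename(columns={"Caller ID": "svc_no", "adsl": "port"})
    result = result.merge(pair, on=["svc_no", "port"], how="left")
    missing = result["Comparison Result"].isna()

    # Pass 2: rows still unmatched are looked up by port alone
    # (assumes adsl values are unique in source).
    by_port = source.set_index("adsl")
    result["Comparison Result"] = result["Comparison Result"].fillna(
        result["port"].map(by_port["Comparison Result"])
    )

    # Caller ID is svc_no where the pair matched, otherwise the caller ID
    # that source records for that port.
    result["Caller ID"] = result["svc_no"].where(
        ~missing, result["port"].map(by_port["Caller ID"])
    )
    return result[["svc_no", "Caller ID", "port", "Comparison Result"]]


filled = fill_from_source(output, source)
print(filled)
```

With the real files, the two frames would come from `pd.read_excel`, and the result could be written back with `filled.to_excel('output.xlsx', index=False)`. Note that `read_excel` treats the text NULL as a missing value by default, so the comparison may behave differently unless that is handled when loading.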
