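I want to filter a pandas DataFrame based on matches in another DataFrame. The challenge is that the columns I want to match on can change, and at the moment I do the matching in several steps. In the code below, I first keep only those rows of dfB whose values in 4 columns also exist in dfA, and then run a fuzzy match. After that I filter on 2 columns and run the fuzzy match again.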
out = []
dfB = dfB[
    (dfB[dfBcol_NP] == dfA[dfAcol_NP]) &
    (dfB[dfBcol_GN] == dfA[dfAcol_GN]) &
    (dfB[dfBcol_TI] == dfA[dfAcol_TI]) &
    (dfB[dfBcol_LO] == dfA[dfAcol_LO])
]
out.append(process.extract(i, dfB[address], scorer=fuzz.token_sort_ratio))
dfB = dfB[
    (dfB[dfBcol_NP] == dfA[dfAcol_NP]) &
    (dfB[dfBcol_CPC] == dfA[dfAcol_CPC])
]
out.append(process.extract(i, dfB[address], scorer=fuzz.token_sort_ratio))
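Can someone help me consolidate this into a single function to which I can pass the 2 DataFrames and the match conditions, so that the filtering is done dynamically? Something like this: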
out = []
def filterAndfuzzyMatch(i, dfB, dfA, matchLogic):
    dfB = dfB[matchLogic]
    out.append(process.extract(i, dfB[address], scorer=fuzz.token_sort_ratio))
matchLogic = [
    (dfB[dfBcol_NP] == dfA[dfAcol_NP]) &
    (dfB[dfBcol_GN] == dfA[dfAcol_GN]) &
    (dfB[dfBcol_TI] == dfA[dfAcol_TI]) &
    (dfB[dfBcol_LO] == dfA[dfAcol_LO])
]
filterAndfuzzyMatch(i, dfB, dfA, matchLogic)
matchLogic = [
    (dfB[dfBcol_NP] == dfA[dfAcol_NP]) &
    (dfB[dfBcol_CPC] == dfA[dfAcol_CPC])
]
filterAndfuzzyMatch(i, dfB, dfA, matchLogic)
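To make the idea more concrete, here is a rough sketch of one shape such a function could take. It is not a definitive solution and rests on several assumptions: the match conditions are passed as a list of (dfB column, dfA column) name pairs rather than as a prebuilt boolean expression, dfA is handled one row at a time (row_a is a single row, for example from dfA.iterrows()), and the fuzzy step is passed in as a callable so the filtering can be tested on its own. The names filter_by_columns, filter_and_fuzzy_match, column_pairs, address_col and extract are all hypothetical.

```python
import pandas as pd


def filter_by_columns(df_b, row_a, column_pairs):
    """Keep the rows of df_b that equal row_a on every (b_col, a_col) pair.

    Assumption: row_a is a single row of dfA (a Series or dict-like),
    so each comparison is column-versus-scalar.
    """
    # Start with every row selected, then narrow down one pair at a time.
    mask = pd.Series(True, index=df_b.index)
    for b_col, a_col in column_pairs:
        mask &= df_b[b_col] == row_a[a_col]
    return df_b[mask]


def filter_and_fuzzy_match(query, df_b, row_a, column_pairs, address_col, extract):
    """Filter df_b, then hand the remaining addresses to the fuzzy extractor.

    Assumption: extract is any callable taking (query, choices), for example
    functools.partial(process.extract, scorer=fuzz.token_sort_ratio).
    """
    candidates = filter_by_columns(df_b, row_a, column_pairs)
    return extract(query, candidates[address_col])
```

With this shape, the two steps from the question become two calls that differ only in the list of pairs, for example [(dfBcol_NP, dfAcol_NP), (dfBcol_GN, dfAcol_GN), (dfBcol_TI, dfAcol_TI), (dfBcol_LO, dfAcol_LO)] for the first and [(dfBcol_NP, dfAcol_NP), (dfBcol_CPC, dfAcol_CPC)] for the second. The function returns its result instead of appending to a global out list, so the caller decides where the matches are collected.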