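Finding the percentage difference between two DataFrames. I tried using fuzzywuzzy, but did not get the expected output.
Suppose I have 2 DataFrames, each with 5 columns, and I want to find the match percentage between them.
Before running the code I found that the dtypes were float64, so I changed them to object. Running the code then fails with TypeError: object of type 'float' has no len()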
df1
score  id_number  company_name          company_code  Amount
200    IN2231D    AXN pvt Ltd           IN225         2566.7
450    UK654IN    Aviva Intl Ltd        IN115         3677
650.8  SL1432H    Ship Incorporations   CZ555         NaN
350    LK0678G    Oppo Mobiles pvt ltd  PQ795         367.9
590    NG5678J    Nokia Inc             RS885         867
250    IN2231D    AXN pvt Ltd           IN215         785.65
df2
QR_score  Identity_No  comp_name             comp_code  amt      match_acc
200.00    IN2231D      AXN pvt Inc           IN225      2566.70
420.0     UK655IN      Aviva Intl Ltd        IN315      3677.00
350.35    SL2252H      Ship Inc              CK555      NaN
450.00    LK9978G      Oppo Mobiles pvt ltd  PRS95      367.9
590.15    NG5678J      Nokia Inc             RS885      867
250.0     IN5531D      AXN pvt Ltd           IN215      785.65
When I checked, the dtype of df2['QR_score'] and df2['amt'] was float64, so I changed it to object.
The code I am trying:
import numpy as np
import pandas as pd
from fuzzywuzzy import fuzz
df2 = df2[['QR_score','amt']].astype(str)
# Make Column Names Match
df1.columns = df2.columns
# Select string (object) columns
t1 = df1.select_dtypes(include='object')
t2 = df2.select_dtypes(include='object')
# Apply fuzz.ratio to every cell of both frames
obj_similarity = pd.DataFrame(np.vectorize(fuzz.ratio)(t1, t2),
                              columns=t1.columns,
                              index=t1.index)
# Use non-object similarity with eq
other_similarity = df1.select_dtypes(exclude='object').eq(
    df2.select_dtypes(exclude='object')) * 100
# Merge Similarities together and take the average per row
total_similarity = pd.concat((
    obj_similarity, other_similarity
), axis=1).mean(axis=1)
df2['match_acc'] = total_similarity
The error occurs when executing the following lines:
obj_similarity = pd.DataFrame(np.vectorize(fuzz.ratio)(t1, t2),
                              columns=t1.columns,
                              index=t1.index)
Error: TypeError: object of type 'float' has no len()
Please suggest a fix.
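For context, the error message itself can be reproduced without pandas. The sketch below is only an illustration: `ratio` is a hypothetical stand-in built on the standard library's `difflib.SequenceMatcher`, not fuzzywuzzy's actual implementation, and `safe_ratio` is likewise a made-up helper name. It assumes the failure comes from a float (such as NaN, or a number stored in an object column) reaching a function that calls `len()` on its arguments.

```python
from difflib import SequenceMatcher


def ratio(a, b):
    """Hypothetical stand-in for a string scorer: similarity on a 0-100 scale."""
    # len() only works on sized objects such as str; a float raises TypeError here.
    if len(a) == 0 or len(b) == 0:
        return 0
    return round(100 * SequenceMatcher(None, a, b).ratio())


def safe_ratio(a, b):
    """Cast both values to str first so numbers and NaN can be compared as text."""
    return ratio(str(a), str(b))


error_message = None
try:
    ratio(float("nan"), "2566.7")  # a float slips into a string comparison
except TypeError as exc:
    error_message = str(exc)

print(error_message)                 # object of type 'float' has no len()
print(safe_ratio(2566.7, 2566.70))   # both become "2566.7", so the score is 100
```

Changing a column's dtype to object does not convert its values to strings, so the floats inside it are still floats; casting each value with `str()` at comparison time is one way around that.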
Answer 0 (score: 1)
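Stack the DataFrames, concat them, and apply fuzz row-wise (axis=1). Then reshape with unstack, and finally take the mean (axis=1). This assumes the column names of df1 and df2 already match, as done in the question with df1.columns = df2.columns. Casting each value with str() avoids the TypeError.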
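# Pass axis by keyword: recent pandas versions reject a positional axis in pd.concat
df2['match_acc'] = pd.concat([df1.stack(), df2.stack()], axis=1).apply(
    lambda x: fuzz.ratio(str(x[0]), str(x[1])), axis=1).unstack().mean(axis=1)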
   QR_score  Identity_No  comp_name             comp_code  amt      match_acc
0  200.00    IN2231D      AXN pvt Inc           IN225      2566.70  94.60
1  420.00    UK655IN      Aviva Intl Ltd        IN315      3677.00  89.20
2  350.35    SL2252H      Ship Inc              CK555      NaN      62.75
3  450.00    LK9978G      Oppo Mobiles pvt ltd  PRS95      367.90   82.20
4  590.15    NG5678J      Nokia Inc             RS885      867.00   94.60
5  250.00    IN5531D      AXN pvt Ltd           IN215      785.65   94.20
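The same idea can also be written out step by step, which may be easier to debug than the one-liner. This is a minimal sketch under stated assumptions, not the answer's own code: it uses only two rows of the sample data, and `ratio` is a hypothetical stand-in based on `difflib.SequenceMatcher` so that the example needs nothing beyond pandas. In real use it could be swapped for `fuzz.ratio`, whose scores may differ slightly depending on the backend installed.

```python
from difflib import SequenceMatcher

import pandas as pd


def ratio(a, b):
    """Hypothetical stand-in for fuzz.ratio: 0-100 similarity of the str() forms."""
    return round(100 * SequenceMatcher(None, str(a), str(b)).ratio())


# Two rows taken from the question's sample data
df1 = pd.DataFrame({
    "score": [200, 650.8],
    "id_number": ["IN2231D", "SL1432H"],
    "company_name": ["AXN pvt Ltd", "Ship Incorporations"],
    "company_code": ["IN225", "CZ555"],
    "Amount": [2566.7, float("nan")],
})
df2 = pd.DataFrame({
    "QR_score": [200.00, 350.35],
    "Identity_No": ["IN2231D", "SL2252H"],
    "comp_name": ["AXN pvt Inc", "Ship Inc"],
    "comp_code": ["IN225", "CK555"],
    "amt": [2566.70, float("nan")],
})

# Align the column names by position so the frames can be compared column by column
left = df1.copy()
left.columns = df2.columns

# Score every cell pair; leave NaN where either side is missing so it is
# excluded from the row average instead of being compared as the text "nan"
scores = pd.DataFrame(
    {
        col: [
            ratio(a, b) if pd.notna(a) and pd.notna(b) else float("nan")
            for a, b in zip(left[col], df2[col])
        ]
        for col in df2.columns
    },
    index=df2.index,
)

# Row-wise average; mean() skips NaN by default
match_acc = scores.mean(axis=1)
print(match_acc)
```

For these two rows the sketch gives 94.6 and 62.75, in line with rows 0 and 2 of the table above. Keeping the intermediate `scores` frame around makes it easy to see which column pulls a row's average down.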