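I have a dataset with two columns. I want to match the strings in the two columns against each other and put the match percentage in a third column, then write all three columns to a CSV file. Here is my code.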
Data:
**RoS FCRA**
pink pinky
rose grass
thick thin
Code:
from fuzzywuzzy import fuzz, process
import pandas as pd
import csv
df = pd.read_excel("/Users/shreyaagarwal/Desktop/fcra test.xlsx")
with open("myfile.csv", "w") as fh:
    writer = csv.writer(fh)
    for i in (df["RoS"]):
        for p in (df["FCRA"]):
            s = p.encode('ascii', 'ignore').decode('ascii')
            match = fuzz.partial_ratio(i,s)
            df["Fuzzymatch"] = match
            writer.writerow([i,s,match])
Desired Output:
**RoS FCRA Match**
pink pinky 20
pink grass 0
pink thin 0
rose pinky 0
rose grass 0
rose thin 0
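Answer 0 (score: 0)
You seem to be iterating over the wrong things and introducing variables that are never used. For example, `df["Fuzzymatch"] = match` overwrites the whole column on every pass of the loop and is never written out. I guess you want something like this: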
from fuzzywuzzy import fuzz
import pandas as pd
import csv
df = pd.read_excel("test.xlsx")
# newline="" stops the csv module from emitting blank rows on Windows
with open("myfile.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    # Compare every RoS value with every FCRA value
    for i in df["RoS"]:
        for p in df["FCRA"]:
            match = fuzz.partial_ratio(i, p)
            writer.writerow([i, p, match])
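If a header row is wanted, or the scored pairs are needed again before they are written, the scoring and the writing can be separated. The following is only a minimal sketch, not the answerer's code: `similarity`, `score_pairs` and `write_pairs` are hypothetical names, and `difflib.SequenceMatcher` from the standard library stands in as the scorer so that the example runs without fuzzywuzzy installed. `fuzz.partial_ratio` could be passed as `scorer` instead, in which case the scores will differ from the ones this stand-in produces.

```python
import csv
import io
from difflib import SequenceMatcher
from itertools import product


def similarity(a, b):
    """Stand-in scorer (assumption): difflib ratio scaled to 0-100."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def score_pairs(left, right, scorer=similarity):
    """Score every (left, right) combination and return a list of rows."""
    return [(a, b, scorer(a, b)) for a, b in product(left, right)]


def write_pairs(rows, fh, header=("RoS", "FCRA", "Match")):
    """Write the header and the scored rows to an open text file handle."""
    writer = csv.writer(fh)
    writer.writerow(header)
    writer.writerows(rows)


# Sample values taken from the question's data
ros = ["pink", "rose", "thick"]
fcra = ["pinky", "grass", "thin"]
rows = score_pairs(ros, fcra)

# An in-memory buffer keeps the sketch side-effect free;
# swap it for open("myfile.csv", "w", newline="") to write a real file.
buffer = io.StringIO()
write_pairs(rows, buffer)
print(buffer.getvalue())
```

With the columns of a DataFrame, the call would look like `score_pairs(df["RoS"], df["FCRA"], scorer=fuzz.partial_ratio)`, assuming both columns contain only strings.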
Here is an attempt at an MCVE:
import pandas as pd
df = pd.DataFrame(
    [['pink', 'pinky'], ['rose', 'grass'], ['thick', 'thin']],
    columns=['RoS', 'FCRA'])
for i in df["RoS"]:
    for p in df["FCRA"]:
        print(i, p)
Result:
('pink', 'pinky')
('pink', 'grass')
('pink', 'thin')
('rose', 'pinky')
('rose', 'grass')
('rose', 'thin')
('thick', 'pinky')
('thick', 'grass')
('thick', 'thin')
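The nested loops enumerate the Cartesian product of the two columns, which is why three values in each column give nine pairs. As a small sketch of the same idea, with plain lists assumed in place of the DataFrame columns, `itertools.product` generates the identical sequence in the same order. Printing each tuple reproduces the parenthesised form shown above, which is what Python 2 prints for `print(i, p)`; under Python 3 that call prints the two values separated by a space instead.

```python
from itertools import product

# Plain lists standing in for df["RoS"] and df["FCRA"] (assumption)
ros = ["pink", "rose", "thick"]
fcra = ["pinky", "grass", "thin"]

# Pairs built with explicit nested loops, as in the MCVE
nested = [(i, p) for i in ros for p in fcra]

# The same pairs from itertools.product, in the same order
pairs = list(product(ros, fcra))

for pair in pairs:
    print(pair)
```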