My code writes only part of the list output to a CSV as a DataFrame column, then breaks

Time: 2018-08-06 03:28:49

Tags: python pandas csv

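I have a dataset with two columns. I want to match the strings in the two columns against each other and produce a match percentage in a third column, then write all three columns to a CSV. Here is my code.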

Data:

    RoS    FCRA
    pink   pinky
    rose   grass
    thick  thin

Code:

from fuzzywuzzy import fuzz, process
import pandas as pd
import csv

df = pd.read_excel("/Users/shreyaagarwal/Desktop/fcra test.xlsx")
with open("myfile.csv", "w") as fh:
     writer = csv.writer(fh)
     for i in (df["RoS"]):
        for p in (df["FCRA"]):
            s = p.encode('ascii', 'ignore').decode('ascii')
            match = fuzz.partial_ratio(i,s)
            df["Fuzzymatch"] = match
            writer.writerow([i,s,match])



Desired output:

    RoS  FCRA  Match
    pink pinky 20
    pink grass 0
    pink thin 0
    rose pinky 0
    rose grass 0
    rose thin 0

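1 answer:

Answer 0: (score: 0)

You seem to be iterating over the wrong things and introducing variables that are never used. I guess you want something like this: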

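from fuzzywuzzy import fuzz
import pandas as pd
import csv

df = pd.read_excel("test.xlsx")
# On Python 3, newline="" lets the csv module control line endings
with open("myfile.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    # Pair every RoS value with every FCRA value and score each pair
    for i in df["RoS"]:
        for p in df["FCRA"]:
            match = fuzz.partial_ratio(i, p)
            writer.writerow([i, p, match])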
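
The loop above leaves out the question's `df["Fuzzymatch"] = match` line. Here is a minimal sketch of why that line does not help, assuming a small hand-built frame in place of the spreadsheet and an arbitrary score of 42. Assigning a scalar to a column sets every row to that value, so inside the loop the column would only ever hold the most recent score.

```python
import pandas as pd

# Hand-built stand-in for the spreadsheet in the question (an assumption)
df = pd.DataFrame(
    [["pink", "pinky"], ["rose", "grass"], ["thick", "thin"]],
    columns=["RoS", "FCRA"],
)

# Assigning a scalar broadcasts it to every row of the column
df["Fuzzymatch"] = 42  # 42 is an arbitrary illustrative score
print(df["Fuzzymatch"].tolist())  # [42, 42, 42]
```

Since the rows written to the CSV come from `writer.writerow`, the column assignment has no effect on the file either way.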

Here is an attempt at an MCVE:

import pandas as pd

df = pd.DataFrame(
    [['pink', 'pinky'], ['rose', 'grass'], ['thick', 'thin']],
    columns=['RoS', 'FCRA'])
for i in df["RoS"]:
    for p in df["FCRA"]:
        print(i, p)

Result (as printed under Python 2, where `print(i, p)` displays a tuple):

('pink', 'pinky')
('pink', 'grass')
('pink', 'thin')
('rose', 'pinky')
('rose', 'grass')
('rose', 'thin')
('thick', 'pinky')
('thick', 'grass')
('thick', 'thin')
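
If the goal is a three-column CSV, the nine pairs above could also be collected into a new frame and written in one call. The following is only a sketch under stated assumptions: `score` is a hypothetical stand-in built on the standard library's `difflib` so the example runs without fuzzywuzzy, and its numbers will differ from those of `fuzz.partial_ratio`, which could be swapped in.

```python
from difflib import SequenceMatcher
from itertools import product

import pandas as pd


def score(a, b):
    """Hypothetical stand-in for fuzz.partial_ratio: similarity on a 0-100 scale."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


df = pd.DataFrame(
    [["pink", "pinky"], ["rose", "grass"], ["thick", "thin"]],
    columns=["RoS", "FCRA"],
)

# One row per (RoS, FCRA) combination: 3 x 3 = 9 rows
pairs = pd.DataFrame(list(product(df["RoS"], df["FCRA"])), columns=["RoS", "FCRA"])
pairs["Match"] = [score(a, b) for a, b in zip(pairs["RoS"], pairs["FCRA"])]

# With no path, to_csv returns the CSV text; pass "myfile.csv" to write a file
csv_text = pairs.to_csv(index=False)
print(csv_text)
```

Building a separate `pairs` frame avoids trying to fit nine scores into the three-row original.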