How to use FuzzyWuzzy with two CSVs?

Time: 2019-05-07 03:49:38

Tags: python python-3.x fuzzywuzzy

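I am trying to compare two CSVs that contain job titles. One CSV contains job titles from the U.S. Bureau of Labor Statistics (BLS), and the other contains a manually generated list of job titles. Each list has roughly 2,000 titles. I'm a beginner, so my approach probably has some obvious, basic problems. Apologies in advance.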

I can get all of the predicted job values, but for some reason they are only compared against the first bls_job value.


from fuzzywuzzy import fuzz

bls_job_list = open("bls_jobs.csv", "r")
predicted_job_list = open("predicted_jobs.csv", "r")

for bls_job in bls_job_list.readlines():
    for predicted_job in predicted_job_list.readlines():
        print(bls_job + "," + predicted_job + "," + str((fuzz.partial_ratio(bls_job, predicted_job))) + "\n")

bls_job_list.close()
predicted_job_list.close()
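The symptom can be reproduced without any CSV files. In this minimal sketch, `io.StringIO` stands in for the opened files and the job titles are made-up sample contents (both assumptions for illustration). A file object is read sequentially, so once `readlines()` has consumed it, any further call returns an empty list and the inner loop body never runs again:

```python
import io

# In-memory stand-ins for the two opened CSV files (hypothetical contents)
bls_job_list = io.StringIO("admiral\nceo\nmayor\n")
predicted_job_list = io.StringIO("abstractor\naccountant\n")

pairs = []
for bls_job in bls_job_list.readlines():
    # The first call reads to the end of the stream; every later call returns []
    for predicted_job in predicted_job_list.readlines():
        pairs.append((bls_job.strip(), predicted_job.strip()))

print(pairs)
# [('admiral', 'abstractor'), ('admiral', 'accountant')]
```

Only "admiral" is ever paired with the predicted jobs, which matches the behavior described above.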

I would like to compute the FuzzyWuzzy ratio for every combination of values across the two lists.
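One way to score every combination is to load each file into a list once and then iterate over the Cartesian product of the two lists with `itertools.product`. In the sketch below, `score_all_pairs` and `exact_match` are hypothetical helper names, and the scoring function is passed in as a parameter so the snippet runs with only the standard library:

```python
from itertools import product
from typing import Callable, List, Tuple


def score_all_pairs(
    bls_jobs: List[str],
    predicted_jobs: List[str],
    scorer: Callable[[str, str], int],
) -> List[Tuple[str, str, int]]:
    """Return (bls_job, predicted_job, score) for every combination of the two lists."""
    # product() pairs each BLS job with each predicted job,
    # giving len(bls_jobs) * len(predicted_jobs) pairs in total
    return [(b, p, scorer(b, p)) for b, p in product(bls_jobs, predicted_jobs)]


def exact_match(a: str, b: str) -> int:
    """Placeholder scorer for this sketch only: 100 if the titles are equal, else 0."""
    return 100 if a == b else 0


rows = score_all_pairs(["admiral", "ceo"], ["ceo", "accountant"], exact_match)
for row in rows:
    print(row)
```

With fuzzywuzzy installed, the call would be `score_all_pairs(bls_jobs, predicted_jobs, fuzz.partial_ratio)`. Because each list has roughly 2,000 titles, that is about 2,000 × 2,000 = 4,000,000 comparisons, so the full run may take a while.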

Input _bls_sample:_

admiral, ceo, chief executive officer, chief financial officer, chief operating officer, chief sustainability officer, commissioner of internal revenue, coo, county commissioner, government service, executive governor, mayor, school superintendent, university president,

_predicted_sample:_

abstractor, accessioner, account coordinator, account executive, account manager, account representative, account service representative, account specialist, accountant, accounting clerk, accounting manager, accounting supervisor, accounts manager,

Here is a sample of my current output:

BLS_job_1, analyst, 25

BLS_job_1, analytics manager, 25

BLS_job_1, ambulance driver, 33

BLS_job_1, worker, 27

1 answer:

Answer 0 (score: 0)

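I believe the problem is how the files are read inside the for loops: the first call to predicted_job_list.readlines() reads that file to the end, so on every later pass of the outer loop it returns an empty list, and only the first bls_job is ever compared. I read both files into lists once and then looped over every element of each list for the fuzzywuzzy comparison. Here is that attempt: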

from fuzzywuzzy import fuzz

# Read each file once into a list so the inner loop can be repeated for every BLS job
with open("/russellb/data/py_devel/SO_answrs/input.csv", "r") as bls_job_list:
    bls_job_filtered = [line.replace('\r', '') for line in bls_job_list]
with open("/russellb/data/py_devel/SO_answrs/compare.csv", "r") as predicted_job_list:
    predicted_job_filtered = [line.replace('\r', '') for line in predicted_job_list]

# Compare every BLS job against every predicted job
for bls_job in bls_job_filtered:
    for predicted_job in predicted_job_filtered:
        print(bls_job + "," + predicted_job + "," + str(fuzz.partial_ratio(bls_job, predicted_job)) + "\n")

The output of the code above is:

admiral,
,abstractor,
,44

admiral,
,accessioner,
,50

admiral,
,account coordinator,
,50

admiral,
,account executive,
,35

admiral,
,account manager,
,50

admiral,
,account representative,
,47

admiral,
,account service representative,
,44

admiral,
,account specialist,
,56

admiral,
,accountant,
,33

admiral,
,accounting clerk,
,35

admiral,
,accounting manager,
,50

admiral,
,accounting supervisor,
,44

admiral,
,accounts manager,
,50

ceo,
,abstractor,
,60

ceo,
,accessioner,
,60

ceo,
,account coordinator,
,60

ceo,
,account executive,
,60

ceo,
,account manager,
,60

ceo,
,account representative,
,60

ceo,
,account service representative,
,60

ceo,
,account specialist,
,40

ceo,
,accountant,
,40

ceo,
,accounting clerk,
,60
...
...
...
school superintendent,
,accounting manager,
,41

school superintendent,
,accounting supervisor,
,48

school superintendent,
,accounts manager,
,42

university president,
,abstractor,
,36

university president,
,accessioner,
,48

university president,
,account coordinator,
,43

university president,
,account executive,
,26

university president,
,account manager,
,24

university president,
,account representative,
,57

university president,
,account service representative,
,59

university president,
,account specialist,
,35

university president,
,accountant,
,33

university president,
,accounting clerk,
,28

university president,
,accounting manager,
,25

university president,
,accounting supervisor,
,44

university president,
,accounts manager,
,22
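In the output above each title spills over several lines because every line read from the files still ends with a comma and a newline (for example "admiral,\n"), and those characters are also passed to partial_ratio. The sketch below strips them before scoring and writes one bls_job,predicted_job,score row per pair with the csv module. It assumes one title per line with an optional trailing comma, as the output suggests; `load_titles`, `similarity`, and `write_scores` are hypothetical helper names, `io.StringIO` stands in for the real files, and `difflib.SequenceMatcher` is a stand-in scorer so the snippet needs only the standard library (it is not the same metric as `fuzz.partial_ratio`):

```python
import csv
import io
from difflib import SequenceMatcher
from itertools import product
from typing import Iterable, List, TextIO


def load_titles(lines: Iterable[str]) -> List[str]:
    """Normalize raw lines such as 'admiral,\\n' to 'admiral' and skip blank lines."""
    titles = []
    for line in lines:
        # Drop surrounding whitespace (including '\r\n'), then the trailing comma
        title = line.strip().rstrip(",").strip()
        if title:
            titles.append(title)
    return titles


def similarity(a: str, b: str) -> int:
    """Stand-in 0-100 scorer from difflib; swap in fuzz.partial_ratio if available."""
    return round(SequenceMatcher(None, a, b).ratio() * 100)


def write_scores(bls: List[str], predicted: List[str], out: TextIO) -> None:
    """Write a header row, then one row per (bls_job, predicted_job) pair."""
    writer = csv.writer(out)
    writer.writerow(["bls_job", "predicted_job", "score"])
    for b, p in product(bls, predicted):
        writer.writerow([b, p, similarity(b, p)])


# In-memory stand-ins for the two input CSVs (hypothetical contents)
bls_file = io.StringIO("admiral,\r\nceo,\r\n")
predicted_file = io.StringIO("ceo,\naccountant,\n")

out = io.StringIO()
write_scores(load_titles(bls_file), load_titles(predicted_file), out)
print(out.getvalue())
```

When writing to a real file instead of `io.StringIO`, open it with `newline=""`, as the csv module documentation recommends, so the writer's line endings are not translated a second time.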