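import re
from fuzzywuzzy import fuzz  # assumed source of fuzz; the question does not show its imports

a = {'1330': ('John', 'Gold', '1330'), "0001":('Matt', 'Wade', '0001'), '2112': ('Bob', 'Smith', '2112')}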
com = {'6':['John Gold, getting no points', 'Matt played in this game? Didn\'t notice him','Love this shot!']}
comments_table = []
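What I'm trying to achieve with this replace function is to replace people's names in the strings found in com (dict) with their unique codes, which are looked up in a (dict) and substituted via regex. Replacing a name with its code works, but adding the new strings (with the code instead of the name) to the list is where I'm going wrong.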
def replace_first_name():
    for k,v in a.items():
        for z, y in com.items():
            for item in y:
                firstname = a[k][0]
                lastname = a[k][1]
                full_name = firstname + ' ' + lastname
                if firstname in item:
                    if full_name in item:
                        t = re.compile(re.escape(full_name), re.IGNORECASE)
                        comment = t.sub(a[k][2], item)
                        print('1')
                        comments_table.append({
                            'post_id': z, 'comment': comment
                        })
                        continue
                    else:
                        t = re.compile(re.escape(firstname), re.IGNORECASE)
                        comment = t.sub(a[k][2], item)
                        print('2')
                        comments_table.append({
                            'post_id':z, 'comment':comment
                        })
                else:
                    print('3')
                    if fuzz.ratio(item,item) > 90:
                        comments_table.append({
                            'post_id': z, 'comment': item
                        })
                    else:
                        pass
The problem is that the output looks like this:
[{'comment': '1330, getting no points', 'post_id': '6'}, {'comment': "Matt played in this game? Didn't notice him", 'post_id': '6'}, {'comment': 'Love this shot!', 'post_id': '6'}, {'comment': 'John Gold, getting no points', 'post_id': '6'}, {'comment': "Matt played in this game? Didn't notice him", 'post_id': '6'}, {'comment': 'Love this shot!', 'post_id': '6'}, {'comment': 'John Gold, getting no points', 'post_id': '6'}, {'comment': "0001 played in this game? Didn't notice him", 'post_id': '6'}, {'comment': 'Love this shot!', 'post_id': '6'}]
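I don't want a comment that has already had its name replaced with a number to go into the final list again. So my expected output would look like this: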
[{'comment': '1330, getting no points', 'post_id': '6'}, {'comment': '0001 played in this game? Didn\'t notice him', 'post_id': '6'}, {'comment': 'Love this shot!', 'post_id': '6'}]
I have looked into using an iterator, by making y an iter_list, but I didn't get anywhere. Any help would be appreciated. Thanks!
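Answer 0 (score: 2)
I'm not sure why you are doing a regexp replace, since you are already checking with the in operator whether the first name or full name is present. I'm also not sure what fuzz.ratio(item, item) in case 3 is supposed to do, but here is how you could do a simple/naive replacement: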
#!/usr/bin/python

def replace_names(authors, com):
    res = []
    for post_id, comments in com.items():
        for comment in comments:
            # Try each author; stop at the first one whose name appears.
            for author_id, author in authors.items():
                first_name, last_name = author[0], author[1]
                full_name = first_name + ' ' + last_name
                if full_name in comment:
                    comment = comment.replace(full_name, author_id)
                    break
                elif first_name in comment:
                    comment = comment.replace(first_name, author_id)
                    break
            # Append exactly once per comment, replaced or not.
            res.append({'post_id': post_id, 'comment': comment})
    return res

a = {'1330': ('John', 'Gold', '1330'), "0001":('Matt', 'Wade', '0001'), '2112': ('Bob', 'Smith', '2112')}
com = {'6':['John Gold, getting no points', 'Matt played in this game? Didn\'t notice him','Love this shot!']}

for comment in replace_names(a, com):
    print(comment)
This produces the following output:
{'comment': '1330, getting no points', 'post_id': '6'}
{'comment': "0001 played in this game? Didn't notice him", 'post_id': '6'}
{'comment': 'Love this shot!', 'post_id': '6'}
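It's a bit tricky to work out what you intended with the original code, but (one of) the reasons you are getting duplicates is that you process the authors in the outer loop, which means every comment is processed once for each author. By swapping the loops, you make sure each comment is processed only once.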
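To make the duplication concrete, here is a minimal sketch that only counts how many entries each loop order would append. The helper names count_authors_outer and count_comments_outer are made up for this illustration, and the replacement logic is reduced to a bare first-name check.

```python
# Sample data from the question.
a = {'1330': ('John', 'Gold', '1330'),
     '0001': ('Matt', 'Wade', '0001'),
     '2112': ('Bob', 'Smith', '2112')}
com = {'6': ['John Gold, getting no points',
             "Matt played in this game? Didn't notice him",
             'Love this shot!']}


def count_authors_outer(authors, com):
    """Authors in the outer loop: every comment is appended once per author."""
    appended = 0
    for author_id, author in authors.items():
        for post_id, comments in com.items():
            for comment in comments:
                appended += 1  # one entry per (author, comment) pair
    return appended


def count_comments_outer(authors, com):
    """Comments in the outer loop: every comment is appended exactly once."""
    appended = 0
    for post_id, comments in com.items():
        for comment in comments:
            for author_id, author in authors.items():
                if author[0] in comment:
                    break  # stop at the first matching author
            appended += 1  # one entry per comment
    return appended


print(count_authors_outer(a, com))   # 9
print(count_comments_outer(a, com))  # 3
```

Nine entries is exactly the length of the unwanted list in the question (3 authors times 3 comments), and three is the length of the expected one.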
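You may also have meant to have a break where you have the continue, but I'm not entirely sure I understand how your original code is supposed to work.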
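For reference, the difference between the two statements in a small sketch (the function names and the sample list are invented for this illustration): continue skips to the next iteration of the innermost loop, while break leaves that loop entirely.

```python
names = ['John', 'Matt', 'Bob']


def visited_with_continue(names, target):
    """Record every name the loop looks at when a match triggers continue."""
    visited = []
    for name in names:
        visited.append(name)
        if name == target:
            continue  # jump to the next name; the loop keeps running
    return visited


def visited_with_break(names, target):
    """Record every name the loop looks at when a match triggers break."""
    visited = []
    for name in names:
        visited.append(name)
        if name == target:
            break  # leave the loop; later names are never looked at
    return visited


print(visited_with_continue(names, 'John'))  # ['John', 'Matt', 'Bob']
print(visited_with_break(names, 'John'))     # ['John']
```

As far as I can tell, the continue in the question is the last statement that runs in that iteration, so it does not change what the loop does.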
The use of global variables is also a bit confusing.
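One concrete way a global list can bite, sketched under the assumption that the function ends up being called more than once: the module-level list keeps growing, while a function that builds and returns its own list gives the same result on every call. Both functions below are simplified stand-ins written for this illustration, not the code from the question.

```python
comments_table = []  # module-level list, as in the question


def collect_global(comments):
    """Appends to the shared global list; results pile up across calls."""
    for comment in comments:
        comments_table.append({'post_id': '6', 'comment': comment})


def collect_local(comments):
    """Builds and returns a fresh list; each call is independent."""
    res = []
    for comment in comments:
        res.append({'post_id': '6', 'comment': comment})
    return res


comments = ['Love this shot!']
collect_global(comments)
collect_global(comments)
print(len(comments_table))           # 2: the second call added a duplicate
print(len(collect_local(comments)))  # 1
print(len(collect_local(comments)))  # 1
```

Passing the data in as arguments and returning the result, as replace_names does, avoids this kind of hidden state.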