Question

a = {'1330': ('John', 'Gold', '1330'), "0001":('Matt', 'Wade', '0001'), '2112': ('Bob', 'Smith', '2112')}
com = {'6':['John Gold, getting no points', 'Matt played in this game? Didn\'t notice him','Love this shot!']}
comments_table = []

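What I am trying to achieve with this replace function is to replace the names of people in the strings found in com (dict) with the code unique to each of them, which is looked up via regexp in a (dict). Replacing the name with the code works, but appending the new string, with the code instead of the name, is where I am going wrong.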

import re
from fuzzywuzzy import fuzz  # assumption: the question does not show where fuzz comes from

def replace_first_name():
    for k, v in a.items():
        for z, y in com.items():
            for item in y:
                firstname = a[k][0]
                lastname = a[k][1]
                full_name = firstname + ' ' + lastname
                if firstname in item:
                    if full_name in item:
                        # case 1: the full name appears in the comment
                        t = re.compile(re.escape(full_name), re.IGNORECASE)
                        comment = t.sub(a[k][2], item)
                        print('1')
                        comments_table.append({
                            'post_id': z, 'comment': comment
                        })
                        continue
                    else:
                        # case 2: only the first name appears
                        t = re.compile(re.escape(firstname), re.IGNORECASE)
                        comment = t.sub(a[k][2], item)
                        print('2')
                        comments_table.append({
                            'post_id': z, 'comment': comment
                        })
                else:
                    # case 3: this person's name does not appear in the comment
                    print('3')
                    if fuzz.ratio(item, item) > 90:
                        comments_table.append({
                            'post_id': z, 'comment': item
                        })
                    else:
                        pass

The problem is that the output looks like this:

[{'comment': '1330, getting no points', 'post_id': '6'}, {'comment': "Matt played in this game? Didn't notice him", 'post_id': '6'}, {'comment': 'Love this shot!', 'post_id': '6'}, {'comment': 'John Gold, getting no points', 'post_id': '6'}, {'comment': "Matt played in this game? Didn't notice him", 'post_id': '6'}, {'comment': 'Love this shot!', 'post_id': '6'}, {'comment': 'John Gold, getting no points', 'post_id': '6'}, {'comment': "0001 played in this game? Didn't notice him", 'post_id': '6'}, {'comment': 'Love this shot!', 'post_id': '6'}]

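I don't want the original version of a comment to end up in the final list once its name has been replaced with a number. So I'd like my expected output to look like this: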

[{'comment': '1330, getting no points', 'post_id': '6'}, {'comment': '0001 played in this game? Didn\'t notice him', 'post_id': '6'}, {'comment': 'Love this shot!', 'post_id': '6'}]

I have looked into using an iterator by turning y into an iter_list, but I didn't get anywhere. Any help would be greatly appreciated. Thanks!

Answer 1

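I'm not sure why you are doing a regexp replacement, since you are already checking with the in operator whether the first name or full name is present. I'm also not sure what fuzz.ratio(item, item) in case 3 is supposed to do, but here is how you could do a simple/naive replacement: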

#!/usr/bin/python

def replace_names(authors, com):
    """Return one dict per comment, with the first matching name replaced by the author's id."""
    res = []
    for post_id, comments in com.items():
        for comment in comments:
            for author_id, author in authors.items():
                first_name, last_name = author[0], author[1]
                full_name = first_name + ' ' + last_name
                # Try the full name first so 'John Gold' does not become '1330 Gold'.
                if full_name in comment:
                    comment = comment.replace(full_name, author_id)
                    break
                elif first_name in comment:
                    comment = comment.replace(first_name, author_id)
                    break
            # Appended once per comment, whether or not a name was replaced.
            res.append({'comment': comment, 'post_id': post_id})
    return res

a = {'1330': ('John', 'Gold', '1330'), "0001":('Matt', 'Wade', '0001'), '2112': ('Bob', 'Smith', '2112')}
com = {'6':['John Gold, getting no points', 'Matt played in this game? Didn\'t notice him','Love this shot!']}
for comment in replace_names(a, com):
    print(comment)

This produces the following output:

{'comment': '1330, getting no points', 'post_id': '6'}
{'comment': "0001 played in this game? Didn't notice him", 'post_id': '6'}
{'comment': 'Love this shot!', 'post_id': '6'}
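
On the regexp point: the IGNORECASE flag in the question only pays off if the detection step ignores case as well, because the in test is case-sensitive. Here is a minimal sketch, assuming case-insensitive matching is actually wanted (the question does not say so). The helper name replace_name is hypothetical and not part of the original code.

```python
import re

def replace_name(comment, first_name, last_name, code):
    """Replace the full name, else the first name, ignoring case (hypothetical helper)."""
    for name in (first_name + ' ' + last_name, first_name):
        pattern = re.compile(re.escape(name), re.IGNORECASE)
        # subn returns (new_string, number_of_replacements)
        replaced, count = pattern.subn(code, comment)
        if count:
            return replaced
    # No name found: hand the comment back unchanged.
    return comment

print(replace_name('john gold, getting no points', 'John', 'Gold', '1330'))
# 1330, getting no points
print(replace_name('Love this shot!', 'John', 'Gold', '1330'))
# Love this shot!
```

Using subn lets one call both detect and replace, so there is no separate in check that could disagree with the pattern.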

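It is a bit tricky to understand what your original code is meant to do, but (one of) the reasons you get duplicates is that you iterate over the authors in the outer loop, which means every comment is processed once per author. By swapping the loops, you ensure that each comment is processed only once.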
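
To make the effect of the loop order concrete, here is a small sketch that only counts appended entries. The two function names are made up for illustration, and the data is a trimmed copy of the question's.

```python
authors = {'1330': ('John', 'Gold'), '0001': ('Matt', 'Wade'), '2112': ('Bob', 'Smith')}
comments = ['John Gold, getting no points',
            "Matt played in this game? Didn't notice him",
            'Love this shot!']

def authors_outer(authors, comments):
    """Loop order from the question: one append per (author, comment) pair."""
    out = []
    for author_id, (first, last) in authors.items():
        for comment in comments:
            out.append(comment.replace(first + ' ' + last, author_id))
    return out

def comments_outer(authors, comments):
    """Swapped loop order: one append per comment."""
    out = []
    for comment in comments:
        for author_id, (first, last) in authors.items():
            comment = comment.replace(first + ' ' + last, author_id)
        out.append(comment)
    return out

print(len(authors_outer(authors, comments)))   # 9
print(len(comments_outer(authors, comments)))  # 3
```

With three authors and three comments, the first version appends nine entries and the second appends three, which matches the nine-entry list in the question.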

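You may also have intended a break where you have the continue, but I'm not entirely sure I understand how your original code is supposed to work.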
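
For reference, a short sketch of how the two statements differ inside a loop over names. The function names_checked is hypothetical and exists only for this illustration.

```python
def names_checked(names, text, stop):
    """Return the names that were examined before the loop ended."""
    checked = []
    for name in names:
        checked.append(name)
        if name in text:
            if stop:
                break     # leave the loop: the remaining names are never examined
            continue      # jump to the next name: the loop keeps going
    return checked

names = ['John', 'Matt', 'Bob']
print(names_checked(names, 'John Gold, getting no points', stop=True))   # ['John']
print(names_checked(names, 'John Gold, getting no points', stop=False))  # ['John', 'Matt', 'Bob']
```

A continue as the last statement of a loop body changes nothing, since the loop would move on to the next item anyway.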

The use of global variables is also a bit confusing.
