Question

I am trying to do a search and replace in Python.

The file I want to search and replace in is a three-column, tab-delimited file with the following sample input:

dog walk    1
cat walk    2
pigeon  bark    3

The code I have been using is as follows:

####open_file
import codecs
input_file=codecs.open("corpus3_tst","r",encoding="utf-8")
lines=input_file.readlines()
for word in lines:
    words=word.rstrip()

    # define method
def replace_all(text, dic):
    for i, j in dic.iteritems():
        text = text.replace(i, j)
    return text

# text for replacement
my_text = words
print my_text

# dictionary with  key:values.
# replace values
reps = {'dog':'ANIMAL', 'cat':'ANIMAL', 'pigeon':'ANIMAL'}

# bind the returned text of the method
# to a variable and print it
txt = replace_all(my_text, reps)
print txt

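My problem is that it only replaces the word in the last line with ANIMAL, and it also prints that line a second time without the replacement.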

Output:

pigeon  bark    3
ANIMAL  bark    3

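Does anyone know where my script is going wrong? I have looked through the documentation for Python's replace() and at similar questions on Stack Overflow, and it seems that I am following the documentation, so I don't know where I am going wrong.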

Answer 1

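In the following loop, words is overwritten on every iteration. After the loop, words contains only the last line, which is why only that line is printed and replaced.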

for word in lines:
    words=word.rstrip()
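
To see the effect in isolation, here is a small sketch that uses an in-memory list in place of the file. The sample lines come from the question; the kept list is a hypothetical name added only for illustration.

```python
# Sample lines from the question, standing in for input_file.readlines()
lines = ["dog\twalk\t1\n", "cat\twalk\t2\n", "pigeon\tbark\t3\n"]

for word in lines:
    words = word.rstrip()   # rebinds words; the previous value is discarded

print(words)                # only the last line survives the loop

# Collecting each stripped line instead keeps all of them
kept = [word.rstrip() for word in lines]
print(kept)
```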

Replace the following lines:

lines=input_file.readlines()
for word in lines:
    words=word.rstrip()

with:

words = input_file.read().rstrip()
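
Put together with the replace_all function from the question, the corrected flow might look like the sketch below. It assumes Python 3 syntax (items() rather than iteritems(), print as a function) and uses io.StringIO as an in-memory stand-in for the real file so that it runs on its own.

```python
import io

def replace_all(text, dic):
    # Apply each replacement in turn to the whole text
    for old, new in dic.items():   # dict.iteritems() on Python 2
        text = text.replace(old, new)
    return text

# In-memory stand-in for codecs.open("corpus3_tst", "r", encoding="utf-8")
input_file = io.StringIO(u"dog\twalk\t1\ncat\twalk\t2\npigeon\tbark\t3\n")
words = input_file.read().rstrip()   # the whole file, not just the last line

reps = {'dog': 'ANIMAL', 'cat': 'ANIMAL', 'pigeon': 'ANIMAL'}
txt = replace_all(words, reps)
print(txt)
```

Because words now holds the entire file contents, every line passes through the replacement rather than only the last one.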

Using a regular expression, the program can be simplified.

import codecs
import re

with codecs.open("corpus3_tst","r",encoding="utf-8") as f:
    words = f.read().rstrip()
    pattern = r'dog|cat|pigeon'
    #pattern = '|'.join(map(re.escape, ['dog', 'cat', 'pigeon']))
    print re.sub(pattern, 'ANIMAL', words)
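
One caveat with a plain alternation is that it also matches inside longer words. The sketch below is a possible variant, not part of the original answer: it wraps the alternation in \b word boundaries. The animals list and the "category" line are made up for illustration, and the code is written for Python 3.

```python
import re

animals = ['dog', 'cat', 'pigeon']
# \b anchors each match at a word boundary; re.escape guards against
# regex metacharacters in the search terms
pattern = r'\b(?:' + '|'.join(map(re.escape, animals)) + r')\b'

# The second line is a made-up example of a word that merely contains "cat"
sample = u"cat\twalk\t2\ncategory\twalk\t4"
result = re.sub(pattern, 'ANIMAL', sample)
print(result)
```

With the boundaries in place, the standalone "cat" is replaced while "category" is left alone.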
