Replacing text

Time: 2017-02-20 13:08:26

Tags: python string match

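Given a text file as input, I need to replace the words that are found in an input list. The output should be the same text file, but with each found word replaced by, for example: <repl>matched_word</repl>. I built a series of loops for this, but I cannot reproduce the same text file. I tried it on a 20-line text file, and the output has millions of repeated lines.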

Here is an example. The input text file could be:

bucharest sdfsadf
sofia sdf sdf dsf 
vienna etc
etc
can
sdfds
22
rdf

fd
paris
Paris

The code I tried is:

# input files
input_file = r"....\input_txt_test.txt"
list_names = ["Bucharest", "bucharest", "vienna", "Paris", "buc"]
out_file = r"....\output_txt_test.txt"

# Perform replacement
with open(out_file, 'w') as outfile:
    with open(input_file, 'r') as f:
        text = f.readlines()
        for line in text:
            line_sp = line.split(" ")
            for name in list_names:
                for word in line_sp:
                    if name in word:
                        strreplace = '''<repl>%s</repl>''' % name
                        repl = line.replace(name, strreplace)
                        outfile.write(repl)
                    else:
                        outfile.write(line)

I expect this output:

<repl>bucharest</repl> sdfsadf
sofia sdf sdf dsf 
<repl>vienna</repl> etc
etc
can
sdfds
22
rdf

fd
paris
<repl>Paris</repl>

But this is what I get:

bucharest sdfsadf
bucharest sdfsadf
<repl>bucharest</repl> sdfsadf
bucharest sdfsadf
bucharest sdfsadf
bucharest sdfsadf
bucharest sdfsadf
bucharest sdfsadf
<repl>buc</repl>harest sdfsadf
bucharest sdfsadf
sofia sdf sdf dsf 
sofia sdf sdf dsf 
sofia sdf sdf dsf 
sofia sdf sdf dsf 
sofia sdf sdf dsf 
sofia sdf sdf dsf 
sofia sdf sdf dsf 
sofia sdf sdf dsf 
sofia sdf sdf dsf 
sofia sdf sdf dsf 
sofia sdf sdf dsf 
sofia sdf sdf dsf 
sofia sdf sdf dsf 
sofia sdf sdf dsf 
sofia sdf sdf dsf 
sofia sdf sdf dsf 
sofia sdf sdf dsf 
sofia sdf sdf dsf 
sofia sdf sdf dsf 
sofia sdf sdf dsf 
sofia sdf sdf dsf 
sofia sdf sdf dsf 
sofia sdf sdf dsf 
sofia sdf sdf dsf 
sofia sdf sdf dsf 
vienna etc
vienna etc
vienna etc
vienna etc
<repl>vienna</repl> etc
vienna etc
vienna etc
vienna etc
vienna etc
vienna etc
etc
etc
etc
etc
etc
can
can
can
can
can
sdfds
sdfds
sdfds
sdfds
sdfds
22
22
22
22
22
rdf
rdf
rdf
rdf
rdf





fd
fd
fd
fd
fd
paris
paris
paris
paris
paris
ParisParisParis<repl>Paris</repl>Paris

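Also, list_names contains the string "buc". No word in the file matches that string, yet it still gets inserted into the output file. How can I do this matching and write the file correctly? Thanks!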

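2 answers:

Answer 0 (score: 2)

Here you read each line of the input file. If a word in the line is found in the given list_names, that word is replaced in the line with the tagged word. The line is then written to the output file once, and the loop moves on to the next line: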

# input files
input_file = r"....\input_txt_test.txt"
list_names = ["Bucharest", "bucharest", "vienna", "Paris", "buc"]
out_file = r"....\output_txt_test.txt"

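# Perform replacement
with open(out_file, 'w') as outfile:
    with open(input_file, 'r') as f:
        for line in f:
            # Set the line ending aside so the last word of a line can still match
            body = line.rstrip("\n")
            ending = line[len(body):]
            # Tag whole words only; splitting and joining on " " keeps the spacing
            words = ["<repl>{}</repl>".format(word) if word in list_names else word
                     for word in body.split(" ")]
            # Write each input line exactly once
            outfile.write(" ".join(words) + ending)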
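
As an alternative to splitting on spaces, the same whole-word matching can be sketched with a regular expression. This is only a minimal sketch, not part of the original answer: the helper name tag_words and the sample string are made up for illustration, and it assumes that "word" means whatever the regex \b boundary treats as a word.

```python
import re


def tag_words(text, names):
    """Wrap every whole-word occurrence of a name in <repl>...</repl>."""
    if not names:
        # An empty alternation would match at every word boundary.
        return text
    # Longest names first, so "bucharest" is tried before "buc".
    ordered = sorted(names, key=len, reverse=True)
    # \b anchors the match at word boundaries; re.escape guards against
    # names that contain regex metacharacters.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(n) for n in ordered) + r")\b"
    )
    return pattern.sub(lambda m: "<repl>{}</repl>".format(m.group(0)), text)


list_names = ["Bucharest", "bucharest", "vienna", "Paris", "buc"]
sample = "bucharest sdfsadf\nvienna etc\nparis\nParis"  # hypothetical sample text
print(tag_words(sample, list_names))
```

To apply it to a file, one option is to read the whole input with f.read(), pass it through tag_words, and write the result to the output file in a single call. That way no line can be written more than once.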

Answer 1 (score: 0)

Read your file and replace each word from your list with its tagged version:

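yourListOfWords = ['a', 'b', 'c']

# 'PATH' and 'PATH_NEW' are placeholders for the input and output file paths
with open('PATH', 'r') as yourFile, open('PATH_NEW', 'w') as newFile:
    # Iterate over the lines that were read, not over the exhausted file object
    for line in yourFile.read().splitlines():
        # Apply every replacement to the same line, then write it once
        for word in yourListOfWords:
            line = line.replace(word, '<repl>' + word + '</repl>')
        newFile.write(line + "\n")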
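
Note that str.replace, like the in operator on strings, works on substrings. That is probably why "buc" shows up in the question's output even though no word equals it. The following sketch illustrates the difference; the variable names are illustrative only, and the sample line is taken from the question's input.

```python
line = "bucharest sdfsadf"

# Substring test: True, because "buc" occurs inside "bucharest".
substring_hit = "buc" in line

# Whole-word test: False, because no token equals "buc" exactly.
whole_word_hit = "buc" in line.split()

# str.replace is substring-based too, so it tags part of a longer word.
tagged = line.replace("buc", "<repl>buc</repl>")

print(substring_hit, whole_word_hit, tagged)
```

If only whole words should be tagged, compare tokens against the list (as in the first answer) rather than calling replace on the full line.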