Question

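I have a txt file (myText.txt) that contains several lines of text.

I would like to know:

How to create a list of words that need to be deleted (I want to set the words myself)
How to create a list of words that need to be replaced

For example, if myText.txt is: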

    The ancient Romans influenced countries and civilizations in the following centuries.
    Their language, Latin, became the basis for many other European languages. They stayed in Roma for 3 month.

I want to delete "the", "and", and "in", and I want to replace "ancient" with "old".
I want to replace "month" and "centuries" with "years".

Answer 1

You can always use a regular expression:

import re

st = '''\
The ancient Romans influenced countries and civilizations in the following centuries.
Their language, Latin, became the basis for many other European languages. They stayed in Roma for 3 month.'''

deletions = ('and', 'in', 'the')
repl = {"ancient": "old", "month": "years", "centuries": "years"}

# Delete whole words only; \b keeps 'in' from matching inside 'Latin'.
tgt = '|'.join(r'\b{}\b'.format(re.escape(e)) for e in deletions)
st = re.sub(tgt, '', st)

# Replace each listed word with its substitute.
for word in repl:
    tgt = r'\b{}\b'.format(re.escape(word))
    st = re.sub(tgt, repl[word], st)

print(st)
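One limitation of the snippet above is that `\bthe\b` is case-sensitive, so the leading "The" survives, and every deletion leaves a doubled space behind. Here is a minimal sketch of one way to handle both in a single pass. It assumes case-insensitive matching is wanted, which the question does not state, and the names `table`, `_swap`, and `clean` are made up for illustration:

```python
import re

deletions = ("and", "in", "the")
repl = {"ancient": "old", "month": "years", "centuries": "years"}

# Map every target word to its replacement; deleted words map to "".
table = dict.fromkeys(deletions, "")
table.update(repl)

# One alternation over all words; re.escape guards against regex metacharacters.
pattern = re.compile(
    r"\b(?:{})\b".format("|".join(re.escape(w) for w in table)),
    re.IGNORECASE,
)

def _swap(match):
    # Look the matched word up in lower case (assumption: case does not matter).
    return table[match.group(0).lower()]

def clean(text):
    text = pattern.sub(_swap, text)
    # Collapse runs of spaces/tabs left behind by deletions, then trim the ends.
    return re.sub(r"[ \t]{2,}", " ", text).strip()
```

For the input `"The ancient Romans stayed in Roma for 3 month."`, `clean` returns `"old Romans stayed Roma for 3 years."`. Note that a replacement always comes out in the case given in the dictionary, regardless of how the matched word was capitalized.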

Answer 2

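This should do the trick. Use a list to store the words you want to delete, then loop through the list and remove each element from the contents string. Next, use a dictionary that maps each word you currently have to the word you want in its place. You can loop through that as well and replace each current word with its replacement.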

def replace():
    contents = ""
    deleteWords = ["the ", "and ", "in "]
    replaceWords = {"ancient": "old", "month": "years", "centuries": "years"}

    with open("myText.txt") as f:
        contents = f.read()

    # Remove every occurrence of each unwanted word.
    for word in deleteWords:
        contents = contents.replace(word, "")

    # Swap each key for its replacement; items() works on Python 2 and 3.
    for key, value in replaceWords.items():
        contents = contents.replace(key, value)
    return contents
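Keep in mind that `str.replace` works on substrings rather than whole words, so an entry like `"in "` also matches the tail of any word that ends in "in". The sketch below shows the difference on a made-up sentence; the sentence and the helper name `delete_whole_words` are hypothetical and not part of the question:

```python
import re

sample = "They begin and end in Rome."

# Substring deletion: "in " also matches inside "begin ".
naive = sample
for word in ["the ", "and ", "in "]:
    naive = naive.replace(word, "")

def delete_whole_words(text, words):
    # \b restricts matches to whole words; " ?" also drops one trailing space.
    pattern = r"\b(?:{})\b ?".format("|".join(re.escape(w) for w in words))
    return re.sub(pattern, "", text)

print(naive)                                             # They begend Rome.
print(delete_whole_words(sample, ["the", "and", "in"]))  # They begin end Rome.
```

Whether the substring behaviour matters depends on the text; for the sample paragraph in the question it happens not to bite, but it may on other input.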

Answer 3

Use a list for the deletions and a dictionary for the replacements. It should look something like this:

def processTextFile(filename_in, filename_out, delWords, repWords):
    # Open both files once; "w" overwrites output left over from an earlier run.
    with open(filename_in, "r") as sourcefile, open(filename_out, "w") as outfile:
        for line in sourcefile:
            for item in delWords:
                line = line.replace(item, "")
            for key, value in repWords.items():
                line = line.replace(key, value)
            outfile.write(line)


if __name__ == "__main__":
    delWords = ["the ", "and ", "in "]
    repWords = {"ancient": "old", "month": "years", "centuries": "years"}

    processTextFile("myText.txt", "myOutText.txt", delWords, repWords)

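Note that this was written for Python 3.3.2, which is why I use items(). On Python 2.x you can use iteritems() instead; I think it is slightly more efficient there because it iterates over the dictionary without first building a list of its pairs, although for a handful of words the difference is negligible.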
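If the same script has to run under both Python 2 and Python 3, one option is a small helper that picks whichever method the dictionary offers. This is only a sketch; `iter_pairs` and `apply_replacements` are hypothetical names introduced for illustration:

```python
def iter_pairs(mapping):
    # Python 2 dicts offer iteritems(); Python 3 dicts only offer items().
    if hasattr(mapping, "iteritems"):
        return mapping.iteritems()
    return mapping.items()

def apply_replacements(line, rep_words):
    # Same substring replacement as in the answer, one pair at a time.
    for key, value in iter_pairs(rep_words):
        line = line.replace(key, value)
    return line

print(apply_replacements("They stayed in Roma for 3 month.", {"month": "years"}))
# prints: They stayed in Roma for 3 years.
```

Calling plain `items()` everywhere is also fine on both versions; the helper only avoids the intermediate list that Python 2 builds.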
