Delete some words and replace other words in a txt file

Time: 2013-08-20 02:58:16

Tags: python string file-io replace

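I have a txt file (myText.txt) that contains several lines of text.

I would like to know:

  • How to create a list of words that should be deleted (I want to set the words myself)
  • How to create a list of words that should be replaced

For example, if myText.txt is: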

    The ancient Romans influenced countries and civilizations in the following centuries.
    Their language, Latin, became the basis for many other European languages. They stayed in Roma for 3 month.

  • I want to delete "the" and "in", and replace "ancient" with "old"
  • I want to replace "month" and "centuries" with "years"

3 answers:

Answer 0 (score: 3)

You can always use a regular expression:

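import re

st = '''\
The ancient Romans influenced countries and civilizations in the following centuries.
Their language, Latin, became the basis for many other European languages. They stayed in Roma for 3 month.'''

# Words to delete outright, and a mapping of word -> replacement.
deletions = ('and', 'in', 'the')
repl = {"ancient": "old", "month": "years", "centuries": "years"}

# One alternation of whole-word patterns; \b keeps "in" from matching inside "Latin".
tgt = '|'.join(r'\b{}\b'.format(re.escape(e)) for e in deletions)
st = re.sub(tgt, '', st)

# Replace each target word, again only on word boundaries.
for word in repl:
    tgt = r'\b{}\b'.format(re.escape(word))
    st = re.sub(tgt, repl[word], st)

print(st)  # print() with parentheses works on both Python 2 and 3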
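
The snippet above is case-sensitive, so the "The" that opens the sample text survives, and each deleted word leaves a doubled space behind. A minimal sketch of one way to handle both in a single pass follows. The helper name `clean_text` and its parameters are hypothetical, not part of the answer, and case-insensitive matching is an assumption about what the asker wants:

```python
import re


def clean_text(text, deletions, replacements):
    """Delete and replace whole words in one regex pass (illustrative helper)."""
    # Map every target word to its substitute; deleted words map to "".
    mapping = {word.lower(): "" for word in deletions}
    mapping.update((word.lower(), new) for word, new in replacements.items())
    if not mapping:
        return text

    # Whole-word alternation; re.escape guards against regex metacharacters.
    alternation = "|".join(re.escape(word) for word in mapping)
    pattern = re.compile(r"\b(?:{})\b".format(alternation), re.IGNORECASE)

    # Look up each match (lowercased) to find what it should become.
    text = pattern.sub(lambda m: mapping[m.group(0).lower()], text)

    # Collapse runs of spaces/tabs left by deletions, then trim each line.
    text = re.sub(r"[ \t]{2,}", " ", text)
    return "\n".join(line.strip() for line in text.split("\n"))


result = clean_text(
    "The ancient Romans stayed in Roma for 3 month.",
    ("and", "in", "the"),
    {"ancient": "old", "month": "years", "centuries": "years"},
)
print(result)  # old Romans stayed Roma for 3 years.
```

Replacements are inserted exactly as given, so a capitalized "Ancient" would become lowercase "old"; preserving the original capitalization would need extra logic.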

Answer 1 (score: 2)

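This should do the trick. Use a list to store the words you want to delete, then loop over the list and remove each of its elements from the contents string. Then use a dictionary that maps each word currently in the text to the word you want to replace it with. You can loop over that as well and swap each current word for its replacement.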

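def replace():
    # The trailing space in each entry removes the word together with the space after it.
    deleteWords = ["the ", "and ", "in "]
    replaceWords = {"ancient": "old", "month": "years", "centuries": "years"}

    # Read the whole file into one string.
    with open("myText.txt") as f:
        contents = f.read()

    for word in deleteWords:
        contents = contents.replace(word, "")

    # items() works on both Python 2 and 3 (iteritems() exists only on Python 2).
    for key, value in replaceWords.items():
        contents = contents.replace(key, value)
    return contents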
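
One caveat with plain `str.replace` is that it matches raw substrings, so deleting "in " also clips any word that happens to end in "in" before a space. A small sketch of the difference, using a made-up sample sentence (not from the question) and a word-boundary regex as one possible alternative:

```python
import re

sample = "They begin in Roma"

# str.replace works on raw substrings: "in " also matches the tail of "begin ".
naive = sample.replace("in ", "")

# A word-boundary pattern only matches "in" when it stands alone as a word;
# the optional trailing space is consumed so no double space is left behind.
bounded = re.sub(r"\bin\b ?", "", sample)

print(naive)    # They begRoma
print(bounded)  # They begin Roma
```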

Answer 2 (score: 2)

Use a list for the deletions and a dictionary for the replacements. It should look something like this:

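def processTextFile(filename_in, filename_out, delWords, repWords):
    # Open both files once; "w" overwrites earlier output instead of appending to it.
    with open(filename_in, "r") as sourcefile, open(filename_out, "w") as outfile:
        for line in sourcefile:
            # Remove every unwanted word from the current line.
            for item in delWords:
                line = line.replace(item, "")
            # Swap each remaining target word for its replacement.
            for key, value in repWords.items():
                line = line.replace(key, value)
            outfile.write(line)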



if __name__ == "__main__":
    delWords = []
    repWords = {}

    delWords.extend(["the ", "and ", "in "])
    repWords["ancient"] = "old"
    repWords["month"] = "years"
    repWords["centuries"] = "years"

    processTextFile("myText.txt", "myOutText.txt", delWords, repWords)

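Note that this was written for Python 3.3.2, which is why I use items(). On Python 2.x you can use iteritems() instead, which I believe is more efficient because it does not build an intermediate list of the dictionary's items (with a dictionary this small the difference is negligible).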
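
As a small illustration of the items() point, assuming Python 3: there, items() returns a view of the dictionary rather than a copied list, so there is no intermediate list to avoid. The variable names below are made up for the example:

```python
rep_words = {"ancient": "old", "month": "years"}

# In Python 3, items() returns a view object, not a copied list of pairs.
pairs = rep_words.items()
is_list = isinstance(pairs, list)

# The view tracks the dictionary, so later additions are visible through it.
rep_words["centuries"] = "years"
pair_count = len(pairs)

print(is_list, pair_count)  # False 3
```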