Question

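I know how to replace a string in Python, but I'm having trouble getting it to work here, perhaps because what I'm trying to replace is a block of text rather than a single line.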

I have a bunch of text files in which the following block of text is repeated in multiple places:

                   LIVEBLAH Information Provided By:
                              BLAH ONLINE
          A division of Blahdeblah BlahBlah Information, Inc.

Washington, DC                    New York, NY                  Chicago, IL
Los Angeles, CA                     Miami, FL                    Dallas, TX

          For Additional Information About LIVEBLAH, Call
                           1-800-XXX-XXXX
                 or Visit Us on the World Wide Web at
                       http://www.blahdeblah.com

I want to replace every occurrence of this block of text with the string "start body".

Here is the code I'm trying:

import os,glob
path = 'files'
key="""
                      LIVEBLAH Information Provided By:
                                   BLAH ONLINE
               A division of Blahdeblah BlahBlah Information, Inc.

Washington, DC                    New York, NY                  Chicago, IL
Los Angeles, CA                     Miami, FL                    Dallas, TX

                For Additional Information About LIVEBLAH, Call
                                1-800-XXX-XXXX
                      or Visit Us on the World Wide Web at
                            http://www.blahdeblah.com"""

for filename in glob.glob(os.path.join(path, '*.txt')):
    with open(filename, 'r') as f:
        # read entire file into file1
        file1 = f.read()

        # replace block of text with proper string
        file1 = file1.replace(key, "start body")

        # write into a new file
        with open(filename+'_new', 'w') as f:
            f.write(file1)

Can anyone tell me why the replace() method doesn't work on a block of text? What can I do to make it work?

Edit - I tried another approach:

for filename in glob.glob(os.path.join(path, '*.txt_new_NEW_NEW_BLAH')):
    with open(filename, 'r') as f:
        # read entire file into file1
        file1 = f.read()

        # index() will raise an error if not found
        f1_start = file1.index('LIVEBLAH Information Provided By:')
        f1_end = file1.index('http://www.blahdeblah.com', f1_start)     

        key = file1[f1_start:(f1_end+25)] # 25 is the length of the string 'http://www.blahdeblah.com' 
        file1 = file1.replace(key, '\n'+"start body")

        with open(filename+'_TRIAL', 'w') as f:
            f.write(file1)

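This gives a strange result: for some files it works perfectly. For others, it replaces only the string 'LIVEBLAH Information Provided By:' with 'start body' and leaves the rest of the text block in place. For still others, index() raises an error saying it cannot find the string 'LIVEBLAH Information Provided By:' in the file, even though it is clearly there. What is going on?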

Answer 1

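Tabs and newlines are encoded as '\t' and '\n' or '\r' (depending on the operating system or editor used to create the file), so I suggest getting a unicode dump of the text file and using that exact string in the replace call. Otherwise you may end up treating a tab as several spaces, and so on.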
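To illustrate the point, here is a minimal sketch, not a drop-in fix. It assumes the mismatch really is only in whitespace (tabs vs. spaces, '\r\n' vs. '\n', different indentation). The shortened `sample` and `key` strings and the helper name `block_pattern` are made up for the example. `repr()` shows the hidden characters, and a regex built from the words of the block tolerates any run of whitespace between them:

```python
import re

def block_pattern(block):
    """Compile a regex that matches the words of `block` in order,
    separated by any run of whitespace (spaces, tabs, \r, \n)."""
    words = block.split()
    return re.compile(r"\s+".join(re.escape(word) for word in words))

# Hypothetical file contents: tabs and Windows line endings
sample = (
    "intro\r\n"
    "\t  LIVEBLAH Information Provided By:\r\n"
    "\t\tBLAH ONLINE\r\n"
    "   http://www.blahdeblah.com\r\n"
    "outro\r\n"
)

# repr() makes the tabs and carriage returns visible
print(repr(sample))

# The key as typed in the editor: spaces and plain newlines
key = """
    LIVEBLAH Information Provided By:
        BLAH ONLINE
    http://www.blahdeblah.com"""

# A literal match fails because the whitespace differs
print(key in sample)

# The whitespace-tolerant pattern still finds the block
cleaned = block_pattern(key).sub("start body", sample)
print(repr(cleaned))
```

If the printed `repr()` of your real file shows something other than whitespace differences (for example non-breaking spaces or a different encoding), this approach would need adjusting.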

Replace multiple lines in a text file using Python
