Question

I have several blocks of text that look like this:

  steps:
  - class: pipe.steps.extract.Extract
    conf:
      unzip_patterns:
      - .*EstimatesDaily_RealEstate_Q.*_{FD_YYYYMMDD}.*
    id: extract
  - class: pipe.steps.validate.Validate
    conf:
      schema_def:
        fields:

I want to replace each such block with the following:

  global:
    global:
      schema_def:
        fields:

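The catch is that the text spans several lines in each text file, and I'm not sure how to handle that. To make things harder, the block does not always contain '- .*EstimatesDaily_RealEstate_Q.*_{FD_YYYYMMDD}.*'. Sometimes the line is '- .*EstimatesDaily_RealEstate_Y.*_{FD_YYYYMMDD}.*' or possibly '- .*EstimatesDaily_RealEstate_EAP_Nav.*_{FD_YYYYMMDD}.*'. What is always the same in every block is that it starts with 'steps:' and ends with 'fields:'.

My sample code is below: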

import glob
import re
path = 'C:\\Users\\ryans\\OneDrive\\Desktop\\output\\*.yaml'
regex = re.compile("steps:.*fields:", re.DOTALL)
print(regex)
replace = """global:
global:
  schema_def:
    fields:"""
for fname in glob.glob(path):
    #print(str(fname))
    with open(fname, 'r+') as f:
        text = re.sub(regex, replace, '')
        f.seek(0)
        f.write(text)
        f.truncate()

Of course, my example is not that simple.

Answer 1

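A regex is probably the best answer here and will make this simple. Your mileage may vary with my example regex. Make it as tight as you need to, so that only what you intend gets replaced and you don't get false positives.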

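import glob
import re

# Same glob pattern as in the question.
path = 'C:\\Users\\ryans\\OneDrive\\Desktop\\output\\*.yaml'

# re.DOTALL lets "." match newlines, so the pattern can span lines.
# The non-greedy ".*?" stops at the first "fields:" after "steps:".
regex = re.compile(r"steps:.*?fields:", flags=re.DOTALL)

replace = """global:
global:
  schema_def:
    fields:"""

def do_replace(fname):
    # Read the whole file first, then reopen it for writing.
    with open(fname, 'r') as f:
        text = f.read()
    with open(fname, 'w') as f:
        # count is an argument of sub(), not compile(); 1 = first match only.
        f.write(regex.sub(replace, text, count=1))

for fname in glob.glob(path):
    print(fname)
    do_replace(fname)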
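
To see why the non-greedy `.*?` matters, and what "tightening" the regex could look like, here is a small in-memory check. It is only a sketch: the `sample` string is made up to resemble the block in the question, and the anchored `tight` pattern is my own assumption about one way to reduce false positives, not something from the question.

```python
import re

# Hypothetical sample: the block from the question, followed by a second
# section that also ends in "fields:".
sample = """steps:
- class: pipe.steps.extract.Extract
  conf:
    unzip_patterns:
    - .*EstimatesDaily_RealEstate_Q.*_{FD_YYYYMMDD}.*
  id: extract
- class: pipe.steps.validate.Validate
  conf:
    schema_def:
      fields:
      - name: a
other:
  schema_def:
    fields:
    - name: b
"""

replacement = "global:\n  global:\n    schema_def:\n      fields:"

# Greedy: runs to the LAST "fields:" in the file and swallows "other:".
greedy = re.compile(r"steps:.*fields:", re.DOTALL)
# Non-greedy: stops at the FIRST "fields:" after "steps:".
lazy = re.compile(r"steps:.*?fields:", re.DOTALL)
# Tighter (assumption): "steps:" must be alone on its line and "fields:"
# must be the first thing on its line.
tight = re.compile(
    r"^[ \t]*steps:[ \t]*\n.*?^[ \t]*fields:",
    re.DOTALL | re.MULTILINE,
)

greedy_result = greedy.sub(replacement, sample)
lazy_result = lazy.sub(replacement, sample, count=1)
tight_result = tight.sub(replacement, sample, count=1)

# A stray mention inside running text matches the loose pattern only.
stray = "note about steps: and fields:"
loose_hits_stray = lazy.search(stray) is not None
tight_hits_stray = tight.search(stray) is not None
```

On this sample the non-greedy and anchored patterns give the same result, while the greedy one also removes the `other:` section. Whether the anchored form is worth the extra complexity depends on what else your files contain.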

Answer 2

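Since you are doing a general replacement of whatever lies between two strings, I'd say this calls for a regular expression [Edit: sorry, I see you have since replaced your string "replace" statement with regex code]. So if your file is "myfile.txt", try the following:

>>> import re
>>> f = open('myfile.txt', 'r')
>>> content = f.read()
>>> f.close()
>>> replacement = ' global:\n   global:\n     schema_def:\n       fields:'
>>> print(re.sub(r"(\ssteps\:)(.*?)(\sfields\:)", replacement, content, flags=re.DOTALL))

The output here should be the original contents of "myfile.txt" with all the replacements applied.

The usual convention in Python is not to edit a file in place, but to read what you need from it, make the changes, and then write everything out to a new file. This is less error-prone, and unless you are dealing with an astronomically large amount of content it should work fine. So instead of the print in my last line, you could write the result of re.sub to a new file.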
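
A minimal sketch of the write-to-a-new-file idea, reusing the pattern from this answer. The helper name `replace_to_new_file` and the `.new` suffix are hypothetical choices for illustration, and the demo runs on a throwaway temporary file rather than your real data.

```python
import re
import tempfile
from pathlib import Path

# Same idea as the pattern above. Note that it requires a whitespace
# character immediately before "steps:" and before "fields:".
PATTERN = re.compile(r"(\ssteps:)(.*?)(\sfields:)", flags=re.DOTALL)
REPLACEMENT = " global:\n   global:\n     schema_def:\n       fields:"


def replace_to_new_file(src, suffix=".new"):
    """Read src, apply the substitution, write the result beside it."""
    src = Path(src)
    content = src.read_text()
    dst = src.with_name(src.name + suffix)  # hypothetical naming scheme
    dst.write_text(PATTERN.sub(REPLACEMENT, content))
    return dst


# Demo on a temporary file so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "myfile.txt"
    src.write_text(
        "pipeline:\n"
        "  steps:\n"
        "  - class: x\n"
        "    conf:\n"
        "      schema_def:\n"
        "        fields:\n"
        "        - name: a\n"
    )
    dst = replace_to_new_file(src)
    original_after = src.read_text()  # the source file is left unchanged
    new_content = dst.read_text()
```

Once you have checked the new file, you can rename it over the original if that is what you want.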

Trying to find/replace across multiple lines in multiple text files
