Question

I have two regex patterns that match easily enough, but I don't know how to extract the parts I need from them.

First, I want to capture "table_name: snl_realestate_hb_na_fundamentals1_o" (the value is a variable) but drop the "id: load" that follows it.
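One way to keep the table_name part without the trailing "id: load" is to put the wanted text in a capturing group and match "id: load" outside it. A minimal sketch, assuming the two keys sit on consecutive lines as in the hypothetical sample below (the real YAML layout is a guess):

```python
import re

# Hypothetical sample; the real YAML layout is an assumption.
sample = """\
  - class: pipe.standardize.Standardize
    table_name: snl_realestate_hb_na_fundamentals1_o
    id: load
"""

# Group 1 holds "table_name: <value>"; "id: load" must follow but is not captured.
# re.DOTALL lets .*? cross line breaks, since the two keys are on different lines.
table_re = re.compile(r'(table_name:.*?)\s*id: load', re.DOTALL)

match = table_re.search(sample)
if match:
    print(match.group(1))  # table_name: snl_realestate_hb_na_fundamentals1_o
```

Because `.*?` is lazy and `\s*` is greedy, the whitespace and newline before "id: load" end up outside the group.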

Second, I want to capture the lines between "unzip_patterns:" and "id: extract". These are also variable.
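The same idea works for the lines between "unzip_patterns:" and "id: extract": capture the middle and use `re.DOTALL` so `.` also matches newlines. A sketch with a hypothetical sample (the pattern values and indentation are made up):

```python
import re

# Hypothetical sample; the real unzip_patterns content is an assumption.
sample = """\
    unzip_patterns:
      - fundamentals_*.csv
      - fundamentals_*.txt
  - id: extract
"""

# Capture everything after the "unzip_patterns:" line up to the line holding
# "id: extract" (that line may start with indentation and "- ").
# re.DOTALL makes . match newlines so the group can span several lines.
unzip_re = re.compile(r'unzip_patterns:[ \t]*\n(.*?)\n[ \t-]*id: extract', re.DOTALL)

match = unzip_re.search(sample)
unzip_lines = match.group(1).splitlines() if match else []
print(unzip_lines)
```

Anchoring on the newline before "id: extract" keeps the "- " of that line out of the captured block.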

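So basically, I'm trying to go through the lines of each text file, find these two variables, concatenate them into a string, and then update the file. I have many instances of these patterns across many text files, so I'd really like to automate this instead of spending hours doing it by hand. Here is my sample code.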

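# transform data sets
import glob
import re

path = 'C:\\my_path\\*.yaml'

# get table name: group 1 is "table_name: ..." without the trailing "id: load"
findtable = re.compile(r'(table_name:.*?)\s*id: load', re.DOTALL)

# get unzip patterns: group 1 is the lines between "unzip_patterns:" and "id: extract"
findunzip = re.compile(r'unzip_patterns:[ \t]*\n(.*?)\n[ \t-]*id: extract', re.DOTALL)

# drop standardize: the block that gets replaced
findstd = re.compile(r'- class: pipe\.standardize\.Standardize.*?id: load', re.DOTALL)

for fname in glob.glob(path):
    # read the whole file at once; looping over a string yields characters, not lines
    with open(fname, 'r') as f:
        sfile = f.read()

    table = findtable.search(sfile)
    unzip = findunzip.search(sfile)
    if not (table and unzip):
        continue  # leave files without both patterns untouched

    replacetable = table.group(1)
    replaceunzip = unzip.group(1)

    # indentation of the new block is an assumption about the target YAML
    concat = ('  steps:\n'
              + '  - id: extract\n'
              + replaceunzip + '\n'
              + '  - id: validate\n'
              + '    conf:\n'
              + replacetable)

    # a function replacement is inserted literally, so backslashes in the
    # captured text are not treated as regex escapes
    text = findstd.sub(lambda m: concat, sfile)

    # reopen in write mode only after reading is finished
    with open(fname, 'w') as f:
        f.write(text)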
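Separating the text transformation from the file I/O makes it possible to try it on a string before touching real files. A sketch under the same assumptions as above; `rebuild` is a hypothetical helper name and the sample layout is invented. Passing a function to `re.sub` inserts the replacement verbatim, which matters if the captured unzip patterns contain something like `\d`: with a plain string replacement, `re.sub` would reject it as a bad escape.

```python
import re

# Patterns mirror the question; the exact YAML layout is an assumption.
TABLE_RE = re.compile(r'(table_name:.*?)\s*id: load', re.DOTALL)
UNZIP_RE = re.compile(r'unzip_patterns:[ \t]*\n(.*?)\n[ \t-]*id: extract', re.DOTALL)
STD_RE = re.compile(r'- class: pipe\.standardize\.Standardize.*?id: load', re.DOTALL)


def rebuild(text):
    """Replace the Standardize block with a new steps block (hypothetical helper).

    Returns the text unchanged if any of the three patterns is missing.
    """
    table = TABLE_RE.search(text)
    unzip = UNZIP_RE.search(text)
    if not (table and unzip and STD_RE.search(text)):
        return text
    # Indentation of the generated block is a guess at the target layout.
    new_block = '\n'.join([
        '  steps:',
        '  - id: extract',
        unzip.group(1),
        '  - id: validate',
        '    conf:',
        table.group(1),
    ])
    # A function replacement is inserted verbatim, so a backslash in the
    # captured patterns is not parsed as a regex escape.
    return STD_RE.sub(lambda m: new_block, text, count=1)


# Hypothetical sample input.
sample = """\
unzip_patterns:
  - fundamentals_\\d+.csv
- id: extract
- class: pipe.standardize.Standardize
  table_name: snl_realestate_hb_na_fundamentals1_o
  id: load
"""

print(rebuild(sample))
```

Inside the question's loop, `text = rebuild(sfile)` could stand in for the search and concatenation steps.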

Trying to find substrings with regex patterns
