Question

I find it hard to summarize my problem, so I'll start with an example. I have a textarea in which every line must match the following pattern:

{new_field} is {func} of {field}[,{field}]

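Here is and of are fixed terms, {new_field}, {func} and {field} are variable terms that need to be returned somehow, and the content between [ and ] is optional. I need to return a list of dicts, each containing the variable terms extracted from one line of the textarea.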

So, for example, if I have the following input:

name is concat of first_name, last_name
price is sum of product, taxes, shipping

I need this output:

[{'new_field': 'name', 'func': 'concat', 'fields': ['first_name', 'last_name']},
 {'new_field': 'price', 'func': 'sum', 'fields': ['product', 'taxes', 'shipping']}]

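Now, I thought about calling split on the whole line and using indexes to pick out the terms, but that would make it hard to customize what the placeholders look like later. Then I thought of using regular expressions, but unfortunately I don't know where to start or what to use from the re module. Any help and tips are highly appreciated!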

Answer 1

Something like this:

import re

s = """name is concat of first_name, last_name
price is sum of product, taxes, shipping"""

out = []

for line in s.splitlines():
    new_field,func,fields = re.match(r'(\w+) is (\w+) of (.*)',line).groups()
    out.append({'new_field':new_field,
                'func':func,
                'fields':fields.split(',')})

Output:

out
Out[20]: 
[{'fields': ['first_name', ' last_name'],
  'func': 'concat',
  'new_field': 'name'},
 {'fields': ['product', ' taxes', ' shipping'],
  'func': 'sum',
  'new_field': 'price'}]
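
Note that the fields in the output above keep the space that followed each comma. One possible way to drop it, assuming whitespace around the commas is never significant, is to split on a small regex instead of a bare comma. The variable names here are only for illustration:

```python
import re

fields = 'first_name, last_name'

# Split on a comma together with any whitespace around it.
clean = re.split(r'\s*,\s*', fields.strip())
print(clean)
```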

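Note that I've been quite terse above, which is fine for demo code but not great if you expect robustness. At the very least you need to check that match is not None, and you may want to do some more elaborate parsing on fields to make sure it matches the grammar you specified. Something along the lines of: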

for line in s.splitlines():
    match = re.match(r'(\w+) is (\w+) of (.*)',line)
    if match:
        new_field,func,fields = match.groups()
        out.append({'new_field': new_field,
                    'func': func,
                    'fields': some_processing_func(fields)})
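
The some_processing_func above is left as a placeholder. A minimal sketch of what it might do, assuming each field must be a non-empty run of word characters (that rule is an assumption, not something the question states), could be:

```python
import re

def some_processing_func(fields):
    """Split a comma-separated field list and validate each name."""
    names = [name.strip() for name in fields.split(',')]
    for name in names:
        # Assumed rule: a field name is one or more word characters.
        if not re.fullmatch(r'\w+', name):
            raise ValueError('invalid field name: %r' % name)
    return names
```

With this version, a line such as "name is concat of first_name,, last_name" raises ValueError instead of silently producing an empty field.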

Answer 2

A simple approach would be:

import re

text = ['name is concat of first_name, last_name',
'price is sum of product, taxes, shipping']

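pattern = r"(\w+)\s+is\s+(\w+)\s+of\s+(.*)"

res = []
for line in text:
    m = re.match(pattern, line)
    if m is None:
        continue  # skip lines that do not fit the pattern
    res.append({
         'new_field': m.group(1),
         'func': m.group(2),
         # group 3 holds the whole field list; split it and trim each name
         'fields': [x.strip() for x in m.group(3).split(',') if x.strip()]
         })
print(res)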
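
The question also mentions wanting to customize how the placeholders look. One way that might be approached is to build the regex from the template itself, turning each {name} into a named group. This is only a sketch: compile_template and parse_lines are hypothetical helper names, and it assumes a placeholder called fields holds the comma-separated list.

```python
import re

def compile_template(template):
    """Turn '{a} is {b}' into a regex with one named group per placeholder."""
    # re.split with a capturing group keeps the names at odd indexes.
    parts = re.split(r'\{(\w+)\}', template)
    regex = ''
    for i, part in enumerate(parts):
        if i % 2:
            regex += '(?P<%s>.+?)' % part   # placeholder: lazy named group
        else:
            regex += re.escape(part)        # literal text between placeholders
    return re.compile(regex)

def parse_lines(text, template='{new_field} is {func} of {fields}'):
    pattern = compile_template(template)
    rows = []
    for line in text.splitlines():
        m = pattern.fullmatch(line.strip())
        if m is None:
            continue  # skip lines that do not fit the template
        row = m.groupdict()
        if 'fields' in row:
            # Assumed convention: 'fields' is a comma-separated list.
            row['fields'] = [f.strip() for f in row['fields'].split(',')]
        rows.append(row)
    return rows
```

Changing the syntax then only means passing a different template, for example parse_lines(text, '{func}({fields}) -> {new_field}').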

Match multiple lines against a pattern and return the placeholders
