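This is not a question about how to use re.findall(), the global modifier (?g), or \g. It asks how to match n groups with a single regular expression, where n is between 3 and 5.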
Rules:
# (comments)
ITEM1, ITEM2, ITEM3
class ITEM1(stuff)
model = ITEM2
fields = (ITEM3)
write_once_fields = (ITEM4)
required_fields = (ITEM5)
None, or retrieve pairs. My question is whether this is feasible and, if so, how.
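I have gotten this far, but it does not handle comments, an unknown order, or missing items, and it does not stop searching for this particular regex when it reaches the next class definition. https://www.regex101.com/r/cG5nV9/8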
(?s)\nclass\s(.*?)(?=\()
.*?
model\s=\s(.*?)\n
.*?
(?=fields.*?\((.*?)\))
.*?
(?=write_once_fields.*?\((.*?)\))
.*?
(?=required_fields.*?\((.*?)\))
Do I need conditionals?
Thanks for any hints.
Answer 0 (score: 1)
I would do something like the following:
from collections import defaultdict
import re

comment_line = re.compile(r"\s*#")
matches = defaultdict(dict)

with open('path/to/file.txt') as inf:
    d = {}  # should catch and dispose of any matching lines
            # not related to a class
    for line in inf:
        if comment_line.match(line):
            continue  # skip this line
        line = line.strip()  # attribute lines inside a class are indented
        if line.startswith('class '):
            # "class ITEM1(stuff)" -> "ITEM1"
            classname = line.split()[1].split('(')[0].rstrip(':')
            d = matches[classname]
        if line.startswith('model'):
            d['model'] = line.split('=', 1)[1].strip()
        if line.startswith('fields'):
            d['fields'] = line.split('=', 1)[1].strip()
        if line.startswith('write_once_fields'):
            d['write_once_fields'] = line.split('=', 1)[1].strip()
        if line.startswith('required_fields'):
            d['required_fields'] = line.split('=', 1)[1].strip()
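To try the line-by-line idea without a file on disk, here is a minimal sketch that runs on an in-memory sample. The sample text, the `KEYS` tuple, and the `parse_classes` helper are assumptions made for illustration, not part of the original answer; it also uses `str.partition` instead of the chain of `startswith` checks, which is a small variation on the same approach.

```python
import io
import re
from collections import defaultdict

# Hypothetical sample input; the layout is assumed from the question.
SAMPLE = """\
# a comment line that should be ignored
class Foo(Base):
    model = FooModel
    fields = ('a', 'b')
    # write_once_fields = ('ignored',)
    required_fields = ('a',)

class Bar(Base):
    fields = ('x',)
    model = BarModel
"""

KEYS = ("model", "fields", "write_once_fields", "required_fields")
comment_line = re.compile(r"\s*#")


def parse_classes(lines):
    """Collect the known keys of each class into a dict of dicts."""
    matches = defaultdict(dict)
    d = {}  # catches keys that appear before any class line
    for line in lines:
        if comment_line.match(line):
            continue  # skip comment lines
        stripped = line.strip()
        if stripped.startswith("class "):
            # "class Foo(Base):" -> "Foo"
            classname = stripped.split()[1].split("(")[0].rstrip(":")
            d = matches[classname]
            continue
        key, sep, value = stripped.partition("=")
        if sep and key.strip() in KEYS:
            d[key.strip()] = value.strip()
    return dict(matches)


result = parse_classes(io.StringIO(SAMPLE))
for name, items in result.items():
    print(name, items)
```

Because each class gets its own dict, the order of the items inside a class does not matter, and a missing item is simply an absent key.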
You could probably do this more easily with regex matching.
comment_line = re.compile(r"\s*#")
# the named group needs a pattern, otherwise it always captures an empty string
class_line = re.compile(r"class (?P<classname>\w+)")
possible_keys = ["model", "fields", "write_once_fields", "required_fields"]
data_line = re.compile(r"\s*(?P<key>" + "|".join(possible_keys) +
                       r")\s+=\s+(?P<value>.*)")

with open( ...
    d = {}  # default catcher as above
    for line in ...
        if comment_line.match(line):
            continue
        class_match = class_line.match(line)
        if class_match:
            d = matches[class_match.group('classname')]
            continue  # there won't be more than one match per line
        data_match = data_line.match(line)
        if data_match:
            key, value = data_match.group('key'), data_match.group('value')
            d[key] = value
But this might be harder to understand. YMMV.
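For completeness, here is a self-contained sketch of the regex-based variant that also returns the items in a fixed order, with None for anything missing, which is one of the two outputs the question asks for. The sample text and the `parse` helper are hypothetical, and the patterns are slightly loosened (`\s*` around `=`, optional leading whitespace before `class`) as an assumption about the input.

```python
import re
from collections import defaultdict

KEYS = ["model", "fields", "write_once_fields", "required_fields"]

comment_line = re.compile(r"\s*#")
class_line = re.compile(r"\s*class\s+(?P<classname>\w+)")
data_line = re.compile(
    r"\s*(?P<key>" + "|".join(KEYS) + r")\s*=\s*(?P<value>.*)"
)

# Hypothetical sample: items out of order, one commented out, one missing.
SAMPLE = """\
class Foo(Base):
    # model = Commented
    model = FooModel
    required_fields = ('a',)
    fields = ('a', 'b')
"""


def parse(text):
    """Return {classname: (model, fields, write_once_fields, required_fields)}."""
    matches = defaultdict(dict)
    d = {}  # catches keys that appear before any class line
    for line in text.splitlines():
        if comment_line.match(line):
            continue
        class_match = class_line.match(line)
        if class_match:
            d = matches[class_match.group("classname")]
            continue  # a line is either a class line or a data line
        data_match = data_line.match(line)
        if data_match:
            d[data_match.group("key")] = data_match.group("value").strip()
    # dict.get returns None for keys that never appeared
    return {name: tuple(items.get(k) for k in KEYS)
            for name, items in matches.items()}


ordered = parse(SAMPLE)
print(ordered)
# {'Foo': ('FooModel', "('a', 'b')", None, "('a',)")}
```

The tuple position tells you which item matched, so the caller does not need to inspect group numbers or worry about the order in the source.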