String:
Person1(has(1, 1) has(2, 2)
has(3, 3)
had(4, 4) had(5, 5))
Person2(has(6, 6) had(7, 7))
I want to select everything inside has() for Person1, i.e. ['1, 1', '2, 2', '3, 3'].
I tried the pattern has\((\d, \d)\)(.|\s)*Person2 with the global flag, but it only returns 1, 1.
Answer 0 (score: 5)
A solution using the re.findall() function:
import re
s = '''
Person1(has(1, 1) has(2, 2)
has(3, 3)
had(4, 4) had(5, 5))
Person2(has(6, 6) had(7, 7))'''
has_items = re.findall(r'(?<!Person2\()has\(([^()]+)\)', s)
print(has_items)
Output:
['1, 1', '2, 2', '3, 3']
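(?<!Person2\() - a negative lookbehind assertion; it ensures the has substring is not immediately preceded by Person2(
([^()]+) - the first capturing group, which holds the contents of a has item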
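One caveat worth illustrating: the lookbehind only inspects the characters directly in front of each has, so it filters out just the first has( that follows Person2(. The sketch below uses a made-up input (the `sample` string is an assumption, not part of the question) in which Person2 holds two items, to show the second one slipping through.

```python
import re

# Hypothetical input: Person2 now holds two has(...) items.
sample = "Person1(has(1, 1)) Person2(has(6, 6) has(8, 8))"

# Same pattern as in the answer above.
pattern = r'(?<!Person2\()has\(([^()]+)\)'

# has(6, 6) is rejected because "Person2(" sits right before it,
# but has(8, 8) is preceded by "(6, 6) " and is therefore accepted.
leaked = re.findall(pattern, sample)
print(leaked)  # ['1, 1', '8, 8']
```

This is why isolating a person's whole block first is the more robust route when a person can hold several items.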
To grep the has items of a particular Person, use the following unified approach with an extended example:
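def grepPersonItems(s, person):
    person_items = []
    # Grab the whole "PersonN(...))" block; re.escape guards against regex metacharacters in the name
    person_group = re.search(r'(' + re.escape(person) + r'\(.*?\)\))', s, re.DOTALL)
    if person_group:
        # Collect the contents of every has(...) inside that block
        person_items = re.findall(r'has\(([^()]+)\)', person_group.group())
    return person_items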
s = '''
Person1(has(1, 1) has(2, 2)
has(3, 3)
had(4, 4) had(5, 5))
Person2(has(6, 6) had(7, 7), has(8,8)) Person3(has(2, 6) had(7, 7), has(9, 9))'''
person1_items = grepPersonItems(s, 'Person1')
person2_items = grepPersonItems(s, 'Person2')
person3_items = grepPersonItems(s, 'Person3')
print('Person1:', person1_items)
print('Person2:', person2_items)
print('Person3:', person3_items)
Output:
Person1: ['1, 1', '2, 2', '3, 3']
Person2: ['6, 6', '8,8']
Person3: ['2, 6', '9, 9']
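Answer 1 (score: 1)
Why not parse it completely, so that you can pick up whatever you need? You need two patterns: one to grab each person together with its contents, and another to grab the individual parts inside it. You can also add some parsing to split out the single elements and convert them to native Python types. Something like: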
import collections
import re
persons = re.compile(r"(Person\d+)\(((?:.*?\(.*?\)\s*)+)\)")
contents = re.compile(r"(\w+)\((.*?)\)")
def parse_input(data, parse_inner=True, map_inner=str):
    result = {}  # store for our parsed data
    for match in persons.finditer(data):  # loop through our `Persons`
        person = match.group(1)  # grab the first group to get our Person
        elements = collections.defaultdict(list)  # store for the parsed inner elements
        for element in contents.finditer(match.group(2)):  # loop through the has/had/etc.
            element_name = element.group(1)  # the first group holds the name
            element_data = element.group(2)  # this is the inner content of each has/had/etc.
            if parse_inner:  # if we want to parse the inner elements...
                element_data = [map_inner(x.strip()) for x in element_data.split(",")]
            elements[element_name].append(element_data)  # add our inner results
        result[person] = elements  # add persons to our result
    return result  # the parsed data, keyed by person
Then you can parse everything and access whatever you like. The most basic example:
test = """Person1(has(1, 1) has(2, 2)
has(3, 3)
had(4, 4) had(5, 5))
Person2(has(6, 6) had(7, 7))"""
parsed = parse_input(test, False) # basic string grab
print(parsed["Person1"]["has"]) # ['1, 1', '2, 2', '3, 3']
print(parsed["Person2"]["has"]) # ['6, 6']
print(parsed["Person2"]["had"]) # ['7, 7']
But you can do much more: you can add further persons and have the contents 'converted' into actual Python structures:
test = """Person1(has(1, 1) has(2, 2)
has(3, 3)
had(4, 4) had(5, 5))
Person2(has(6, 6) had(7, 7))
Person3(has(1, 2) has(3, 4) has(4, 5) foo(6, 7))"""
parsed = parse_input(test, True, int) # parses everything and auto-converts to int
print(parsed["Person3"]["has"]) # [[1, 2], [3, 4], [4, 5]]
print(parsed["Person3"]["has"][1]) # [3, 4]
print(sum(parsed["Person3"]["foo"][0])) # 13
print(parsed["Person1"]["has"][1] + parsed["Person2"]["has"][0]) # [2, 2, 6, 6]
# etc.
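Answer 2 (score: 0)
I think you can try this approach, which is dynamic and simple and works for every person. It splits and parses the string, then stores each list of matches in a dictionary keyed by Person.
Sample source: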
import re
regex = r"has\(\s*(\d+)\s*,\s*(\d+)\s*\)"
result = {}  # renamed from `dict` to avoid shadowing the built-in
test_str = ("Person1(has(1, 1) has(2, 2)\n"
            " has(3, 3) \n"
            " had(4, 4) had(5, 5))\n"
            "Person2(had(6, 6) has(7, 7))\n"
            "Person3(had(6, 6) has(8, 8))")
# The capturing group makes re.split keep the Person names in the result
res = re.split(r"(Person\d+)", test_str)
currentKey = ""
for rs in res:
    if "Person" in rs:
        currentKey = rs  # remember which person the next chunk belongs to
    elif currentKey != "":
        matches = re.finditer(regex, rs, re.DOTALL)
        ar = []
        for match in matches:
            ar.append(match.group(1) + "," + match.group(2))
        result[currentKey] = ar
print(result)
The output will be:
{'Person1': ['1,1', '2,2', '3,3'], 'Person2': ['7,7'], 'Person3': ['8,8']}
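The loop in this answer depends on one detail of re.split: when the pattern contains a capturing group, the captured delimiters are kept in the returned list. The minimal sketch below, on a shortened made-up input (`text` and the `bodies` dictionary are illustrative names, not part of the answer), shows the alternating name/body layout that results.

```python
import re

# Hypothetical two-person input, shortened for illustration.
text = "Person1(has(1, 1))\nPerson2(has(7, 7))"

# Because the pattern is wrapped in a capturing group, re.split keeps the
# matched names in the result, alternating with the text between them.
parts = re.split(r"(Person\d+)", text)
print(parts)  # ['', 'Person1', '(has(1, 1))\n', 'Person2', '(has(7, 7))']

# Pair each name (odd index) with the body that follows it (even index).
bodies = dict(zip(parts[1::2], parts[2::2]))
print(bodies)
```

Pairing the slices this way is one possible alternative to tracking a current key by hand; either form gives each person's body, which can then be searched for has(...) items.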