Only one match is returned when using (.|\s)*

Time: 2017-07-09 07:37:35

Tags: python regex

String:

Person1(has(1, 1) has(2, 2)
    has(3, 3) 
    had(4, 4) had(5, 5))
Person2(has(6, 6) had(7, 7))

I want to select everything inside has() for Person1, i.e. ['1, 1', '2, 2', '3, 3'].

I tried the pattern has\((\d, \d)\)(.|\s)*Person2 with the global flag, but it only returns 1, 1.
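
The behaviour can be reproduced in Python with a small sketch. It assumes the sample string above and uses re.finditer in place of a global flag (Python's re module has no such flag); the variable names are illustrative only. It suggests that the greedy (.|\s)* runs from the first has(...) all the way to Person2, so the remaining has(...) entries are consumed inside that single match.

```python
import re

s = """Person1(has(1, 1) has(2, 2)
    has(3, 3)
    had(4, 4) had(5, 5))
Person2(has(6, 6) had(7, 7))"""

pattern = r'has\((\d, \d)\)(.|\s)*Person2'

# finditer plays the role of the "global" flag here
matches = list(re.finditer(pattern, s))
captured = [m.group(1) for m in matches]
print(captured)

# The single match spans from the first has( up to and including "Person2"
span_text = matches[0].group(0)
print(span_text.startswith('has(1, 1)'), span_text.endswith('Person2'))
```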

3 answers:

Answer 0 (score: 5)

Solution using the re.findall() function:

import re

s = '''
Person1(has(1, 1) has(2, 2)
    has(3, 3)
    had(4, 4) had(5, 5))
Person2(has(6, 6) had(7, 7))'''

has_items = re.findall(r'(?<!Person2\()has\(([^()]+)\)', s)
print(has_items)

Output:

['1, 1', '2, 2', '3, 3']
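  • (?<!Person2\() - negative lookbehind assertion; ensures that the has substring is not immediately preceded by Person2(

  • ([^()]+) - the first capturing group, containing the has item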
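
One caveat may be worth checking. The lookbehind only inspects the characters immediately before each has, so it filters out only an entry that directly follows Person2(. The sketch below assumes a hypothetical reordered string (not taken from the question) in which has(6, 6) comes after had(7, 7):

```python
import re

pattern = r'(?<!Person2\()has\(([^()]+)\)'

# Hypothetical reordering: has(6, 6) no longer directly follows "Person2("
reordered = 'Person1(has(1, 1) had(4, 4))\nPerson2(had(7, 7) has(6, 6))'

# The Person2 entry slips through because "Person2(" is not right before it
leaked = re.findall(pattern, reordered)
print(leaked)
```

If that ordering can occur in the input, isolating the Person block first is probably the safer route.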

To grep the has items for a particular Person, use the following unified approach with an extended example:

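def grepPersonItems(s, person):
    person_items = []
    # Grab the whole "PersonN(...))" block; re.DOTALL lets '.' cross newlines
    person_group = re.search(r'(' + re.escape(person) + r'\(.*?\)\))', s, re.DOTALL)

    if person_group:
        # Within that block, collect the contents of every has(...)
        person_items = re.findall(r'has\(([^()]+)\)', person_group.group())
    return person_items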

s = '''
Person1(has(1, 1) has(2, 2)
    has(3, 3)
    had(4, 4) had(5, 5))
Person2(has(6, 6) had(7, 7), has(8,8)) Person3(has(2, 6) had(7, 7), has(9, 9))'''

person1_items = grepPersonItems(s, 'Person1')
person2_items = grepPersonItems(s, 'Person2')
person3_items = grepPersonItems(s, 'Person3')

print('Person1: ', person1_items)
print('Person2: ', person2_items)
print('Person3: ', person3_items)

Output:

Person1:  ['1, 1', '2, 2', '3, 3']
Person2:  ['6, 6', '8,8']
Person3:  ['2, 6', '9, 9']

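Answer 1 (score: 1)

Why not parse it completely, so that you can pick out whatever you need? You need two patterns: one to grab each person and its contents, and another to grab the individual parts inside them. You can also add some parsing to extract the individual elements and convert them to native Python types. Something like: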

import collections
import re

persons = re.compile(r"(Person\d+)\(((?:.*?\(.*?\)\s*)+)\)")
contents = re.compile(r"(\w+)\((.*?)\)")

def parse_input(data, parse_inner=True, map_inner=str):
    result = {}  # store for our parsed data
    for match in persons.finditer(data):  # loop through our `Persons`
        person = match.group(1)  # grab the first group to get our Person
        elements = collections.defaultdict(list)  # store for the parsed inner elements
        for element in contents.finditer(match.group(2)):  # loop through the has/had/etc.
            element_name = element.group(1)  # the first group holds the name
            element_data = element.group(2)  # this is the inner content of each has/had/etc.
            if parse_inner:  # if we want to parse the inner elements...
                element_data = [map_inner(x.strip()) for x in element_data.split(",")]
            elements[element_name].append(element_data)  # add our inner results
        result[person] = elements  # add persons to our result
    return result  # well, obvious...
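
As a rough illustration of how the two patterns divide the work (a sketch only; the one-line `sample` string is made up for the demonstration), the first pattern isolates a person and its body, and the second splits that body into name/content pairs:

```python
import re

persons = re.compile(r"(Person\d+)\(((?:.*?\(.*?\)\s*)+)\)")
contents = re.compile(r"(\w+)\((.*?)\)")

sample = "Person2(has(6, 6) had(7, 7))"  # hypothetical one-line input

# Step 1: isolate the person name and everything inside its outer parentheses
match = persons.search(sample)
name, body = match.group(1), match.group(2)
print(name)
print(body)

# Step 2: break the body into (element name, inner content) pairs
pairs = contents.findall(body)
print(pairs)
```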

You can then parse everything and access it to your heart's content. The most basic example:

test = """Person1(has(1, 1) has(2, 2)
    has(3, 3)
    had(4, 4) had(5, 5))
Person2(has(6, 6) had(7, 7))"""

parsed = parse_input(test, False)  # basic string grab

print(parsed["Person1"]["has"])  # ['1, 1', '2, 2', '3, 3']
print(parsed["Person2"]["has"])  # ['6, 6']
print(parsed["Person2"]["had"])  # ['7, 7']

But you can do much more. You can add more persons and have the values 'converted' into actual Python structures:

test = """Person1(has(1, 1) has(2, 2)
    has(3, 3)
    had(4, 4) had(5, 5))
Person2(has(6, 6) had(7, 7))
Person3(has(1, 2) has(3, 4) has(4, 5) foo(6, 7))"""

parsed = parse_input(test, True, int)  # parses everything and auto-converts to int

print(parsed["Person3"]["has"])  # [[1, 2], [3, 4], [4, 5]]
print(parsed["Person3"]["has"][1])  # [3, 4]
print(sum(parsed["Person3"]["foo"][0]))  # 13
print(parsed["Person1"]["has"][1] + parsed["Person2"]["has"][0])  # [2, 2, 6, 6]
# etc.

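Answer 2 (score: 0)

I think you can try this approach, which I believe is dynamic and simple and works for every person. It splits and parses the string and pushes each required array into a dictionary keyed by Person.

Sample source: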

import re

regex = r"has\(\s*(\d+)\s*,\s*(\d+)\s*\)"

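result = {}  # avoid naming this `dict`, which would shadow the built-in
test_str = ("Person1(has(1, 1) has(2, 2)\n"
    "    has(3, 3) \n"
    "    had(4, 4) had(5, 5))\n"
    "Person2(had(6, 6) has(7, 7))\n"
    "Person3(had(6, 6) has(8, 8))")

# Splitting on a capturing group keeps the "PersonN" delimiters in the result
res = re.split(r"(Person\d+)", test_str)
current_key = ""
for rs in res:
    if "Person" in rs:
        current_key = rs  # remember which person the next chunk belongs to
    elif current_key != "":
        # collect every has(x, y) pair found in this person's chunk
        ar = []
        for match in re.finditer(regex, rs):
            ar.append(match.group(1) + "," + match.group(2))
        result[current_key] = ar
print(result)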

The output will be:

{'Person1': ['1,1', '2,2', '3,3'], 'Person2': ['7,7'], 'Person3': ['8,8']}
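
The key step in this approach is that re.split keeps the delimiters when the pattern contains a capturing group, so names and bodies alternate in the result. A minimal sketch, assuming a shortened two-person string and pairing the pieces with zip instead of tracking a current key (a variation, not this answer's own code):

```python
import re

text = "Person1(has(1, 1) had(4, 4))\nPerson2(had(6, 6) has(7, 7))"

parts = re.split(r"(Person\d+)", text)
# parts[0] is whatever precedes the first name (empty here);
# after that, names sit at odd indexes and bodies at even indexes
names, bodies = parts[1::2], parts[2::2]

# findall with two groups returns (first, second) tuples for each has(x, y)
has_by_person = {
    name: re.findall(r"has\(\s*(\d+)\s*,\s*(\d+)\s*\)", body)
    for name, body in zip(names, bodies)
}
print(has_by_person)
```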