Question

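I'm trying to write a regular expression for the following situation. I have a file that contains hundreds of dictionaries stored as strings.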

E.g.:

{'a':1'}
{{'a':1, 'b':2}{'c':3}}
{'a':4, 'b':6}

I read the file and removed the newlines. Now I'm trying to split the dictionaries apart with a regex:

{'a':1'}{{'a':1, 'b':2}{'c':3}}{'a':4, 'b':6}

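re.split("({.*?})", str). This doesn't work, because the second (nested) dictionary is not matched as a whole. How can I write a regex that matches every one of these entries and returns a list of dictionaries?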
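To see why the lazy pattern falls short, here is a small sketch that runs it on the concatenated string from the question. The lazy `.*?` stops at the first `}` it reaches, so the nested entry is cut into pieces instead of being returned whole:

```python
import re

text = "{'a':1'}{{'a':1, 'b':2}{'c':3}}{'a':4, 'b':6}"

# A '{' not followed by a valid repeat count is treated as a literal brace,
# and '.*?' ends the match at the first '}' it sees.
parts = re.split(r"({.*?})", text)
print(parts)
# ['', "{'a':1'}", '', "{{'a':1, 'b':2}", '', "{'c':3}", '}', "{'a':4, 'b':6}", '']
```

The second match, `{{'a':1, 'b':2}`, has two opening braces but only one closing brace, and the final `}` of the nested entry is left over as a separate piece.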

Answer 1

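Python regular expressions cannot handle nested structures on their own. You have to handle the nesting separately, with a loop or with recursion.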
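A minimal sketch of the looping approach, using a hypothetical helper `split_top_level` that tracks brace depth and cuts the string each time the depth returns to zero. It assumes braces never appear inside quoted keys or values; a brace inside a string literal would throw the count off.

```python
def split_top_level(text):
    """Split text into top-level {...} groups by tracking brace depth.

    Hypothetical helper; assumes no '{' or '}' appears inside quoted strings.
    """
    groups = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == '{':
            if depth == 0:
                start = i  # a new top-level group begins here
            depth += 1
        elif ch == '}':
            depth -= 1
            if depth < 0:
                raise ValueError(f"unbalanced '}}' at index {i}")
            if depth == 0:
                groups.append(text[start:i + 1])  # the top-level group is closed
    if depth != 0:
        raise ValueError("unclosed '{' at end of input")
    return groups


text = "{'a':1'}{{'a':1, 'b':2}{'c':3}}{'a':4, 'b':6}"
print(split_top_level(text))
# ["{'a':1'}", "{{'a':1, 'b':2}{'c':3}}", "{'a':4, 'b':6}"]
```

Unlike a single regex, the counter keeps the nested entry together as one piece.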

However, you mentioned in a comment above that every line is a JSON response. Why not just call json.loads() on each line?

import json

with open('path_to_file', 'r') as f:
    # Parse each line as its own JSON document, skipping blank lines
    data = [json.loads(line) for line in f if line.strip()]

data is now a list of dictionaries.
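A small self-contained sketch of the same idea, with `io.StringIO` standing in for the file (the file contents are hypothetical). Note that the sample lines in the question use single quotes, which `json.loads` rejects; if the lines are really Python literals rather than JSON, `ast.literal_eval` can parse them instead.

```python
import ast
import io
import json

# Hypothetical file contents: one JSON object per line (JSON requires double quotes).
json_file = io.StringIO('{"a": 1}\n{"a": 4, "b": 6}\n')
data = [json.loads(line) for line in json_file if line.strip()]
print(data)  # [{'a': 1}, {'a': 4, 'b': 6}]

# Single-quoted lines such as {'a':4, 'b':6} are not valid JSON,
# but they are valid Python literals.
literal_file = io.StringIO("{'a':4, 'b':6}\n")
literals = [ast.literal_eval(line.strip()) for line in literal_file if line.strip()]
print(literals)  # [{'a': 4, 'b': 6}]
```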

Answer 2

You could do it like this:

(\{[^{}]+\})
# look for an opening {
# followed by one or more characters that are neither { nor }
# and then a closing }

In Python, this would be:

import re
rx = r'(\{[^{}]+\})'
string = "{'a':1'}{{'a':1, 'b':2}{'c':3}}{'a':4, 'b':6}"
matches = re.findall(rx, string)
print(matches)
# ["{'a':1'}", "{'a':1, 'b':2}", "{'c':3}", "{'a':4, 'b':6}"]

See a demo on regex101.com.
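`re.findall` returns strings, not dictionaries. A minimal sketch for turning the matches into actual dicts with `ast.literal_eval`, assuming an input without the stray quote in `{'a':1'}` so that every match is a valid Python literal. As in the output above, the nested entry comes back as its two inner dictionaries rather than as one item.

```python
import ast
import re

rx = r'(\{[^{}]+\})'
# Hypothetical input: the sample string with the stray quote removed.
text = "{'a':1}{{'a':1, 'b':2}{'c':3}}{'a':4, 'b':6}"

# Each match is the text of an innermost {...} group; parse it into a dict.
dicts = [ast.literal_eval(m) for m in re.findall(rx, text)]
print(dicts)  # [{'a': 1}, {'a': 1, 'b': 2}, {'c': 3}, {'a': 4, 'b': 6}]
```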

