Question

I have some data that looks like this:

key abc key
value 1
value 2
value 3
key bcd key
value 2
value 3
value 4
...
...

From it, I want to build a data structure like:

{'abc':[1,2,3]}
{'bcd':[2,3,4]}
...

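Is a regular expression a good choice here? If so, how do I write the regex so that the processing works like a for loop, where inside the loop I can build the data structure from the data I have captured?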

Thanks.

Answer 1

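Using regular expressions to pick the values out of the text file can be more verbose than using string slicing. If you are confident about the data format, string slicing will work fine.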

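import re

keyPat = re.compile(r'key (\w+) key')
valuePat = re.compile(r'value (\d+)')

result = {}
tempL = None  # list for the current key; None until the first key line is seen
with open('data.txt') as f:
    for line in f:
        keyMatch = keyPat.search(line)
        valueMatch = valuePat.search(line)
        if keyMatch:
            # Start a new list for this key
            tempL = []
            result[keyMatch.group(1)] = tempL
        elif valueMatch and tempL is not None:
            # Append the value to the current key's list
            tempL.append(int(valueMatch.group(1)))
        else:
            print('Did not match:', line.rstrip())

print(result)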
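For comparison, here is a minimal sketch of the string-slicing approach mentioned above. It assumes every key line has exactly the form `key <name> key` and every value line has exactly the form `value <integer>`, with no extra spaces; it reads from an inline string instead of `data.txt` so it runs on its own.

```python
# Sample input in the format from the question
data = """key abc key
value 1
value 2
value 3
key bcd key
value 2
value 3
value 4"""

result = {}
current = None  # list for the key currently being filled
for line in data.splitlines():
    if line.startswith('key ') and line.endswith(' key'):
        # Slice off the leading "key " and trailing " key" (4 characters each)
        current = []
        result[line[4:-4]] = current
    elif line.startswith('value ') and current is not None:
        # Slice off the leading "value " (6 characters) and convert the rest
        current.append(int(line[6:]))

print(result)  # {'abc': [1, 2, 3], 'bcd': [2, 3, 4]}
```

Slicing avoids the regex engine entirely, but it silently produces wrong keys or raises `ValueError` if a line deviates from the assumed format.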

Answer 2

If the data is always in that format, the following code should work.

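import re

FILENAME = 'data.txt'  # example path; replace with your file

with open(FILENAME, "r") as f:
    text = f.read()

# Assumes exactly three value lines per key; each (\d+) captures the whole number
regex = r'key (\S+) key\nvalue (\d+)\nvalue (\d+)\nvalue (\d+)'
matches = re.findall(regex, text)
dic = {}
for match in matches:
    dic[match[0]] = list(map(int, match[1:]))
print(dic)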

Edit: meelo's answer is more robust, because it handles keys with more or fewer than three values.
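A regex can also drive the loop directly, as the question asks: `re.finditer` yields one match object per key block, and the body of the `for` loop builds the dictionary. The sketch below assumes the same line format as the question and accepts any number of value lines per key; it uses an inline string instead of reading `FILENAME`.

```python
import re

data = """key abc key
value 1
value 2
value 3
key bcd key
value 2
value 3
value 4"""

# One match per block: a key line followed by zero or more value lines.
# re.MULTILINE makes ^ and $ match at every line boundary.
block_pat = re.compile(r'^key (\S+) key\n((?:value \d+(?:\n|$))*)', re.MULTILINE)

result = {}
for m in block_pat.finditer(data):
    key, block = m.group(1), m.group(2)
    # Collect every integer from this block's value lines
    result[key] = [int(v) for v in re.findall(r'value (\d+)', block)]

print(result)  # {'abc': [1, 2, 3], 'bcd': [2, 3, 4]}
```

Unlike `re.findall`, `re.finditer` returns match objects lazily, so the loop body runs once per block without building the full list of matches first.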

Answer 3

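import re

x = """key abc key
value 1
value 2
value 3
key bcd key
value 2
value 3
value 4"""

# Each match is (key, block of value lines up to the next "key" line or end of text)
j = re.findall(r"key (.*?) key\n([\s\S]*?)(?=\nkey|$)", x)
d = {}
for i in j:
    # Extract each value from the block and convert it to int
    k = list(map(int, re.findall(r"value (.*?)(?=\nvalue|$)", i[1])))
    d[i[0]] = k
print(d)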

Python regex to build a structured data structure
