Question

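I have a file that consists mostly of names that map to a list of numbers. I want to parse this file, and I think a regex would work well here. However, I have two problems:

1. Names can consist of a single word, several words, or words with underscores. Words can also contain the following characters: (/->)
2. There are comments at the start of the file and in some other places that should not be included in the result. A comment is always a line of ---, followed by some text, followed by another line of ---

So, if I have the following file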

-----------------------------------
Comment
-----------------------------------
Ignore these lines
-----------------------------------
someVar                        0.0  1.0
some var with spaces           52   93
another var_with_underscores   3
some var with (special->chars) 13  37  95
another char/slash             132  
-----------------------------------
Another comment
-----------------------------------
yet another var               27.3  9

I want to get back a dictionary

{"someVar": [0.0, 1.0],
 "some var with spaces": [52, 93],
 "another var_with_underscores": [3],
 "another char/slash": [132],
 "some var with (special->chars)": [13, 37, 95],
 "yet another var": [27.3, 9]}

If asking for a complete solution is too much, I would be happy just to know the regular expression.

I am using Python 2.7.

Answer 1

This might be what you want:

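import re

results = {}
with open('example.txt', 'r') as f:
    for line in f:
        # Group 1: leading non-digit characters (the name); group 2: the rest (the numbers).
        m = re.match(r'([^\d]+)(.*)', line.strip())
        # Lines without any digit (the sample's comments and --- separators) leave group 2 empty.
        if m and m.group(2):
            results[m.group(1).strip()] = [float(n) for n in m.group(2).split()]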

With your updated example, the content of results is:

{'some var with (special->chars)': [13.0, 37.0, 95.0],
 'another var_with_underscores': [3.0],
 'some var with spaces': [52.0, 93.0],
 'someVar': [0.0, 1.0],
 'another char/slash': [132.0],
 'yet another var': [27.3, 9.0]}
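
Note that every value is converted with float, so 52 comes back as 52.0, whereas the dictionary in the question keeps whole numbers as integers. A minimal sketch of one way to preserve that distinction is below. The helper names `to_number` and `parse_lines` are hypothetical and not part of the answer above, and the sketch keeps the same assumption that names never contain digits.

```python
import re

def to_number(token):
    """Return an int when the token parses as an integer, otherwise a float."""
    try:
        return int(token)
    except ValueError:
        return float(token)

def parse_lines(lines):
    """Map each name to its list of numbers, skipping lines without numbers."""
    results = {}
    for line in lines:
        # Same pattern as the answer: non-digits (the name), then the rest.
        m = re.match(r'([^\d]+)(.*)', line.strip())
        if m and m.group(2):
            results[m.group(1).strip()] = [to_number(n) for n in m.group(2).split()]
    return results

# A shortened copy of the file from the question.
sample = [
    "-----------------------------------",
    "Comment",
    "-----------------------------------",
    "someVar                        0.0  1.0",
    "some var with spaces           52   93",
    "another char/slash             132  ",
]
parsed = parse_lines(sample)
```

Here `parsed["some var with spaces"]` holds the integers 52 and 93, while `parsed["someVar"]` still holds floats.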

Answer 2

Ridiculous one-liner:

# whole_thing holds the entire file contents as one string; requires `import re`
dict((m.group(1), map(float, m.group(2).split())) for m in re.finditer(r'^(.*?)\s*([ \d\.]+)$', whole_thing, re.M))

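The regex matches a name followed by optional whitespace and a trailing run of digits, dots, and spaces. Comment lines and --- separators do not end that way, so they are ignored.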
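
To check this on the question's data, the sketch below runs the same pattern on a shortened copy of the file. `whole_thing` is assumed to hold the file's text, and `list(...)` is wrapped around `map` only so that the values are also lists on Python 3 (on Python 2.7 `map` already returns a list).

```python
import re

# A shortened copy of the file from the question, as one string.
whole_thing = """\
-----------------------------------
Comment
-----------------------------------
someVar                        0.0  1.0
some var with (special->chars) 13  37  95
-----------------------------------
Another comment
-----------------------------------
yet another var               27.3  9
"""

# re.M makes ^ and $ match at the start and end of every line.
pattern = re.compile(r'^(.*?)\s*([ \d\.]+)$', re.M)
parsed = dict(
    (m.group(1), list(map(float, m.group(2).split())))
    for m in pattern.finditer(whole_thing)
)
```

Only the three data lines produce matches. Like the first answer, this approach assumes names do not end in digits; a name such as `var2` would lose its trailing digit to the number group.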
