Question

I'm trying to split some lines into their letter and number parts, but I can't come up with the right regular expression.

The lines have the format name = value+unit, for example:

width = 3.45cm
height = 2m
width = 2mm
height = 6.67m

I want to get a separate output for each name, value and unit. This is what I've done:

import re

# infoData is an open text file containing the lines above
line = infoData.readline()
names = []
values = []
units = []
while line:

    if "=" in line:
        names.append(line[0:line.index("=")])
        m = re.search(r'\d+', line[line.index("="):len(line)])
        values.append(int(m.group()))
        m = re.search(r'\D+[^=\n\.]', line[line.index("="):len(line)])
        units.append(m.group())
        line = infoData.readline()

    else:
        line = infoData.readline()

The only thing I get the way I want is the names...
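To see why only the names come out right, here is a small check (not part of the original post) that runs the question's two patterns on the same slice the loop uses, `line[line.index("="):]`. `\d+` stops at the decimal point, so `int()` only ever sees `'3'`. And because the slice still begins with `=`, `\D+[^=\n\.]` matches `'= 3'` instead of the unit:

```python
import re

line = "width = 3.45cm\n"
rest = line[line.index("="):]  # "= 3.45cm\n": the slice still begins with "="

# Value pattern from the question: \d+ stops at the decimal point.
value_match = re.search(r"\d+", rest)
# Unit pattern from the question: \D+ consumes "= ", then [^=\n\.] takes the "3".
unit_match = re.search(r"\D+[^=\n\.]", rest)

print(value_match.group())       # 3
print(int(value_match.group()))  # 3, so 3.45 is truncated to 3
print(unit_match.group())        # = 3
```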

Answer 1

data = ["width = 3.45cm","height = 2m","width = 2mm","height = 6.67m","nope"]

import re
pattern = re.compile(r"(\w+)\s*=\s*([\d.]+)\s*(\w+)")
print([pattern.search(items).groups() for items in data if pattern.search(items)])
# [('width', '3.45', 'cm'), ('height', '2', 'm'), ('width', '2', 'mm'),
#  ('height', '6.67', 'm')]
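The comprehension above calls `pattern.search` twice for every matching line. A small variant (a sketch, not from the original answer) searches each line only once by going through a generator and keeping the non-`None` matches:

```python
import re

data = ["width = 3.45cm", "height = 2m", "width = 2mm", "height = 6.67m", "nope"]
pattern = re.compile(r"(\w+)\s*=\s*([\d.]+)\s*(\w+)")

# Search each line exactly once; lines without a match give None and are skipped.
matches = (pattern.search(item) for item in data)
results = [m.groups() for m in matches if m is not None]

print(results)
# [('width', '3.45', 'cm'), ('height', '2', 'm'), ('width', '2', 'mm'), ('height', '6.67', 'm')]
```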

Edit: if you're looking for a way to get a dictionary out of the regex, you can do this:

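import re
from pprint import pprint

# Named groups make groupdict() return {'name': ..., 'value': ..., 'unit': ...}
patt = re.compile(r"(?P<name>\w+)\s*=\s*(?P<value>[\d.]+)\s*(?P<unit>\w+)")
pprint([patt.search(items).groupdict() for items in data if patt.search(items)])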

Output:

[{'name': 'width', 'unit': 'cm', 'value': '3.45'},
 {'name': 'height', 'unit': 'm', 'value': '2'},
 {'name': 'width', 'unit': 'mm', 'value': '2'},
 {'name': 'height', 'unit': 'm', 'value': '6.67'}]
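`groupdict()` returns every value as a string, and `[\d.]+` would also accept something like `1.2.3`, which `float()` rejects. If numeric values are needed, a sketch along these lines tightens the value group to `\d+(?:\.\d+)?` and converts it with `float()`. This is a swapped-in pattern, not the answer's, and it assumes units are plain letters (`[A-Za-z]+`), which goes beyond the sample data shown:

```python
import re

# Assumed stricter pattern: digits with at most one decimal part, letters-only unit.
pattern = re.compile(
    r"(?P<name>\w+)\s*=\s*(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z]+)"
)

data = ["width = 3.45cm", "height = 2m", "width = 2mm", "height = 6.67m", "nope"]

records = []
for item in data:
    m = pattern.search(item)
    if m is None:
        continue  # skip lines that don't look like "name = value unit"
    record = m.groupdict()
    record["value"] = float(record["value"])  # groupdict() values are strings
    records.append(record)

print(records)
```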

Answer 2

You're overcomplicating things a bit. I'd use:

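import re

data = []

# infoData is the open file from the question; iterating over it yields lines
for line in infoData:
    if '=' not in line:
        continue
    # Split on the first '=' only, so unpacking into two names can't fail
    name, value = line.split('=', 1)
    # Number (digits and dots) directly followed by the unit
    value, unit = re.search(r'([\d.]+)(\w+)', value).groups()

    data.append({'name': name.strip(), 'value': float(value), 'unit': unit})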

For your sample data this gives a list of dictionaries:

[{'name': 'width', 'unit': 'cm', 'value': 3.45},
 {'name': 'height', 'unit': 'm', 'value': 2.0},
 {'name': 'width', 'unit': 'mm', 'value': 2.0},
 {'name': 'height', 'unit': 'm', 'value': 6.67}]

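That keeps each name, value and unit together in one record, instead of spreading them over three separate lists.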
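Answer 2 reads from the open file `infoData`. To try the same loop without a file, a sketch like this wraps it in a hypothetical helper, `parse_lines`, and feeds it an `io.StringIO` object, which yields lines when iterated just as a file does. The `None` check on the match is an addition for robustness, not part of the original answer:

```python
import io
import re


def parse_lines(lines):
    """Hypothetical helper: parse 'name = value+unit' lines into dicts."""
    records = []
    for line in lines:
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        m = re.search(r"([\d.]+)(\w+)", value)
        if m is None:
            continue  # '=' present but no number+unit after it
        number, unit = m.groups()
        records.append({"name": name.strip(), "value": float(number), "unit": unit})
    return records


# io.StringIO stands in for the open file infoData from the question.
info_data = io.StringIO("width = 3.45cm\nheight = 2m\nwidth = 2mm\nheight = 6.67m\n")
print(parse_lines(info_data))
```

On Python 3.7+ the dictionaries keep insertion order, so they print as name, value, unit rather than in the sorted key order shown above; the contents are the same.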

Incorrect regular expression in Python
