Question

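I have a string containing variable names and values. There is no designated separator between a name and its value, and the names may or may not contain underscores.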

string1 = 'Height_A_B132width_top100.0lengthsimple0.00001'

I would like to put the variables into a dictionary:

# desired output: dict1 = {'Height_A_B': 132, 'width_top': 100.0, 'lengthsimple': 0.00001}

I tried the following itertools approach.

Input 1:

from itertools import groupby
[''.join(g) for _, g in groupby(string1, str.isdigit)]

Output 1:

['Height_A_B', '132', 'width_top', '100', '.', '0', 'lengthsimple', '0', '.', '00001']

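The following should almost get there, but the IPython interpreter tells me this str attribute does not exist (even though it is in the documentation). Anyway...

Input 2: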

[''.join(g) for _, g in groupby(string1, str.isnumeric)]

Output 2:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-25-cf931a137f50> in <module>()
----> 1 [''.join(g) for _, g in groupby(string1, str.isnumeric)]

AttributeError: type object 'str' has no attribute 'isnumeric'
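For context: in Python 2, isnumeric is a method of unicode rather than str, which is why the lookup above fails. In Python 3 str.isnumeric exists, but '.' is still not numeric, so the grouping would come out the same as with str.isdigit. A minimal sketch, assuming Python 3 and plain decimal values with no exponent; is_number_char is a hypothetical helper, not part of the original question:

```python
from itertools import groupby

string1 = 'Height_A_B132width_top100.0lengthsimple0.00001'

def is_number_char(c):
    # Hypothetical helper: treat digits and the decimal point as one class,
    # so '100.0' stays in a single group instead of '100', '.', '0'.
    return c.isdigit() or c == '.'

chunks = [''.join(g) for _, g in groupby(string1, is_number_char)]

# chunks alternates name, value, name, value, ...
# The values are still strings at this point.
name_to_text = dict(zip(chunks[0::2], chunks[1::2]))
print(name_to_text)
```

This does not handle exponents such as 1.34e+003, because 'e' and '+' cannot be told apart from name characters one character at a time.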

In any case, what happens if the numbers contain an exponent with a '+' or '-' sign?

string2 = 'Height_A132width_top100.0lengthsimple1.34e+003'
# desired output: dict2 = {'Height_A': 132, 'width_top': 100.0, 'lengthsimple': 1.34e+003}

Input 3:

[''.join(g) for _, g in groupby(string2, str.isdigit)]

Output 3:

['Height_A', '132', 'width_top', '100', '.', '0', 'lengthsimple', '1', '.', '34', 'e+', '003']

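I wonder whether anyone has an elegant solution?

Update: there is some discussion below about preserving the type of the numeric values (e.g. int, float). In fact, the scientific notation in string2 turned out to be a red herring, because if you create a variable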

>>> a = 1.34e+003

you get

>>> print a
1340.0

So the chances of ever producing a string containing 1.34e+003 are low anyway.

string2 is therefore a more appropriate test case if we change it to

string2 = 'Height_A132width_top100.0lengthsimple1.34e+99'
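The difference between the two test values can be checked directly. This is only an illustration of how Python renders the two floats, assuming Python 3:

```python
small_exp = repr(1.34e+003)   # small exponent: rendered without one
large_exp = repr(1.34e+99)    # large exponent: rendered with 'e+99'

print(small_exp)
print(large_exp)
```

Only the second value would plausibly appear in exponent form in a string built from real variables, which is why it makes the better test case.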

Answer 1

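You can use a regular expression: ([^\d.]+)(\d[\d.e+-]*)

[^\d.] means: anything except a digit or a period.
+ means one or more.
The second group requires at least one digit, followed by any run of digits, '.', 'e', '+' or '-'.

Group 1 is the key and group 2 is the value.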
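To make the two groups concrete, here is a small sketch of what re.findall returns for this pattern before any conversion. The second input, 'x1east2', is a made-up string used only to show a limitation: because 'e' is allowed inside the value group, a name that starts with 'e' right after a number loses its first letter.

```python
import re

pattern = r'([^\d.]+)(\d[\d.e+-]*)'

string2 = 'Height_A132width_top100.0lengthsimple1.34e+003'
matches = re.findall(pattern, string2)
print(matches)

# Made-up input: the value group swallows the leading 'e' of 'east'.
ambiguous = re.findall(pattern, 'x1east2')
print(ambiguous)
```

If your names never begin with 'e' (or '+'/'-'), this ambiguity does not arise.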


Code:

import re
# Values containing a '.' become floats; everything else is parsed as an int.
vals = {x: float(y) if '.' in y else int(y)
        for x, y in re.findall(r'([^\d.]+)(\d[\d.e+-]*)', string2)}

{'width_top': 100.0, 'Height_A': 132, 'lengthsimple': 1340.0}

Answer 2

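Handling numbers in scientific notation makes this a little tricky, but it can be done with a carefully written regex. Hopefully my regex behaves correctly on all of your data. :)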

import re

def parse_numstr(s):
    ''' Convert a numeric string to a number.
    Return an integer if the string is a valid representation of an integer,
    otherwise return a float if it's a valid representation of a float,
    otherwise return the original string. '''
    try:
        return int(s)
    except ValueError:
        try:
            return float(s)
        except ValueError:
            return s

pat = re.compile(r'([A-Z_]+)([-+]?[0-9.]+(?:e[-+]?[0-9]+)?)', re.I)

def extract(s):
    return dict((k, parse_numstr(v)) for k,v in pat.findall(s))

data = [
    'Height_A_B132width_top100.0lengthsimple0.00001',
    'Height_A132width_top100lengthsimple1.34e+003',
    'test_c4.2E1p-3q+5z123E-2e2.71828',
]

for s in data:
    print(extract(s))

Output

{'Height_A_B': 132, 'width_top': 100.0, 'lengthsimple': 1.0000000000000001e-05}
{'width_top': 100, 'Height_A': 132, 'lengthsimple': 1340.0}
{'q': 5, 'p': -3, 'z': 1.23, 'test_c': 42.0, 'e': 2.71828}

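Note that my regex will accept malformed numbers in scientific notation that contain more than one decimal point; for those, parse_numstr simply returns the string unchanged. If your data contains no such malformed numbers, that should not be a problem.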
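A short sketch of that fallback, repeating the definitions above so it runs on its own. The input 'a1.2.3b7' is a made-up string chosen to contain one malformed value:

```python
import re

def parse_numstr(s):
    # Try int first, then float, and fall back to the raw string.
    try:
        return int(s)
    except ValueError:
        try:
            return float(s)
        except ValueError:
            return s

pat = re.compile(r'([A-Z_]+)([-+]?[0-9.]+(?:e[-+]?[0-9]+)?)', re.I)

# '1.2.3' matches the pattern but is not a valid number,
# so it comes back as a string rather than raising.
result = dict((k, parse_numstr(v)) for k, v in pat.findall('a1.2.3b7'))
print(result)
```

Callers that need real numbers would have to check the value types afterwards.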

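Here is a slightly better regex. It allows only one decimal point, but it also accepts malformed numbers that have no digit on either side of the decimal point, such as . or .E1.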

pat = re.compile(r'([A-Z_]+)([-+]?[0-9]*\.?[0-9]*(?:e[-+]?[0-9]+)?)', re.I)

See also this answer for a regex that matches numbers in scientific notation.
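If a bare '.' must be rejected as well, one option is to require at least one digit on one side of the decimal point. The pattern below is a sketch of that idea and is not taken from the answer; the names NUM, strict_pat and loose_pat are made up for this example:

```python
import re

# Assumed stricter number pattern: digits with an optional fraction,
# or a fraction with a leading '.', followed by an optional exponent.
NUM = r'[-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:e[-+]?[0-9]+)?'
strict_pat = re.compile(r'([A-Z_]+)(' + NUM + r')', re.I)

# The pattern from the answer, for comparison.
loose_pat = re.compile(r'([A-Z_]+)([-+]?[0-9]*\.?[0-9]*(?:e[-+]?[0-9]+)?)', re.I)

loose_hit = loose_pat.findall('a.')        # accepts the bare '.'
strict_hit = strict_pat.findall('a.')      # rejects it
edge_cases = strict_pat.findall('a.5b5.')  # '.5' and '5.' are still allowed
normal = strict_pat.findall('Height_A132width_top100.0lengthsimple1.34e+99')

print(loose_hit, strict_hit, edge_cases, normal)
```

Both '.5' and '5.' are accepted by float(), so allowing them here keeps the pattern consistent with what parse_numstr can convert.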

Answer 3

Here you go:

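import re

# Names are letters and underscores; values are digits and dots.
# (The original class [a-zA-z] only matched '_' by accident, because the
# A-z range also covers the punctuation between 'Z' and 'a'.)
p = re.compile(r'([a-zA-Z_]+)([0-9.]+)')
test_str = "Height_A_B132width_top100.0lengthsimple0.00001"

# The values in the resulting dict are strings, not numbers.
print(dict(p.findall(test_str)))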

Answer 4

This simple regex will work:

[0-9.+e]+|\D+

To create your dict:

import re

def pairs(s):
    # Raw string so that \D is passed to the regex engine unchanged.
    mtch = re.finditer(r"[0-9.+e]+|\D+", s)
    m1, m2 = next(mtch, ""), next(mtch, "")
    while m1:
        yield m1.group(), float(m2.group())
        m1, m2 = next(mtch, ""), next(mtch, "")

Demo:

In [27]: s =  'Height_A_B132width_top100.0lengthsimple0.00001'

In [28]: print(dict(pairs(s)))
{'Height_A_B': 132.0, 'width_top': 100.0, 'lengthsimple': 1e-05}

In [29]: s = 'Height_A132width_top100.0lengthsimple1.34e+003'

In [30]: print(dict(pairs(s)))
{'width_top': 100.0, 'Height_A': 132.0, 'lengthsimple': 1340.0}

Or, for a more general approach, you can use ast.literal_eval to parse the values so that it works for multiple types:

from ast import literal_eval
def pairs(s):
    mtch = re.finditer(r"[0-9.+e]+|\D+", s)
    m1, m2 = next(mtch, ""), next(mtch, "")
    while m1:
        yield m1.group(), literal_eval(m2.group())
        m1, m2 = next(mtch, ""), next(mtch, "")

This helps if you really care about the difference between ints and floats:

In [31]: s = 'Height_A132width_top100.0lengthsimple1.34e+99'

In [32]: dict(pairs(s))
Out[32]: {'Height_A': 132, 'lengthsimple': 1.34e+99, 'width_top': 100.0}
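One caveat about the pattern used in pairs: its character class contains '+' but not '-', so a value with a negative exponent is split into pieces. The sketch below shows the tokens for a made-up input, 'x1.5e-3', and a variant of the pattern with '-' added; the variant is an assumption on my part, not part of the answer:

```python
import re

s = 'x1.5e-3'

# Pattern from the answer: '-' is not part of the numeric class.
original_tokens = [m.group() for m in re.finditer(r"[0-9.+e]+|\D+", s)]

# Assumed variant with '-' added to the numeric class.
variant_tokens = [m.group() for m in re.finditer(r"[0-9.+\-e]+|\D+", s)]

print(original_tokens)
print(variant_tokens)
```

With the original pattern, pairs would call float('1.5e') and raise a ValueError. Even with the variant, a sign that directly follows a name (as in 'x-3') is still absorbed by \D+ as part of the name, so negative values need a different pattern.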
