Question

Given a string:

LexicalReordering0= -1.88359 0 -1.6864 -2.34184 -3.29584 0 Distortion0= -4 LM0= -85.3898 WordPenalty0= -13 PhrasePenalty0= 11 TranslationModel0= -6.79761 -3.06898 -8.90342 -4.35544

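It contains the keys of the desired dictionary, each ending with =. The space-separated values that follow a key, up to the next key, are the values for that key.

Note that the key names are not known before the input string is parsed.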

The resulting dictionary should look like this:

{'PhrasePenalty0=': [11.0], 'Distortion0=': [-4.0], 'TranslationModel0=': [-6.79761, -3.06898, -8.90342, -4.35544], 'LM0=': [-85.3898], 'WordPenalty0=': [-13.0], 'LexicalReordering0=': [-1.88359, 0.0, -1.6864, -2.34184, -3.29584, 0.0]}

I can do it with this loop:

>>> textin ="LexicalReordering0= -1.88359 0 -1.6864 -2.34184 -3.29584 0 Distortion0= -4 LM0= -85.3898 WordPenalty0= -13 PhrasePenalty0= 11 TranslationModel0= -6.79761 -3.06898 -8.90342 -4.35544"
>>> thiskey = ""
>>> thismap = {}
>>> for element in textin.split():
...     if element[-1] == '=':
...         thiskey = element
...         thismap[thiskey] = []
...     else:
...         thismap[thiskey].append(float(element))
... 
>>> thismap
{'PhrasePenalty0=': [11.0], 'Distortion0=': [-4.0], 'TranslationModel0=': [-6.79761, -3.06898, -8.90342, -4.35544], 'LM0=': [-85.3898], 'WordPenalty0=': [-13.0], 'LexicalReordering0=': [-1.88359, 0.0, -1.6864, -2.34184, -3.29584, 0.0]}

But is there another way to build the same dictionary from the input string? (Perhaps a regex or some pythonic parser library?)

Answer 1

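Here is a way to do it with the regular expression library. I don't know whether it is more efficient, or even whether it can be described as pythonic: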

import re

pat = re.compile(r'''([^\s=]+)=\s*((?:[^\s=]+(?:\s|$))*)''')

# The values are lists of strings
entries = dict((k, v.split()) for k, v in pat.findall(textin))

# Alternative if you want the values to be floating point numbers
entries = dict((k, list(map(float, v.split())))
               for k, v in pat.findall(textin))

In Python 2.x, you can use map(float, v.split()) instead of list(map(float, v.split())).

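Unlike the original program, this one accepts input with no space between the = and the first value. It also silently ignores any items in the input that come before the first key= instance. It would probably be better to detect them explicitly and raise an error.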
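One way to detect such stray leading items, as a minimal sketch: check whether anything other than whitespace precedes the first match. The helper name parse_strict is made up for illustration and is not part of the answer above; the pattern is the same one.

```python
import re

pat = re.compile(r'([^\s=]+)=\s*((?:[^\s=]+(?:\s|$))*)')

def parse_strict(text):
    """Hypothetical helper: like the dict(...) one-liner, but rejects
    anything that appears before the first key= in the input."""
    first = pat.search(text)
    # Everything before the first match (or the whole text if no match).
    leading = text[:first.start()] if first else text
    if leading.strip():
        raise ValueError('unexpected input before first key: %r'
                         % leading.strip())
    return dict((k, [float(x) for x in v.split()])
                for k, v in pat.findall(text))

print(parse_strict("Distortion0= -4 LM0=-85.3898"))
```

Calling parse_strict("7 8 Distortion0= -4") raises ValueError instead of quietly dropping the 7 and 8.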

Explanation of the pattern:

([^\s=]+)                            A key (any non-whitespace except =)
         =\s*                        followed by = and possible whitespace
             ((?:[^\s=]+(?:\s|$))*)  Any number of repetitions of a string
                                     without = followed by either whitespace
                                     or the end of the input
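To see what the two capture groups hold, here is a small sketch that runs the same pattern on a shortened sample (the sample string is made up for illustration). Note that the key is captured without its trailing =, and the value group keeps the whitespace after each value, which is why the code above calls split() on it.

```python
import re

pat = re.compile(r'([^\s=]+)=\s*((?:[^\s=]+(?:\s|$))*)')

# Shortened, made-up sample; "LM0=-85.3898" has no space after the =.
sample = "Distortion0= -4 LM0=-85.3898 WordPenalty0= -13"

pairs = pat.findall(sample)
print(pairs)
# [('Distortion0', '-4 '), ('LM0', '-85.3898 '), ('WordPenalty0', '-13')]
```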

Answer 2

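Since your input string is separated by whitespace and every element is either a key or a value, you can use split() and then iterate over the elements and assign them.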

entries = textin.split()
answer = {}  # maps each key to its list of float values
key = ""
for x in entries:
    try:
        x = float(x)
        answer[key].append(x)
    except ValueError:
        key = x[:-1] # ignore last char '='
        answer[key] = []

I assume the first element of your string is always a key, so answer[key] is never accessed while key is still an empty string.
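If that assumption might not hold, the same idea can be wrapped in a function that checks for it. This is only a sketch; the name parse_pairs and the error message are made up for illustration. Keep in mind that this approach treats anything float() accepts as a value, so tokens such as nan or inf would be read as numbers rather than keys.

```python
def parse_pairs(text):
    """Hypothetical variant of the loop above that fails loudly when a
    value appears before any key."""
    answer = {}
    key = None
    for token in text.split():
        try:
            value = float(token)
        except ValueError:
            # Not a number, so treat it as a key and drop the trailing '='.
            key = token[:-1]
            answer[key] = []
        else:
            if key is None:
                raise ValueError('value %r appears before any key' % token)
            answer[key].append(value)
    return answer

print(parse_pairs("LM0= -85.3898 WordPenalty0= -13"))
```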

How do I parse a string of space-separated key-value pairs?
