Retrieve a list of dictionaries from a string containing dictionaries

Time: 2018-09-04 17:53:45

Tags: python dictionary

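I have multiple dictionaries that are not separated by commas and that are stored as a single string. Is it possible to split them apart and get a clean list in which each element is one dictionary?

For example, what I have: {} {} {}

What I want: [{},{},{}]

I know this is similar to Want to separate list of dictionaries not separated by comma, but I don't want to spawn a subprocess and call sed.

Example: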

data  = {"key1":"val1", "key2":"val2", "key3":"val3", "key4":"val4"} {"key1":"someval", "key2":"someval", "key3":"someval", "key4":"someval"} {"key1":"someval", "key2":"someval", "key3":"someval", "key4":"someval"}

What I want is:

[{"key1":"val1", "key2":"val2", "key3":"val3", "key4":"val4"} , 
 {"key1":"someval", "key2":"someval", "key3":"someval", "key4":"someval"}, 
 {"key1":"someval", "key2":"someval", "key3":"someval", "key4":"someval"}]

How can I achieve this?

Example 2:

string = '''{"Date1":"2017-02-13T00:00:00.000Z","peerval":"222.22000","PID":109897,"Title":"Prop 1","Temp":5,"Temp Actual":5,"Temp Predicted":3.9,"Level":"Medium","Explaination":"Source: {some data \n  some link http:\\www.ggogle\.com with some sepcial characters ">< ?? // {} [] ;;}","creator":"\\etc\\someid","createdtime" :"2017-02-12T15:24:38.380Z"}
{"Date1":"2017-02-13T00:00:00.000Z","peerval":"222.22000","PID":109890,"Title":"Prop 2","Temp":5,"Temp Actual":5,"Temp Predicted":3.9,"Level":"Medium","Explaination":"Source: {some data \n  some link http:\\www.ggogle\.com with some sepcial characters ">< ?? // {} [] ;;}","creator":"\\etc\\someid","createdtime" :"2017-02-12T15:24:38.380Z"}

'''

Note: each dictionary ends with $ (a newline).

4 answers:

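Answer 0 (score: 4)

This approach is somewhat slow (roughly O(N^2) in the length of the string), but it handles fairly complex literal syntax, including nested data structures. It calls ast.literal_eval in a loop on successively smaller slices of s until it finds a slice that is syntactically valid. It then removes that slice and continues until the string is empty.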

import ast

def parse_consecutive_literals(s):
    result = []
    while s:
        for i in range(len(s), 0, -1):
            #print(i, repr(s), repr(s[:i]), len(result))
            try:
                obj = ast.literal_eval(s[:i])
            except SyntaxError:
                continue
            else:
                result.append(obj)
                s = s[i:].strip()
                break
        else:
            raise Exception("Couldn't parse remainder of string: " + repr(s))
    return result

test_cases = [
    "{} {} {}",
    "{}{}{}",
    "{1: 2, 3: 4}{5:6, '7': [8, {9: 10}]}",
    "[11] 'twelve' 13 14.0",
    "{} {\"'hi '}'there\"} {'whats \"}\"{\"up'}",
    "{1: 'foo\\'}bar'}"
]

for s in test_cases:
    print("{} parses into {}".format(repr(s), parse_consecutive_literals(s)))

Result:

'{} {} {}' parses into [{}, {}, {}]
'{}{}{}' parses into [{}, {}, {}]
"{1: 2, 3: 4}{5:6, '7': [8, {9: 10}]}" parses into [{1: 2, 3: 4}, {5: 6, '7': [8, {9: 10}]}]
"[11] 'twelve' 13 14.0" parses into [[11], 'twelve', 13, 14.0]
'{} {"\'hi \'}\'there"} {\'whats "}"{"up\'}' parses into [{}, {"'hi '}'there"}, {'whats "}"{"up'}]
"{1: 'foo\\'}bar'}" parses into [{1: "foo'}bar"}]

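That said, I'm not keen on using this solution in production-quality code. It would be better to serialize the data in a more sensible format, such as json, in the first place.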
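
If the input is in fact a run of valid JSON objects (as the first example in the question appears to be), the standard library can already consume them one at a time. The following is a minimal sketch, not part of the original answer: the function name parse_concatenated_json is hypothetical, and it assumes double-quoted keys and strings as JSON requires.

```python
import json

def parse_concatenated_json(text):
    """Parse back-to-back JSON documents separated by optional whitespace."""
    decoder = json.JSONDecoder()
    results = []
    pos = 0
    n = len(text)
    while pos < n:
        # raw_decode does not skip leading whitespace, so skip it by hand.
        while pos < n and text[pos].isspace():
            pos += 1
        if pos >= n:
            break
        # raw_decode returns the parsed object and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        results.append(obj)
    return results

data = '{"key1":"val1", "key2":"val2"} {"key1":"someval", "key2":"someval"}'
print(parse_concatenated_json(data))
```

Because the decoder itself decides where each object ends, braces or spaces inside string values do not confuse the split. Input that is not valid JSON raises json.JSONDecodeError.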

Answer 1 (score: 3)

This approach runs in O(n), relies on Python's optimized string joining, and adds no extra overhead. I also did some timings.

def fetch_until(sep, char_iter):
    chars = []
    escapes = 0
    while True:
        try:
            c = next(char_iter)
        except StopIteration:
            break
        if c == "\\":
            escapes += 1
        chars.append(c)
        if c == sep:
            if escapes % 2 == 0:
                break
        if c != "\\":
            escapes = 0
    return chars

def fix(data):
    brace_level = 0
    result = []
    char_iter = iter(data)

    try:
        while True:
            c = next(char_iter)
            result.append(c)
            if c in ("'", '"'):
                result.extend(fetch_until(c, char_iter))
            elif c == "{":
                brace_level += 1
            elif c == "}":
                brace_level -= 1
                if brace_level == 0:
                    result.append(",")
    except StopIteration:
        pass

    return eval("[{}]".format("".join(result[:-1])))

test_cases = [
    "{1: 'foo\\'}bar'}",
    "{} {\"'hi '}'there\"} {'whats \"}\"{\"up'}",
    "{}{}{}",
    "{1: 2, 3: 4}{5:6, '7': [8, {9: 10}]}",
    "{1: {}} {2:3, 4:{}} {(1,)}",
    "{1: 'foo'} {'bar'}",
]

for test_case in test_cases:
    print("{!r:40s} -> {!r}".format(test_case, fix(test_case)))

This prints (on my slow Macbook):

"{1: 'foo\\'}bar'}"                      -> [{1: "foo'}bar"}]
'{} {"\'hi \'}\'there"} {\'whats "}"{"up\'}' -> [{}, {"'hi '}'there"}, {'whats "}"{"up'}]
'{}{}{}'                                 -> [{}, {}, {}]
"{1: 2, 3: 4}{5:6, '7': [8, {9: 10}]}"   -> [{1: 2, 3: 4}, {5: 6, '7': [8, {9: 10}]}]
'{1: {}} {2:3, 4:{}} {(1,)}'             -> [{1: {}}, {2: 3, 4: {}}, {(1,)}]
"{1: 'foo'} {'bar'}"                     -> [{1: 'foo'}, {'bar'}]

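The same single-pass idea can be combined with ast.literal_eval, so that only Python literals are accepted and eval is not needed. This is a sketch under stated assumptions rather than the answerer's code: the names split_top_level_dicts and parse_dicts are hypothetical, and triple-quoted strings are not handled.

```python
import ast

def split_top_level_dicts(text):
    """Yield each top-level {...} chunk, tracking quotes and brace depth."""
    depth = 0
    start = None
    quote = None     # quote character of the string we are inside, if any
    escaped = False  # True right after a backslash inside a string
    for i, c in enumerate(text):
        if quote is not None:
            if escaped:
                escaped = False
            elif c == "\\":
                escaped = True
            elif c == quote:
                quote = None
        elif c in ("'", '"'):
            quote = c
        elif c == "{":
            if depth == 0:
                start = i
            depth += 1
        elif c == "}":
            depth -= 1
            if depth == 0:
                yield text[start:i + 1]

def parse_dicts(text):
    # literal_eval only accepts literals, unlike eval.
    return [ast.literal_eval(chunk) for chunk in split_top_level_dicts(text)]

print(parse_dicts("{1: 2, 3: 4}{5:6, '7': [8, {9: 10}]}"))
```

Each character is visited once, and each chunk is evaluated on its own, so a malformed chunk fails in isolation instead of invalidating the whole list.
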
Answer 2 (score: -1)

You can convert it into a valid JSON string, after which this is easy to do.

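import json
# Insert commas between the dictionaries
mydict_string = mydict_string.replace(' {', ',{')
# Wrap in brackets so the comma-separated objects form a valid JSON array
mylist = json.loads('[' + mydict_string + ']')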

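Otherwise, although I don't recommend it, you can also use eval.

mylist = list(map(eval, mydict_string.split(' ')))

This also works when the inner dictionaries are not empty, although the split(' ') variant breaks if a dictionary itself contains a space.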

Answer 3 (score: -1)

Assuming dict_string is your input string, you can try

import json
my_dicts = [json.loads(i) for i in dict_string.replace(", ",",").split()]
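
Since the question notes that each dictionary ends with a newline, splitting on lines instead of on all whitespace may be more robust when values contain spaces. A minimal sketch, assuming every non-empty line is one valid JSON object; the name parse_json_lines and the sample data are hypothetical.

```python
import json

def parse_json_lines(text):
    """Parse one JSON object per line, ignoring blank lines."""
    # Split on "\n" only; json.loads tolerates surrounding whitespace such as "\r".
    return [json.loads(line) for line in text.split("\n") if line.strip()]

dict_string = '{"key1": "val 1", "key2": "val2"}\n{"key1": "some val", "key2": "someval"}\n'
print(parse_json_lines(dict_string))
```

Spaces inside values are preserved here because only line breaks act as separators.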