How to convert a Unicode dict to a dict

Asked: 2013-06-17 12:50:38

Tags: python

I want to convert:

datalist = [u"{gallery: 'gal1', smallimage: 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/2/_/2_12.jpg',largeimage: 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/2/_/2_12.jpg'}",
 u"{gallery: 'gal1', smallimage: 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/3/_/3_13.jpg',largeimage: 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/3/_/3_13.jpg'}",
 u"{gallery: 'gal1', smallimage: 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/5/_/5_3_1.jpg',largeimage: 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/5/_/5_3_1.jpg'}",
 u"{gallery: 'gal1', smallimage: 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/1/_/1_22.jpg',largeimage: 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/1/_/1_22.jpg'}",
 u"{gallery: 'gal1', smallimage: 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/4/_/4_7_1.jpg',largeimage: 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/4/_/4_7_1.jpg'}"]

into a list containing Python dicts. If I try to extract a value by key, I get this error:

for i in datalist:
    print i['smallimage']

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-20-686ea4feba66> in <module>()
      1 for i in datalist:
----> 2     print i['smallimage']
      3 

TypeError: string indices must be integers

How can I convert a list of Unicode strings that look like dicts into actual dicts?

6 answers:

Answer 0 (score: 8)

You can use the demjson module, which has a non-strict mode that handles the data you have:

import demjson

for data in datalist:
    dct = demjson.decode(data)
    print dct['gallery'] # etc...

Answer 1 (score: 3)

In this case, I would hand-craft a regular expression that turns the strings into something Python can evaluate:

import re
import ast
from functools import partial

keys = re.compile(r'(gallery|smallimage|largeimage)')
fix_keys = partial(keys.sub, r'"\1"')

for entry in datalist:
    entry = ast.literal_eval(fix_keys(entry))

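Yes, this is limited, but it works for this data set and is robust as long as the keys match. The regular expression is simple to maintain. It also needs no external dependencies; it relies only on the batteries already included.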

Result:

>>> for entry in datalist:
...     print ast.literal_eval(fix_keys(entry))
... 
{'largeimage': 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/2/_/2_12.jpg', 'gallery': 'gal1', 'smallimage': 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/2/_/2_12.jpg'}
{'largeimage': 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/3/_/3_13.jpg', 'gallery': 'gal1', 'smallimage': 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/3/_/3_13.jpg'}
{'largeimage': 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/5/_/5_3_1.jpg', 'gallery': 'gal1', 'smallimage': 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/5/_/5_3_1.jpg'}
{'largeimage': 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/1/_/1_22.jpg', 'gallery': 'gal1', 'smallimage': 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/1/_/1_22.jpg'}
{'largeimage': 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/4/_/4_7_1.jpg', 'gallery': 'gal1', 'smallimage': 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/4/_/4_7_1.jpg'}
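
A variation on the same idea, as a minimal sketch: instead of hard-coding the three key names, quote any bare identifier that follows `{` or `,` and is followed by `:`. The names BARE_KEY and parse_with_regex and the example.com URLs are made up for illustration. The sketch assumes no value contains a comma followed by a word and a colon, since that would be quoted by mistake.

```python
import ast
import re

# Hypothetical pattern: a bare identifier right after '{' or ',' and before ':'.
BARE_KEY = re.compile(r'([{,]\s*)([A-Za-z_]\w*)\s*:')

def parse_with_regex(text):
    """Quote the bare keys, then let ast.literal_eval build the dict."""
    quoted = BARE_KEY.sub(r'\1"\2":', text)
    return ast.literal_eval(quoted)

# Made-up sample in the same shape as the entries in datalist.
sample = ("{gallery: 'gal1', smallimage: 'http://example.com/a.jpg',"
          "largeimage: 'http://example.com/b.jpg'}")
parsed = parse_with_regex(sample)
print(parsed['smallimage'])
```

Because the final step is still ast.literal_eval, anything that is not a plain literal after the rewrite is rejected rather than executed.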

Answer 2 (score: 3)

Just as another idea: each string in your list is well-formed YAML.

> import yaml
> yaml.safe_load(u'{foo: "bar"}')['foo']
'bar'

If you want to get really fancy and parse everything at once:

> data = yaml.safe_load('['+','.join(datalist)+']')
> data[0]['smallimage']
'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/2/_/2_12.jpg'
> data[3]['gallery']
'gal1'

Answer 3 (score: 2)

If your dictionary keys were quoted, you could use json.loads to load the strings.

import json
for i in datalist:
   print json.loads(i)['smallimage']

(ast.literal_eval would work as well...)
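
To illustrate the "if the keys were quoted" condition, here is a small sketch with made-up sample strings (they are not taken from datalist). JSON wants double quotes around both keys and string values, so json.loads accepts the first string and rejects the second, which has the same shape as the question's data.

```python
import json

# Made-up sample with double-quoted keys and values: valid JSON.
quoted = '{"gallery": "gal1", "smallimage": "http://example.com/a.jpg"}'
small = json.loads(quoted)['smallimage']
print(small)

# Same shape as the strings in the question: bare keys, single-quoted values.
unquoted = "{gallery: 'gal1'}"
try:
    json.loads(unquoted)
    json_accepts_unquoted = True
except ValueError:  # json's decode error is a ValueError subclass
    json_accepts_unquoted = False
print(json_accepts_unquoted)
```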

However, as it stands, this does work with an old-school eval:

>>> class Mdict(dict):
...     def __missing__(self,key):
...        return key
... 
>>> eval(datalist[0],Mdict(__builtins__=None))
{'largeimage': 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/2/_/2_12.jpg', 'gallery': 'gal1', 'smallimage': 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/2/_/2_12.jpg'}

Note that this may be vulnerable to injection attacks, so only use it if the strings come from a trusted source.
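
As a rough illustration of that difference, the sketch below feeds ast.literal_eval two strings: a function call standing in for hostile input (a hypothetical, harmless example) and a dict with bare keys like the ones in the question. literal_eval only accepts literals, so it refuses both without running anything. This is also why it cannot be used on the question's data unchanged.

```python
import ast

def literal_eval_rejects(text):
    """Return True if ast.literal_eval refuses the text."""
    try:
        ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return True
    return False

# Hypothetical stand-in for hostile input; eval() would actually run this call.
call_rejected = literal_eval_rejects("__import__('os').getcwd()")

# Bare keys are names, not literals, so these are refused as well.
bare_keys_rejected = literal_eval_rejects("{gallery: 'gal1'}")

# Properly quoted input is accepted.
quoted_rejected = literal_eval_rejects("{'gallery': 'gal1'}")

print(call_rejected, bare_keys_rejected, quoted_rejected)
```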


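Finally, for anyone who wants a short, albeit somewhat dense, solution that uses only the standard library and is not vulnerable to injection attacks... this little gem does the trick (assuming the dictionary keys are valid identifiers)!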

import ast
class RewriteName(ast.NodeTransformer):
    def visit_Name(self,node):
        return ast.Str(s=node.id)

transformer = RewriteName()
for x in datalist:
    tree = ast.parse(x,mode='eval')
    transformer.visit(tree)
    print ast.literal_eval(tree)['smallimage']
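
The code above is written for Python 2. A sketch of the same technique for Python 3 follows, assuming a version where ast.Constant replaces ast.Str; the names QuoteNames and parse_with_ast and the sample string are made up for illustration.

```python
import ast

class QuoteNames(ast.NodeTransformer):
    """Replace every bare name (e.g. gallery) with a string constant."""
    def visit_Name(self, node):
        return ast.copy_location(ast.Constant(value=node.id), node)

def parse_with_ast(text):
    tree = ast.parse(text, mode='eval')   # parse only, nothing is executed
    tree = QuoteNames().visit(tree)       # bare keys become string constants
    return ast.literal_eval(tree)         # accepts literals only

# Made-up sample in the same shape as the entries in datalist.
sample = "{gallery: 'gal1', smallimage: 'http://example.com/a.jpg'}"
parsed_ast = parse_with_ast(sample)
print(parsed_ast['smallimage'])
```

One side effect to be aware of: a bare name used as a value is turned into a string too, so `{a: b}` comes back as `{'a': 'b'}`.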

Answer 4 (score: 0)

Your datalist is a list of unicode strings.

You could use eval, but your keys are not properly quoted. What you can do is re-quote the keys on the fly using replace:

for i in datalist:
    my_dict = eval(i.replace("gallery", "'gallery'").replace("smallimage", "'smallimage'").replace("largeimage", "'largeimage'"))
    print my_dict["smallimage"]

Answer 5 (score: -4)

I don't see why all the extra machinery is needed, such as using re or json...

fdict = {str(k): v for (k, v) in udict.items()}

udict is the dict that has unicode keys. Simply convert them to str. With your given data, you can simply...

datalist = [dict((str(k), v) for (k, v) in i.items()) for i in datalist]

A simple test:

>>> datalist = [{u'a':1,u'b':2},{u'a':1,u'b':2}]
>>> datalist
[{u'a': 1, u'b': 2}, {u'a': 1, u'b': 2}]
>>> datalist = [dict((str(k), v) for (k, v) in i.items()) for i in datalist]
>>> datalist
[{'a': 1, 'b': 2}, {'a': 1, 'b': 2}]

No import re, no import json. Simple and quick.