Question

I have the following list of dictionaries.

content = ['{"a": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET4.0C; .NET4.0E; 360SE)", "c": "US", "nk": 0, "tz": "America/Los_Angeles", "g": "1lj67KQ", "h": "1xupVE6", "mc": 807, "u": "https://cdn.adf.ly/js/display.js", "t": 1427288399, "cy": "Mountain View"}\n',
 '{"a": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET4.0C; .NET4.0E; 360SE)", "c": "US", "nk": 0, "tz": "America/New_York", "g": "1lj67KQ", "h": "1xupVE6", "mc": 514, "u": "https://cdn.adf.ly/js/display.js", "t": 1427288399, "cy": "Buffalo"}\n']

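When I try to convert the list of dictionaries to a DataFrame, or to build columns from the keys and values in each row, I get the error "TypeError: string indices must be integers".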

Approach 1

for x in content:
    print(x["a"], x["nk"])

Approach 2

result = []

sumlist = ["a", "nk"]
for d in content:
    result.append({"col1": d["a"],
                   "col2": d["nk"]})

print(result)

Answer 1

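Option 1
This is actually JSON, so you can use json_normalize together with json.loads.

import json
import pandas as pd

# pd.json_normalize is the top-level name since pandas 1.0;
# older versions expose it as pd.io.json.json_normalize.
df = pd.json_normalize([json.loads(x) for x in content])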
print(df) 
                                                   a   c             cy  \
0  Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ...  US  Mountain View   
1  Mozilla/4.0 (compatible; MSIE 6.0; Windows NT ...  US        Buffalo   

         g        h   mc  nk           t                   tz  \
0  1lj67KQ  1xupVE6  807   0  1427288399  America/Los_Angeles   
1  1lj67KQ  1xupVE6  514   0  1427288399     America/New_York   

                                  u  
0  https://cdn.adf.ly/js/display.js  
1  https://cdn.adf.ly/js/display.js

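If you only want a and nk, use:

# Parse each JSON string first; content itself is still a list of str here.
df = pd.DataFrame([json.loads(x) for x in content])[['a', 'nk']]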
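The TypeError in the question arises because each element of content is a str rather than a dict, so x["a"] tries to index a string with a string. Below is a minimal sketch of the two loops from the question once each line has been parsed. It assumes a shortened stand-in for the original list that keeps only the two keys used, and the names parsed and result are purely illustrative.

```python
import json

# Shortened stand-in for the list in the question: each element is a JSON string.
content = [
    '{"a": "Mozilla/5.0 (compatible; MSIE 9.0)", "nk": 0}\n',
    '{"a": "Mozilla/4.0 (compatible; MSIE 6.0)", "nk": 0}\n',
]

# The elements are strings, which is why x["a"] raised the TypeError.
print(type(content[0]))

# Parse each line into a dict before indexing by key.
parsed = [json.loads(x) for x in content]

# Approach 1 from the question, applied to the parsed dicts.
for d in parsed:
    print(d["a"], d["nk"])

# Approach 2 from the question, applied to the parsed dicts.
result = [{"col1": d["a"], "col2": d["nk"]} for d in parsed]
print(result)
```

The resulting list of dicts can also be passed straight to pd.DataFrame if a DataFrame is the end goal.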

Option 2
ast.literal_eval

import ast
import pandas as pd

# Evaluate each line as a Python literal, producing a list of dicts.
content = [ast.literal_eval(x) for x in content]
df = pd.DataFrame(content)

print(df)                                                      
                                                   a   c             cy  \
0  Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ...  US  Mountain View   
1  Mozilla/4.0 (compatible; MSIE 6.0; Windows NT ...  US        Buffalo   

         g        h   mc  nk           t                   tz  \
0  1lj67KQ  1xupVE6  807   0  1427288399  America/Los_Angeles   
1  1lj67KQ  1xupVE6  514   0  1427288399     America/New_York   

                                  u  
0  https://cdn.adf.ly/js/display.js  
1  https://cdn.adf.ly/js/display.js
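One caveat with Option 2: ast.literal_eval parses Python literals, not JSON, so it works here only because the sample lines contain nothing but strings and integers. The sketch below shows where the two parsers diverge. It uses a made-up line containing JSON true and null, which is not part of the original data.

```python
import ast
import json

# Hypothetical line, not from the question's data: it uses JSON's true and null.
line = '{"a": "Mozilla/5.0", "nk": 0, "known": true, "cy": null}\n'

# json.loads maps true to True and null to None.
record = json.loads(line)
print(record)

# ast.literal_eval rejects the same text, because true and null
# are not Python literals.
try:
    ast.literal_eval(line.strip())
    literal_eval_ok = True
except (ValueError, SyntaxError):
    literal_eval_ok = False
print(literal_eval_ok)
```

For input that is known to be JSON, json.loads is therefore the safer default of the two.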
