I have the following list of dictionaries:
content = ['{"a": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET4.0C; .NET4.0E; 360SE)", "c": "US", "nk": 0, "tz": "America/Los_Angeles", "g": "1lj67KQ", "h": "1xupVE6", "mc": 807, "u": "https://cdn.adf.ly/js/display.js", "t": 1427288399, "cy": "Mountain View"}\n',
'{"a": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET4.0C; .NET4.0E; 360SE)", "c": "US", "nk": 0, "tz": "America/New_York", "g": "1lj67KQ", "h": "1xupVE6", "mc": 514, "u": "https://cdn.adf.ly/js/display.js", "t": 1427288399, "cy": "Buffalo"}\n']
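When I try to convert the list of dictionaries into a DataFrame, or to create columns from the keys and values in each row, I get the error message "TypeError: string indices must be integers".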
Method 1:
for x in content:
    print(x["a"], x["nk"])
Method 2:
result = []
sumlist = ["a", "nk"]
for d in content:
    result.append({"col1": d["a"],
                   "col2": d["nk"]})
print(result)
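Both loops fail for the same reason: each element of content is a str holding one line of JSON, so x["a"] indexes a string with a string. A minimal sketch, assuming every line is valid JSON, that parses the lines with json.loads first and then runs the two loops from the question on the parsed dicts. The sample lines are trimmed copies of the question's data (shorter user-agent strings, fewer fields).

```python
import json

# Same shape as the question's data, trimmed to a few fields for brevity.
content = [
    '{"a": "Mozilla/5.0 (compatible; MSIE 9.0)", "c": "US", "nk": 0, "cy": "Mountain View"}\n',
    '{"a": "Mozilla/4.0 (compatible; MSIE 6.0)", "c": "US", "nk": 0, "cy": "Buffalo"}\n',
]

# Indexing the raw string reproduces the error from the question.
try:
    content[0]["a"]
except TypeError as exc:
    print("TypeError:", exc)  # message begins "string indices must be integers"

# Parse each JSON line into a dict; json.loads tolerates the trailing newline.
records = [json.loads(line) for line in content]

# Method 1 now works, because x is a dict rather than a str.
for x in records:
    print(x["a"], x["nk"])

# Method 2 now works as well.
result = []
for d in records:
    result.append({"col1": d["a"], "col2": d["nk"]})
print(result)
```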
Answer (score: 3)
Option 1
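It's actually JSON (each element of content is a JSON string, not a dict), so you can parse each line with json.loads and pass the result to json_normalize: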
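import json
import pandas as pd

# pd.json_normalize replaces the deprecated pd.io.json.json_normalize (pandas >= 1.0)
df = pd.json_normalize([json.loads(x) for x in content])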
print(df)
a c cy \
0 Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ... US Mountain View
1 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT ... US Buffalo
g h mc nk t tz \
0 1lj67KQ 1xupVE6 807 0 1427288399 America/Los_Angeles
1 1lj67KQ 1xupVE6 514 0 1427288399 America/New_York
u
0 https://cdn.adf.ly/js/display.js
1 https://cdn.adf.ly/js/display.js
If all you want is a and nk, use:
df = pd.DataFrame([json.loads(x) for x in content])[['a', 'nk']]
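Because every element of content already ends in a newline, the list is effectively a JSON Lines document, so pandas can also parse it in a single call. A minimal sketch, assuming the strings are joined and read with pd.read_json(..., lines=True); the sample lines are trimmed copies of the question's data.

```python
import io

import pandas as pd

# Same shape as the question's data, trimmed to a few fields for brevity.
content = [
    '{"a": "Mozilla/5.0 (compatible; MSIE 9.0)", "c": "US", "nk": 0, "mc": 807, "cy": "Mountain View"}\n',
    '{"a": "Mozilla/4.0 (compatible; MSIE 6.0)", "c": "US", "nk": 0, "mc": 514, "cy": "Buffalo"}\n',
]

# Each element ends in "\n", so joining them yields one JSON object per line.
jsonl = "".join(content)

# lines=True makes read_json parse each line as a separate record.
df = pd.read_json(io.StringIO(jsonl), lines=True)

# Keep only the two columns the question asked for.
subset = df[["a", "nk"]]
print(subset)
```

This avoids the explicit list comprehension, at the cost of letting read_json infer dtypes, which may matter if some string fields look numeric.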
Option 2
ast.literal_eval
import ast
import pandas as pd
content = [ast.literal_eval(x) for x in content]
df = pd.DataFrame.from_dict(content)
print(df)
a c cy \
0 Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ... US Mountain View
1 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT ... US Buffalo
g h mc nk t tz \
0 1lj67KQ 1xupVE6 807 0 1427288399 America/Los_Angeles
1 1lj67KQ 1xupVE6 514 0 1427288399 America/New_York
u
0 https://cdn.adf.ly/js/display.js
1 https://cdn.adf.ly/js/display.js
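ast.literal_eval only accepts Python literals, so it can read these lines only because they happen to contain no true, false, or null, which are valid JSON but not valid Python. A minimal sketch comparing the two parsers on a hypothetical line (the bot field is made up for illustration and is not in the question's data):

```python
import ast
import json

# Hypothetical JSON line using true and null, which Python spells True and None.
line = '{"nk": 0, "bot": true, "cy": null}'

# json.loads maps true -> True and null -> None.
parsed = json.loads(line)
print(parsed)  # {'nk': 0, 'bot': True, 'cy': None}

# ast.literal_eval treats true/null as bare names and rejects them.
literal_eval_failed = False
try:
    ast.literal_eval(line)
except ValueError as exc:
    literal_eval_failed = True
    print("literal_eval failed:", exc)
```

For data that really is JSON, json.loads (Option 1) is the safer default.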