Question

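I've run into a problem that, despite a lot of Googling, I haven't been able to solve for two days. I've been downloading data from crunchbase.com and storing the raw data in a DataFrame. However, one variable is stored as a string when it should really be a list of dictionaries.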

Looking at a specific element of the pandas Series gives a string:

"[{'entity_def_id': 'category', 'permalink': 'media-and-entertainment', 'uuid': '78b58810-ad58-a623-2a80-2a0e3603a544', 'value': 'Media and Entertainment'}, {'entity_def_id': 'category', 'permalink': 'tv', 'uuid': '86d91a85-ff9d-93db-4688-3b608fee756c', 'value': 'TV'}, {'entity_def_id': 'category', 'permalink': 'tv-production', 'uuid': '47592b2e-aaaa-6aa3-d0e9-82ab5e525c2d', 'value': 'TV Production'}]"

[Image: the specific column in the DataFrame]

Note that a few observations in the Series holding these list-of-dict strings are missing.

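I want to create new columns in my DataFrame whose names correspond to the keys, with each observation's value taken from the corresponding dict. However, I don't know how to do this: because the element is a string, I can only index it by integer position and cannot access the dictionary directly. Actually, what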
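One way to get columns named after the keys, sketched here under the assumption that the strings have already been parsed into Python lists: DataFrame.explode gives each dict its own row, and pd.json_normalize spreads each dict's keys into columns. The column names company and categories and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical data: "categories" already holds parsed lists of dicts
# (one list per company), and one observation is missing.
df = pd.DataFrame({
    "company": ["A", "B", "C"],
    "categories": [
        [{"entity_def_id": "category", "permalink": "tv", "value": "TV"},
         {"entity_def_id": "category", "permalink": "tv-production", "value": "TV Production"}],
        None,  # missing observation
        [{"entity_def_id": "category", "permalink": "media-and-entertainment",
          "value": "Media and Entertainment"}],
    ],
})

# One row per dict: each list element gets its own row; the missing
# observation stays as a single row.
long = df.explode("categories", ignore_index=True)

# Turn each dict into columns named after its keys, keeping the row labels
# so the missing observation ends up with NaN in the new columns.
present = long["categories"].dropna()
expanded = pd.json_normalize(present.tolist())
expanded.index = present.index
result = long.drop(columns="categories").join(expanded)
print(result)
```

Because one company can have several categories, this yields several rows per company; if exactly one row per company is needed, the exploded rows can be aggregated again, for example with groupby on the company column.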

I tried json.loads, which gave me TypeError: the JSON object must be str, bytes or bytearray, not Series.
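A small reproduction of why json.loads cannot work here, using a hypothetical one-element Series. json.loads accepts a single str, bytes or bytearray, so passing the whole Series raises the TypeError above. Even a single element is rejected: the stored text uses single quotes, which are valid in a Python literal but not in JSON.

```python
import json

import pandas as pd

# Hypothetical one-element Series in the same format as the question's data.
s = pd.Series(["[{'permalink': 'tv', 'value': 'TV'}]"])

errors = {}

# Passing the whole Series: json.loads only accepts str, bytes or bytearray.
try:
    json.loads(s)
except TypeError as exc:
    errors["series"] = exc

# Passing one element: still fails, because JSON requires double quotes.
try:
    json.loads(s[0])
except json.JSONDecodeError as exc:
    errors["element"] = exc

for where, exc in errors.items():
    print(where, type(exc).__name__, exc)
```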

I also tried ast.literal_eval(), which gave me ValueError: malformed node or string: 0.
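ast.literal_eval likewise works on one string at a time. Handing it the whole Series is most likely what triggers the ValueError: the message ends with the repr of the object it could not parse, and a Series repr starts with its first index label, 0. A minimal sketch that applies it element-wise with Series.apply, assuming missing observations are stored as NaN; parse_cell and the sample data are hypothetical.

```python
import ast

import pandas as pd

# Hypothetical column: two stored strings and one missing observation (NaN).
raw = pd.Series([
    "[{'permalink': 'tv', 'value': 'TV'}, {'permalink': 'tv-production', 'value': 'TV Production'}]",
    float("nan"),
    "[{'permalink': 'media-and-entertainment', 'value': 'Media and Entertainment'}]",
])


def parse_cell(cell):
    """Parse one stored string into a list of dicts; return None for missing values."""
    if isinstance(cell, str):
        return ast.literal_eval(cell)
    return None


# Apply the parser to each element instead of passing the whole Series.
parsed = raw.apply(parse_cell)
print(parsed[0][1]["value"])  # TV Production
```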

If my formatting or style is off, please let me know; this is my first time posting here.

Answer 1

Just use the eval() function:

import pandas as pd

s = "[{'entity_def_id': 'category', 'permalink': 'media-and-entertainment', 'uuid': '78b58810-ad58-a623-2a80-2a0e3603a544', 'value': 'Media and Entertainment'}, {'entity_def_id': 'category', 'permalink': 'tv', 'uuid': '86d91a85-ff9d-93db-4688-3b608fee756c', 'value': 'TV'}, {'entity_def_id': 'category', 'permalink': 'tv-production', 'uuid': '47592b2e-aaaa-6aa3-d0e9-82ab5e525c2d', 'value': 'TV Production'}]"

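# eval() evaluates the string as a Python expression, giving a list of dicts
l = eval(s)

# Each dict becomes one row; the dict keys become the column names
df = pd.DataFrame(l)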

Out[1]: 
  entity_def_id  ...                    value
0      category  ...  Media and Entertainment
1      category  ...                       TV
2      category  ...            TV Production

[3 rows x 4 columns]
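eval() works here, but it executes whatever expression the string contains, which is risky for data downloaded from an external site. The standard library's ast.literal_eval is a safer drop-in for the same step: it only accepts Python literals such as strings, numbers, lists and dicts, and raises ValueError on anything else. A minimal sketch on a shortened version of the string:

```python
import ast

import pandas as pd

# Shortened version of the string from the question.
s = "[{'entity_def_id': 'category', 'permalink': 'tv', 'value': 'TV'}]"

# Same result as eval(s) for literal data, without executing code.
records = ast.literal_eval(s)
df = pd.DataFrame(records)
print(df)

# Anything that is not a plain literal (here, a function call) is rejected.
rejected = None
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as exc:
    rejected = exc
print("rejected:", rejected)
```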

Convert a string containing a list of dictionaries in a DataFrame into a list of dictionaries
