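I've come across a few similar questions about converting a nested list to a Pandas DataFrame, but my case seems more complicated. I currently have a list with a lot of nesting in it (am I saying that right? lol).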
For example (the sample list is shown further down, after the update):
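*Note that 'total_social_media_impressions' is the total of the nested dict 'social_media_impressions' that comes right before it. This is what makes it tricky.
...and so on. There are more columns than I've listed, but I just wanted to show a short example.
Does anyone know how to convert this kind of long nested list into a pandas DataFrame?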
Update: I used a for loop to identify which keys in the list hold nested values. The list looks like this:
[{'date': 'yyyy-mm-dd',
  'total_comments': 1,
  'id': 123456,
  'engagements_by_type': {'url clicks': 111, 'other clicks': 222},
  'url': 'https://hi.com/stackoverflow/is/the/best',
  'posts_by_paid_unpaid': {'paid': 1, 'total': 100, 'unpaid': 99},
  'organic_impressions': ,
  'social_media_impressions': {'facebook': 2, 'twitter': 4, 'instagram': 4, 'twitch': 6,
  'total_social_media_impressions': 10}
 {'date':....
 ......}]
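For reference, a check like the one described in the update could be sketched roughly as follows. This is not the original loop from the question: `records` and `find_nested_keys` are made-up names, and the sample record is a shortened, hypothetical version of the list above.

```python
# Hypothetical sample, shaped like the list in the question (shortened).
records = [
    {
        "date": "yyyy-mm-dd",
        "total_comments": 1,
        "engagements_by_type": {"url clicks": 111, "other clicks": 222},
        "social_media_impressions": {"facebook": 2, "twitter": 4},
    },
]


def find_nested_keys(rows):
    """Return the sorted keys whose value is a dict in at least one row."""
    nested = set()
    for row in rows:
        for key, value in row.items():
            if isinstance(value, dict):
                nested.add(key)
    return sorted(nested)


nested_keys = find_nested_keys(records)
print(nested_keys)  # ['engagements_by_type', 'social_media_impressions']
```

Checking every row rather than only the first one is a deliberate choice here, since a key might be missing or empty in some records.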
The next step is to figure out how to put them into the DataFrame as ordinary columns, without the nesting.
Answer 0 (score: 1)
Assuming you want the keys of the nested dicts to become columns as well, here is a solution.
import pandas as pd
data = [
{'date': 'yyyy-mm-dd',
'total_comments':1,
'id': 123456,
'engagements_by_type': {'url clicks': 111, 'other clicks':222},
'url': 'https://hi.com/stackoverflow/is/the/best',
'posts_by_paid_unpaid': {'paid': 1, 'total': 100, 'unpaid': 99},
'organic_impressions': 1,
'social_media_impressions': {'facebook': 2, 'twitter': 4, 'instagram': 4, 'twitch': 6}},
{'date': 'yyyy-mm-dd',
'total_comments':1,
'id': 123456,
'engagements_by_type': {'url clicks': 111, 'other clicks':222},
'url': 'https://hi.com/stackoverflow/is/the/best',
'posts_by_paid_unpaid': {'paid': 1, 'total': 100, 'unpaid': 99},
'organic_impressions': 1,
'social_media_impressions': {'facebook': 2, 'twitter': 4, 'instagram': 4, 'twitch': 6}},
{'date': 'yyyy-mm-dd',
'total_comments':1,
'id': 123456,
'engagements_by_type': {'url clicks': 111, 'other clicks':222},
'url': 'https://hi.com/stackoverflow/is/the/best',
'posts_by_paid_unpaid': {'paid': 1, 'total': 100, 'unpaid': 99},
'organic_impressions': 1,
'social_media_impressions': {'facebook': 2, 'twitter': 4, 'instagram': 4, 'twitch': 6}}
]
def create_plain_dict(ip):
    # Iterate over a snapshot of the keys, because the dict is modified inside the loop.
    for key in list(ip):
        # If the value is itself a dict, remove that key and merge the
        # inner dict into the parent, turning the nested dict into a plain one.
        if isinstance(ip[key], dict):
            ip.update(ip.pop(key))
    return ip

# Note: create_plain_dict modifies each record of data in place.
mod_data = list(map(create_plain_dict, data))
df = pd.DataFrame(mod_data)
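The resulting DataFrame has one row per record, and the keys of the nested dicts (for example 'url clicks', 'paid', 'facebook') appear as ordinary columns.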
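One thing to watch with merging inner keys straight into the parent: if two nested dicts share a key name (say both had a 'total'), the later one silently overwrites the earlier one. A possible variation, sketched below under my own assumptions, prefixes each inner key with its parent key. The `flatten` helper, the `sep` parameter and the shortened `row` are hypothetical and not part of the answer above.

```python
def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts, prefixing inner keys with their parent key."""
    flat = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse so that deeper levels are flattened as well.
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


# Hypothetical, shortened record based on the sample data.
row = {
    "id": 123456,
    "posts_by_paid_unpaid": {"paid": 1, "total": 100, "unpaid": 99},
    "social_media_impressions": {"facebook": 2, "twitter": 4},
}

flat_row = flatten(row)
print(flat_row)
```

With the full list, something like `pd.DataFrame([flatten(r) for r in data])` should then give one prefixed column per inner key. As far as I know, pandas also ships `pd.json_normalize`, which accepts a `sep` argument and does a similar flattening out of the box, so that may be worth trying first.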