Putting a nested list into a DataFrame in Python

Posted: 2019-07-20 11:21:59

Tags: python python-3.x nested-lists

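I've come across a few other similar questions about converting nested lists into a Pandas DataFrame, but mine seems more complicated. I currently have a list with a lot of nesting inside it (am I describing that right? lol).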

For example:


*Note that "total_social_media_impressions" is the total of the nested "social_media_impressions" values that come right before it. This makes it quite tricky.

...and so on. There are more columns than the ones I mentioned, but I just wanted to show a short example.
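Since total_social_media_impressions is described as the sum of the per-platform counts, one option after flattening is to recompute it from those columns and compare it with the stored value. This is a minimal sketch that assumes the nested values have already been flattened into columns named after the platforms; the rows and their values are hypothetical, and the total column uses the spelling from the sample data.

```python
import pandas as pd

# Hypothetical, already-flattened rows: per-platform counts are ordinary
# columns, plus the stored total that came from the nested dict.
rows = [
    {"facebook": 2, "twitter": 4, "instagram": 4, "twitch": 6,
     "total_social_media-impressions": 16},
    {"facebook": 1, "twitter": 0, "instagram": 3, "twitch": 2,
     "total_social_media-impressions": 5},
]
df = pd.DataFrame(rows)

platform_cols = ["facebook", "twitter", "instagram", "twitch"]

# Recompute the total as a row-wise sum over the platform columns
df["computed_total"] = df[platform_cols].sum(axis=1)

# Flag rows where the stored total disagrees with the recomputed one
df["total_matches"] = df["computed_total"] == df["total_social_media-impressions"]
print(df[["computed_total", "total_social_media-impressions", "total_matches"]])
```

Recomputing the total this way means the stored value can be dropped or kept purely as a consistency check.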

Does anyone know how to convert a long nested list like this into a pandas DataFrame?

Update: I used a for loop to identify which columns in the list are nested:

    [{'date': 'yyyy-mm-dd',
      'total_comments': 1,
      'id': 123456,
      'engagements_by_type': {'url clicks': 111, 'other clicks': 222},
      'url': 'https://hi.com/stackoverflow/is/the/best',
      'posts_by_paid_unpaid': {'paid': 1, 'total': 100, 'unpaid': 99},
      'organic_impressions': ...,
      'social_media_impressions': {'facebook': 2, 'twitter': 4, 'instagram': 4, 'twitch': 6,
                                   'total_social_media-impressions': 10}},
     {'date':....
      ......}]
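A loop along those lines might look like this sketch. The function name `find_nested_keys` is hypothetical, and `records` is one row of the sample above with `organic_impressions` set to 1 (as in the answer below); the loop collects every top-level key whose value is a dict in at least one row.

```python
records = [
    {"date": "yyyy-mm-dd", "total_comments": 1, "id": 123456,
     "engagements_by_type": {"url clicks": 111, "other clicks": 222},
     "url": "https://hi.com/stackoverflow/is/the/best",
     "posts_by_paid_unpaid": {"paid": 1, "total": 100, "unpaid": 99},
     "organic_impressions": 1,
     "social_media_impressions": {"facebook": 2, "twitter": 4, "instagram": 4,
                                  "twitch": 6,
                                  "total_social_media-impressions": 10}},
]

def find_nested_keys(rows):
    """Return keys whose value is a dict in at least one row, in first-seen order."""
    nested = []
    for row in rows:
        for key, value in row.items():
            if isinstance(value, dict) and key not in nested:
                nested.append(key)
    return nested

print(find_nested_keys(records))
# ['engagements_by_type', 'posts_by_paid_unpaid', 'social_media_impressions']
```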

The next step is figuring out how to put them into the DataFrame correctly as ordinary, un-nested columns.
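One way to do this, not the approach used in the answer below, is pandas' built-in `pd.json_normalize` (a top-level function since pandas 1.0). It flattens nested dicts into columns named `<parent><sep><child>`; the default separator is `.`, and `sep="_"` here is just a choice. A minimal sketch using one record from the sample data, with `organic_impressions` set to 1 as in the answer:

```python
import pandas as pd

data = [
    {"date": "yyyy-mm-dd", "total_comments": 1, "id": 123456,
     "engagements_by_type": {"url clicks": 111, "other clicks": 222},
     "url": "https://hi.com/stackoverflow/is/the/best",
     "posts_by_paid_unpaid": {"paid": 1, "total": 100, "unpaid": 99},
     "organic_impressions": 1,
     "social_media_impressions": {"facebook": 2, "twitter": 4,
                                  "instagram": 4, "twitch": 6}},
]

# Each nested dict becomes several columns prefixed with its parent key,
# e.g. "social_media_impressions_twitch"
df = pd.json_normalize(data, sep="_")
print(df.columns.tolist())
```

Because the parent key is kept as a prefix, two nested dicts that happen to share a child key still end up in separate columns.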

1 answer:

Answer 0 (score: 1)

Assuming you also want the keys of the nested dicts turned into columns, here is my solution.

    import pandas as pd

    data = [
        {'date': 'yyyy-mm-dd',
         'total_comments': 1,
         'id': 123456,
         'engagements_by_type': {'url clicks': 111, 'other clicks': 222},
         'url': 'https://hi.com/stackoverflow/is/the/best',
         'posts_by_paid_unpaid': {'paid': 1, 'total': 100, 'unpaid': 99},
         'organic_impressions': 1,
         'social_media_impressions': {'facebook': 2, 'twitter': 4, 'instagram': 4, 'twitch': 6}},
        {'date': 'yyyy-mm-dd',
         'total_comments': 1,
         'id': 123456,
         'engagements_by_type': {'url clicks': 111, 'other clicks': 222},
         'url': 'https://hi.com/stackoverflow/is/the/best',
         'posts_by_paid_unpaid': {'paid': 1, 'total': 100, 'unpaid': 99},
         'organic_impressions': 1,
         'social_media_impressions': {'facebook': 2, 'twitter': 4, 'instagram': 4, 'twitch': 6}},
        {'date': 'yyyy-mm-dd',
         'total_comments': 1,
         'id': 123456,
         'engagements_by_type': {'url clicks': 111, 'other clicks': 222},
         'url': 'https://hi.com/stackoverflow/is/the/best',
         'posts_by_paid_unpaid': {'paid': 1, 'total': 100, 'unpaid': 99},
         'organic_impressions': 1,
         'social_media_impressions': {'facebook': 2, 'twitter': 4, 'instagram': 4, 'twitch': 6}}
    ]

    def create_plain_dict(ip):
        # Iterate over a snapshot of the keys, since ip is modified inside the loop
        for i in list(ip):
            # If the value for this key is itself a dict, pop that key and merge
            # the nested dict's items into the top-level dict as plain columns
            if isinstance(ip[i], dict):
                temp = ip.pop(i)
                ip.update(temp)
        return ip

    # Note: create_plain_dict modifies each dict in data in place
    mod_data = list(map(create_plain_dict, data))

    df = pd.DataFrame(mod_data)
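`create_plain_dict` merges each nested dict directly into the top level, so if two nested dicts share a key (for example two different `total` fields), the later one silently overwrites the earlier one; it also flattens only one level and mutates `data` in place. A hedged alternative is a recursive flattener that prefixes child keys with their parent key. `flatten` is a hypothetical helper, and the second `total` key in the sample row is invented to show the collision.

```python
import pandas as pd

def flatten(record, parent_key="", sep="_"):
    """Recursively flatten nested dicts, prefixing child keys with the parent key."""
    flat = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse so dicts nested at any depth are flattened
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat

rows = [
    {"id": 123456,
     "posts_by_paid_unpaid": {"paid": 1, "total": 100, "unpaid": 99},
     # Hypothetical: a second nested dict that also has a "total" key
     "social_media_impressions": {"facebook": 2, "twitter": 4, "total": 6}},
]

# Builds new dicts, so the original rows are left unchanged
df = pd.DataFrame([flatten(r) for r in rows])
print(df.columns.tolist())
```

Both `total` values survive as `posts_by_paid_unpaid_total` and `social_media_impressions_total`.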

The resulting DataFrame looks like this:

[Image: the resulting DataFrame]