Reshaping a pandas DataFrame by stacking columns

Date: 2017-06-29 21:23:17

Tags: python pandas reshape

How can I produce something like this with pandas?

in:
data = {'post1': ['like1', 'like2'],
        'post2': ['like1', 'like2', 'like3', 'like4'],
        'post3': ['like1', 'like2', 'like3']
        }

out:
post1 like1
post1 like2
post2 like1
post2 like2
post2 like3
post2 like4
post3 like1
post3 like2
post3 like3

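I tried the code below, but it fails because the lists have different lengths. I could get there by building many DataFrames and appending them, but that is very slow.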

def run():
    result = {}

    for link in links:
        result[link] = id2screen(get_likes(link))

    df = DataFrame.from_dict(result)
    stacked = df.set_index(keys).stack()

    stacked.to_excel(r'C:\Users\user\Desktop\out.xlsx',  
                     index=False)

run()

1 answer:

Answer 0 (score: 0)

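from_dict with orient='index' is more tolerant of data with different lengths, because each list becomes a row and shorter rows are padded with None: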

pd.DataFrame.from_dict(data, orient='index')
Out[32]: 
           0      1      2      3
post1  like1  like2   None   None
post3  like1  like2  like3   None
post2  like1  like2  like3  like4

Then,

pd.DataFrame.from_dict(data, orient='index').stack()

gives:

Out[40]: 
post1  0    like1
       1    like2
post3  0    like1
       1    like2
       2    like3
post2  0    like1
       1    like2
       2    like3
       3    like4
dtype: object
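
The None padding is gone in this output because stack() has historically dropped missing values by default. As far as I know, newer pandas releases changed how stack() treats missing values, so here is a hedged sketch that drops them explicitly. It assumes the sample values are plain strings; the names padded and stacked are only for illustration.

```python
import pandas as pd

# Sample data from the question, written as quoted strings (an assumption).
data = {
    "post1": ["like1", "like2"],
    "post2": ["like1", "like2", "like3", "like4"],
    "post3": ["like1", "like2", "like3"],
}

# One row per post; shorter lists are padded with missing values.
padded = pd.DataFrame.from_dict(data, orient="index")

# Move the columns into a second index level, then drop the padding explicitly
# so the result does not depend on the default missing-value handling of stack().
stacked = padded.stack().dropna()
print(stacked)
```

The explicit dropna() does nothing when stack() has already removed the padding, so it is safe to keep either way.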

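So, to get the target output shown in the question, you can add .reset_index(level=1, drop=True), which drops the inner position level and keeps only the post labels:

pd.DataFrame.from_dict(data, orient='index').stack().reset_index(level=1, drop=True)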
Out[34]: 
post1    like1
post1    like2
post3    like1
post3    like2
post3    like3
post2    like1
post2    like2
post2    like3
post2    like4
dtype: object
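
If a two-column table is more convenient than a Series with a repeated index, the same pairs can be built directly from the dict, with no padded intermediate frame. This is a minimal sketch rather than part of the answer above. It assumes the sample values are plain strings; the column names post and like and the variable names are my own.

```python
import pandas as pd

# Sample data from the question, written as quoted strings (an assumption).
data = {
    "post1": ["like1", "like2"],
    "post2": ["like1", "like2", "like3", "like4"],
    "post3": ["like1", "like2", "like3"],
}

# One (post, like) row per like; lists of different lengths need no padding.
pairs = pd.DataFrame(
    [(post, like) for post, likes in data.items() for like in likes],
    columns=["post", "like"],
)
print(pairs)

# The same reshaping with Series.explode (available from pandas 0.25):
# each list element becomes its own row and the post label is repeated.
exploded = pd.Series(data).explode()
print(exploded)
```

Note that the post labels of the stacked Series live in its index, so passing index=False to to_excel, as in the question's code, would leave them out of the file. With pairs, both columns are ordinary data and index=False only drops the row numbers.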