Question

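I need to create and save a Pandas data frame with a hierarchical index. Below, I create two data frames and then concatenate them to produce a new data frame with a hierarchical index.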

import numpy as np
import pandas as pd

data1 = np.random.rand(5, 5)
data2 = np.random.rand(5, 5)
df1 = pd.DataFrame(data1, columns=['a', 'b', 'c', 'd', 'e'], index=['i1', 'i2', 'i3', 'i4', 'i5'])
df2 = pd.DataFrame(data2, columns=['a', 'b', 'c', 'd', 'e'], index=['i1', 'i2', 'i3', 'i4', 'i5'])

# The keys become the outer level of the hierarchical index.
df = pd.concat([df1, df2], keys=['first', 'second'])

print("Original Data frame")
print(df)

# Save to file.
df.to_csv('test')

# Read from file. DataFrame.from_csv (used originally) was removed in
# pandas 1.0; read_csv with index_col=0 gives the same result here.
df_new = pd.read_csv('test', index_col=0)

print("Saved Data frame")
print(df_new)

This is the output I get:

Original Data frame
                  a         b         c         d         e
first  i1  0.926553  0.180306  0.182887  0.783061  0.832914
       i2  0.899054  0.130367  0.615534  0.965580  0.669495
       i3  0.931004  0.425528  0.068938  0.166522  0.714399
       i4  0.082365  0.587194  0.993864  0.187864  0.066035
       i5  0.668671  0.294744  0.136317  0.358732  0.529674
second i1  0.916310  0.361423  0.700380  0.386119  0.273667
       i2  0.102542  0.454106  0.565760  0.259323  0.104743
       i3  0.410280  0.379986  0.288921  0.177819  0.919343
       i4  0.447279  0.113711  0.032273  0.335358  0.717824
       i5  0.995781  0.356817  0.146785  0.972401  0.169360

Saved Data frame
       Unnamed: 1         a         b         c         d         e
first          i1  0.926553  0.180306  0.182887  0.783061  0.832914
first          i2  0.899054  0.130367  0.615534  0.965580  0.669495
first          i3  0.931004  0.425528  0.068938  0.166522  0.714399
first          i4  0.082365  0.587194  0.993864  0.187864  0.066035
first          i5  0.668671  0.294744  0.136317  0.358732  0.529674
second         i1  0.916310  0.361423  0.700380  0.386119  0.273667
second         i2  0.102542  0.454106  0.565760  0.259323  0.104743
second         i3  0.410280  0.379986  0.288921  0.177819  0.919343
second         i4  0.447279  0.113711  0.032273  0.335358  0.717824
second         i5  0.995781  0.356817  0.146785  0.972401  0.169360

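When I save this new data frame to a CSV file ('test') and read it back, I lose the hierarchical index: the second level comes back as an ordinary column named "Unnamed: 1". Is there a way to save the data to a file so that the hierarchical index is preserved when I read it back?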

Answer 1

Save it in a format other than CSV, for example pickle:

df.to_pickle('dataframe.pickle')

This preserves the hierarchical index. To read it back:

pd.read_pickle('dataframe.pickle')
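As a rough check of the claim above, the following sketch rebuilds a frame shaped like the one in the question, pickles it, and reads it back. The temporary directory and the variable names `path` and `restored` are assumptions made for illustration and are not part of the original answer.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question.
cols = ['a', 'b', 'c', 'd', 'e']
rows = ['i1', 'i2', 'i3', 'i4', 'i5']
df1 = pd.DataFrame(np.random.rand(5, 5), columns=cols, index=rows)
df2 = pd.DataFrame(np.random.rand(5, 5), columns=cols, index=rows)
df = pd.concat([df1, df2], keys=['first', 'second'])

# Write to a throwaway directory (an assumption, to avoid leaving files behind).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'dataframe.pickle')
    df.to_pickle(path)
    restored = pd.read_pickle(path)

# Number of index levels after the round trip, and whether the frames match.
print(restored.index.nlevels)
print(restored.equals(df))
```

Pickle is Python-specific and should only be loaded from sources you trust, so it suits local caching better than data exchange.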

Pandas has several IO methods; you can read about them in the documentation.

Answer 2

You can:

Reset the index and save the DataFrame to CSV, read it back from CSV, then set the index back to the original (in place).

df
Out[11]: 
                  a         b         c         d         e
first  i1  0.935478  0.455757  0.607418  0.850291  0.704326
       i2  0.675752  0.339017  0.999949  0.508480  0.888817
       i3  0.463371  0.803389  0.048469  0.599697  0.423603
       i4  0.935294  0.933699  0.843289  0.182535  0.255847
       i5  0.321236  0.120010  0.647876  0.000517  0.032592
second i1  0.172044  0.691660  0.799164  0.194785  0.302880
       i2  0.432988  0.511229  0.451268  0.203145  0.560563
       i3  0.442584  0.771483  0.839945  0.716374  0.533183
       i4  0.167898  0.962646  0.152245  0.400280  0.210355
       i5  0.736365  0.511057  0.256672  0.619250  0.790739

df.reset_index()
Out[12]: 
  level_0 level_1         a         b         c         d         e
0   first      i1  0.935478  0.455757  0.607418  0.850291  0.704326
1   first      i2  0.675752  0.339017  0.999949  0.508480  0.888817
2   first      i3  0.463371  0.803389  0.048469  0.599697  0.423603
3   first      i4  0.935294  0.933699  0.843289  0.182535  0.255847
4   first      i5  0.321236  0.120010  0.647876  0.000517  0.032592
5  second      i1  0.172044  0.691660  0.799164  0.194785  0.302880
6  second      i2  0.432988  0.511229  0.451268  0.203145  0.560563
7  second      i3  0.442584  0.771483  0.839945  0.716374  0.533183
8  second      i4  0.167898  0.962646  0.152245  0.400280  0.210355
9  second      i5  0.736365  0.511057  0.256672  0.619250  0.790739

df.reset_index().to_csv('test.csv', index=False)
df3 = pd.read_csv('test.csv')
df3.set_index(['level_0', 'level_1'], inplace=True)

>>> df3
Out[15]: 
                        a         b         c         d         e
level_0 level_1                                                  
first   i1       0.935478  0.455757  0.607418  0.850291  0.704326
        i2       0.675752  0.339017  0.999949  0.508480  0.888817
        i3       0.463371  0.803389  0.048469  0.599697  0.423603
        i4       0.935294  0.933699  0.843289  0.182535  0.255847
        i5       0.321236  0.120010  0.647876  0.000517  0.032592
second  i1       0.172044  0.691660  0.799164  0.194785  0.302880
        i2       0.432988  0.511229  0.451268  0.203145  0.560563
        i3       0.442584  0.771483  0.839945  0.716374  0.533183
        i4       0.167898  0.962646  0.152245  0.400280  0.210355
        i5       0.736365  0.511057  0.256672  0.619250  0.790739
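A possible shortcut, not taken from the answers above: `pd.read_csv` accepts a list for `index_col`, so the first two columns of the file can be turned back into a two-level index while reading. The sketch below writes to an in-memory `io.StringIO` buffer instead of a file on disk, which is an assumption made only to keep the example self-contained.

```python
import io

import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question.
cols = ['a', 'b', 'c', 'd', 'e']
rows = ['i1', 'i2', 'i3', 'i4', 'i5']
df1 = pd.DataFrame(np.random.rand(5, 5), columns=cols, index=rows)
df2 = pd.DataFrame(np.random.rand(5, 5), columns=cols, index=rows)
df = pd.concat([df1, df2], keys=['first', 'second'])

# Write the CSV text into a buffer; both index levels become leading columns.
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)

# Ask read_csv to rebuild the index from the first two columns.
df_new = pd.read_csv(buffer, index_col=[0, 1])

# Number of index levels, and whether the values survived the text round trip.
print(df_new.index.nlevels)
print(np.allclose(df_new.values, df.values))
```

The values are compared with `np.allclose` rather than exact equality because CSV stores floats as text, so bit-for-bit identity is not guaranteed on every pandas version.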

How do I save and retrieve a Pandas data frame with a hierarchical index?
