Question

我有一只大熊猫DataFrame，如下所示：

     bus_uid   bus_type    type                      obj_uid  \
0     biomass: DEB31    biomass  output       Simple_139804698384200   
0     biomass: DEB31    biomass   other                        duals   
0     biomass: DEB31    biomass   other                       excess   

                                         datetime  \
0   DatetimeIndex(['2015-01-01 00:00:00',  '2015-01-01 01:00:00',  '2015-01-01 02:00:00', ...   
0   DatetimeIndex(['2015-01-01 00:00:00',  '2015-01-01 01:00:00',  '2015-01-01 02:00:00', ...   
0   DatetimeIndex(['2015-01-01 00:00:00',  '2015-01-01 01:00:00',  '2015-01-01 02:00:00', ...   

                                           values  
0   [1.0, 2.0, 3.0, ...  
0   [4.0, 5.0, 6.0, ...  
0   [7.0, 8.0, 9.0, ...

并希望将其转换为以下格式：

     bus_uid   bus_type    type                          obj_uid  datetime             values
0     biomass: DEB31    biomass  output   Simple_139804698384200  2015-01-01 00:00:00  1.0
0     biomass: DEB31    biomass  output   Simple_139804698384200  2015-01-01 01:00:00  2.0
0     biomass: DEB31    biomass  output   Simple_139804698384200  2015-01-01 02:00:00  3.0
0     biomass: DEB31    biomass   other                    duals  2015-01-01 00:00:00  4.0
0     biomass: DEB31    biomass   other                    duals  2015-01-01 01:00:00  5.0
0     biomass: DEB31    biomass   other                    duals  2015-01-01 02:00:00  6.0
0     biomass: DEB31    biomass   other                   excess  2015-01-01 00:00:00  7.0
0     biomass: DEB31    biomass   other                   excess  2015-01-01 01:00:00  8.0
0     biomass: DEB31    biomass   other                   excess  2015-01-01 02:00:00  9.0

列datetime和values具有相同的维度。

我已经问了一个类似的问题here但是无法通过两列来解决我的问题的解决方案。

将DataFrame转换为所需格式的最佳方式是什么？

Answer 1

您可以从列values和datetime新Series中提取，然后通过concat将其与原始数据框df合并：

s1 = df['values'].apply(pd.Series, 1).stack()
s1.index = s1.index.droplevel(-1) # to line up with df's index
s1.name = 'values' # needs a name to join

s2 = df['datetime'].apply(pd.Series, 1).stack()
s2.index = s2.index.droplevel(-1) # to line up with df's index
s2.name = 'datetime' # needs a name to join

#remove duplicity columns
df = df.drop( ['values', 'datetime'], axis=1)

#concat all together
df= pd.concat([df,s1,s2], axis=1).reset_index(drop=True)

print df

   bus_uid        bus_type            type                 obj_uid values  \
0        0  biomass: DEB31  biomass output  Simple_139804698384200    1.0   
1        0  biomass: DEB31  biomass output  Simple_139804698384200    2.0   
2        0  biomass: DEB31  biomass output  Simple_139804698384200    3.0   
3        0  biomass: DEB31   biomass other                   duals    4.0   
4        0  biomass: DEB31   biomass other                   duals    5.0   
5        0  biomass: DEB31   biomass other                   duals    6.0   
6        0  biomass: DEB31   biomass other                  excess    7.0   
7        0  biomass: DEB31   biomass other                  excess    8.0   
8        0  biomass: DEB31   biomass other                  excess    9.0   

             datetime  
0 2015-01-01 00:00:00  
1 2015-01-01 01:00:00  
2 2015-01-01 02:00:00  
3 2015-01-01 00:00:00  
4 2015-01-01 01:00:00  
5 2015-01-01 02:00:00  
6 2015-01-01 00:00:00  
7 2015-01-01 01:00:00  
8 2015-01-01 02:00:00

Answer 2

您可以遍历行以从单元格中提取Index和Series信息。当您需要同时提取信息时，我认为reshaping方法效果不佳：

示例数据：

rows = 3
df = pd.DataFrame(data={'bus_uid': list(repeat('biomass: DEB31', rows)), 'type': list(repeat('biomass', 3)), 'id': ['id1', 'id2', 'id3'], 'datetime': list(repeat(pd.DatetimeIndex(start=datetime(2016,1,1), periods=3, freq='D'), rows)), 'values': list(repeat([1,2,3], rows))})

          bus_uid                                           datetime   id  \
0  biomass: DEB31  DatetimeIndex(['2016-01-01', '2016-01-02', '20...  id1   
1  biomass: DEB31  DatetimeIndex(['2016-01-01', '2016-01-02', '20...  id2   
2  biomass: DEB31  DatetimeIndex(['2016-01-01', '2016-01-02', '20...  id3   

      type     values  
0  biomass  [1, 2, 3]  
1  biomass  [1, 2, 3]  
2  biomass  [1, 2, 3]

在您遍历DataFrame DataFrame时构建新的rows：

new_df = pd.DataFrame()
for index, cols in df.iterrows():
    extract_df = pd.DataFrame.from_dict({'datetime': cols.ix['datetime'], 'values': cols.ix['values']})
    extract_df = pd.concat([extract_df, cols.drop(['datetime', 'values']).to_frame().T], axis=1).fillna(method='ffill').fillna(method='bfill')
    new_df = pd.concat([new_df, extract_df], ignore_index=True)

得到：

    datetime  values         bus_uid   id     type
0 2016-01-01       1  biomass: DEB31  id1  biomass
1 2016-01-02       2  biomass: DEB31  id1  biomass
2 2016-01-03       3  biomass: DEB31  id1  biomass
3 2016-01-01       1  biomass: DEB31  id2  biomass
4 2016-01-02       2  biomass: DEB31  id2  biomass
5 2016-01-03       3  biomass: DEB31  id2  biomass
6 2016-01-01       1  biomass: DEB31  id3  biomass
7 2016-01-02       2  biomass: DEB31  id3  biomass
8 2016-01-03       3  biomass: DEB31  id3  biomass

熊猫：如何将列中的多个列表拆分成多行？

2 个答案: