I have a pandas DataFrame that looks like this:
bus_uid bus_type type obj_uid \
0 biomass: DEB31 biomass output Simple_139804698384200
0 biomass: DEB31 biomass other duals
0 biomass: DEB31 biomass other excess
datetime \
0 DatetimeIndex(['2015-01-01 00:00:00', '2015-01-01 01:00:00', '2015-01-01 02:00:00', ...
0 DatetimeIndex(['2015-01-01 00:00:00', '2015-01-01 01:00:00', '2015-01-01 02:00:00', ...
0 DatetimeIndex(['2015-01-01 00:00:00', '2015-01-01 01:00:00', '2015-01-01 02:00:00', ...
values
0 [1.0, 2.0, 3.0, ...
0 [4.0, 5.0, 6.0, ...
0 [7.0, 8.0, 9.0, ...
and I would like to convert it to the following format:
bus_uid bus_type type obj_uid datetime values
0 biomass: DEB31 biomass output Simple_139804698384200 2015-01-01 00:00:00 1.0
0 biomass: DEB31 biomass output Simple_139804698384200 2015-01-01 01:00:00 2.0
0 biomass: DEB31 biomass output Simple_139804698384200 2015-01-01 02:00:00 3.0
0 biomass: DEB31 biomass other duals 2015-01-01 00:00:00 4.0
0 biomass: DEB31 biomass other duals 2015-01-01 01:00:00 5.0
0 biomass: DEB31 biomass other duals 2015-01-01 02:00:00 6.0
0 biomass: DEB31 biomass other excess 2015-01-01 00:00:00 7.0
0 biomass: DEB31 biomass other excess 2015-01-01 01:00:00 8.0
0 biomass: DEB31 biomass other excess 2015-01-01 02:00:00 9.0
Within each row, the cells of the datetime and values columns have the same length.
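In case it helps anyone reproduce this, here is a small sketch of how that equal-length condition could be checked. The miniature frame is a hypothetical stand-in for my data (the sample values and the names `stamps`, `same_length` and `all_match` are only illustrative):

```python
import pandas as pd

# Hypothetical miniature of the frame above (values are illustrative).
stamps = pd.to_datetime(['2015-01-01 00:00', '2015-01-01 01:00', '2015-01-01 02:00'])
df = pd.DataFrame({
    'obj_uid': ['duals', 'excess'],
    'datetime': [stamps, stamps],
    'values': [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
})

# Compare the per-row lengths of the two list-like columns.
same_length = df['datetime'].map(len) == df['values'].map(len)
all_match = bool(same_length.all())
print(all_match)
```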
I have already asked a similar question here, but I could not adapt its solution to handle two columns.
What is the best way to convert the DataFrame into the desired format?
Answer 0 (score: 2)
You can build a new Series from each of the columns values and datetime, and then combine them with the original DataFrame df using concat:
import pandas as pd

# assumes df has a unique index; call df.reset_index(drop=True) first if it does not
# expand each list in 'values' into columns, then stack into one long Series
s1 = df['values'].apply(pd.Series).stack()
s1.index = s1.index.droplevel(-1) # to line up with df's index
s1.name = 'values' # needs a name to join
s2 = df['datetime'].apply(pd.Series).stack()
s2.index = s2.index.droplevel(-1) # to line up with df's index
s2.name = 'datetime' # needs a name to join
# remove the original list-like columns so they are not duplicated
df = df.drop(['values', 'datetime'], axis=1)
# concat all together
df = pd.concat([df, s1, s2], axis=1).reset_index(drop=True)
print(df)
bus_uid bus_type type obj_uid values \
0 0 biomass: DEB31 biomass output Simple_139804698384200 1.0
1 0 biomass: DEB31 biomass output Simple_139804698384200 2.0
2 0 biomass: DEB31 biomass output Simple_139804698384200 3.0
3 0 biomass: DEB31 biomass other duals 4.0
4 0 biomass: DEB31 biomass other duals 5.0
5 0 biomass: DEB31 biomass other duals 6.0
6 0 biomass: DEB31 biomass other excess 7.0
7 0 biomass: DEB31 biomass other excess 8.0
8 0 biomass: DEB31 biomass other excess 9.0
datetime
0 2015-01-01 00:00:00
1 2015-01-01 01:00:00
2 2015-01-01 02:00:00
3 2015-01-01 00:00:00
4 2015-01-01 01:00:00
5 2015-01-01 02:00:00
6 2015-01-01 00:00:00
7 2015-01-01 01:00:00
8 2015-01-01 02:00:00
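On pandas 1.3 or newer, DataFrame.explode accepts a list of columns, which may be a shorter route to the same result than stacking each column separately. This is a different technique from the one above, shown only as a minimal sketch; the stand-in frame, the `stamps` variable and the sample values are illustrative assumptions, not the original data:

```python
import pandas as pd

# Hypothetical sample shaped like the question's frame (values are illustrative).
stamps = pd.to_datetime(['2015-01-01 00:00', '2015-01-01 01:00', '2015-01-01 02:00'])
df = pd.DataFrame({
    'bus_uid': ['biomass: DEB31'] * 3,
    'obj_uid': ['Simple_139804698384200', 'duals', 'excess'],
    'datetime': [stamps, stamps, stamps],
    'values': [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
})

# Explode both list-like columns in lockstep; within each row the two
# cells must have the same length, otherwise pandas raises a ValueError.
long_df = df.explode(['datetime', 'values'], ignore_index=True)

# explode leaves object dtype behind, so restore proper dtypes.
long_df['datetime'] = pd.to_datetime(long_df['datetime'])
long_df['values'] = long_df['values'].astype(float)

print(long_df)
```

The remaining scalar columns are repeated automatically for every exploded element, so no separate drop and concat step is needed.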
Answer 1 (score: 2)
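You can iterate over the rows and extract the Index and Series information from the cells. I don't think the reshaping methods work well when you need to extract from both columns at the same time.
Sample data: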
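from datetime import datetime
from itertools import repeat
import pandas as pd

rows = 3
# pd.date_range replaces pd.DatetimeIndex(start=..., periods=..., freq=...), which recent pandas no longer accepts
dates = pd.date_range(start=datetime(2016, 1, 1), periods=3, freq='D')
df = pd.DataFrame(data={'bus_uid': list(repeat('biomass: DEB31', rows)),
                        'type': list(repeat('biomass', rows)),
                        'id': ['id1', 'id2', 'id3'],
                        'datetime': list(repeat(dates, rows)),
                        'values': list(repeat([1, 2, 3], rows))})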
bus_uid datetime id \
0 biomass: DEB31 DatetimeIndex(['2016-01-01', '2016-01-02', '20... id1
1 biomass: DEB31 DatetimeIndex(['2016-01-01', '2016-01-02', '20... id2
2 biomass: DEB31 DatetimeIndex(['2016-01-01', '2016-01-02', '20... id3
type values
0 biomass [1, 2, 3]
1 biomass [1, 2, 3]
2 biomass [1, 2, 3]
Build a new DataFrame as you iterate over the rows of the original DataFrame:
new_df = pd.DataFrame()
for index, cols in df.iterrows():
    # expand the two list-like cells of this row into a two-column frame (.ix is gone in recent pandas)
    extract_df = pd.DataFrame.from_dict({'datetime': cols['datetime'], 'values': cols['values']})
    # attach the remaining scalar cells, then fill them down and up across the expanded rows
    extract_df = pd.concat([extract_df, cols.drop(['datetime', 'values']).to_frame().T], axis=1).ffill().bfill()
    new_df = pd.concat([new_df, extract_df], ignore_index=True)
which gives:
datetime values bus_uid id type
0 2016-01-01 1 biomass: DEB31 id1 biomass
1 2016-01-02 2 biomass: DEB31 id1 biomass
2 2016-01-03 3 biomass: DEB31 id1 biomass
3 2016-01-01 1 biomass: DEB31 id2 biomass
4 2016-01-02 2 biomass: DEB31 id2 biomass
5 2016-01-03 3 biomass: DEB31 id2 biomass
6 2016-01-01 1 biomass: DEB31 id3 biomass
7 2016-01-02 2 biomass: DEB31 id3 biomass
8 2016-01-03 3 biomass: DEB31 id3 biomass
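If the frame has many rows, concatenating inside the loop copies the growing result on every iteration. A possible variation, sketched here under the assumption of the same sample data, collects the per-row pieces in a list and concatenates once at the end. The names `pieces` and `piece` are illustrative, and the scalar cells are broadcast by plain assignment instead of ffill/bfill:

```python
import pandas as pd

# Hypothetical sample mirroring the answer's data.
dates = pd.date_range('2016-01-01', periods=3, freq='D')
df = pd.DataFrame({
    'bus_uid': ['biomass: DEB31'] * 3,
    'type': ['biomass'] * 3,
    'id': ['id1', 'id2', 'id3'],
    'datetime': [dates] * 3,
    'values': [[1, 2, 3]] * 3,
})

pieces = []
for _, row in df.iterrows():
    # Expand the two list-like cells of this row into a small frame.
    piece = pd.DataFrame({'datetime': row['datetime'], 'values': row['values']})
    # Broadcast each scalar cell of the row across the expanded rows.
    for col in row.index.drop(['datetime', 'values']):
        piece[col] = row[col]
    pieces.append(piece)

# A single concat at the end avoids re-copying the result each iteration.
new_df = pd.concat(pieces, ignore_index=True)
print(new_df)
```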