Suppose I have a DataFrame like the following:
date A B C D
2014-03-18 1.223777 0.356887 1.201624 1.968612
2014-03-18 0.160730 1.888415 0.306334 0.203939
2014-03-18 -0.203101 -0.161298 2.426540 0.056791
2014-03-18 -1.350102 0.990093 0.495406 0.036215
2014-03-18 -1.862960 2.673009 -0.545336 -0.925385
2014-03-19 0.238281 0.468102 -0.150869 0.955069
2014-03-20 1.575317 0.811892 0.198165 1.117805
2014-03-20 0.822698 -0.398840 -1.277511 0.811691
2014-03-20 2.143201 -0.827853 -0.989221 1.088297
2014-03-20 0.299331 1.144311 -0.387854 0.209612
2014-03-20 1.284111 -0.470287 -0.172949 -0.792020
2014-03-22 1.031994 1.059394 0.037627 0.101246
2014-03-22 0.889149 0.724618 0.459405 1.023127
2014-03-23 -1.136320 -0.396265 -1.833737 1.478656
2014-03-23 -0.740400 -0.644395 -1.221330 0.321805
2014-03-23 -0.443021 -0.172013 0.020392 -2.368532
I would like to melt it (reshape it from wide to long format) so that I end up with:
date value unit condition
2014-03-18 1.223777 1 A
2014-03-18 0.160730 1 A
... ... ... ...
2014-03-19 0.238281 2 A
2014-03-20 1.575317 3 A
... ... ... ...
2014-03-18 0.468102 1 B
... ... ... ...
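where:
- date is the original date,
- condition contains the names of the data columns in the original DataFrame,
- unit holds a unique integer ID for each distinct date,
- value contains the value from the corresponding column.
How can I do this in pandas?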
Background: this is apparently needed if you want to plot multiple time series with seaborn. See this other post for details.
Answer (score: 3)
You can do this with pandas.melt, and then map each unique date to an integer using a Series as a lookup table.
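As a minimal, self-contained sketch of those two steps, assuming only the first few rows of the question's data and with the dates kept as plain strings: the lookup from date to integer here uses Series.map instead of indexing the lookup Series directly, which is a substitution of mine rather than the answer's exact code.

```python
import numpy as np
import pandas as pd

# A small subset of the question's data (first rows only, for brevity)
df = pd.DataFrame({
    "date": ["2014-03-18", "2014-03-18", "2014-03-19", "2014-03-20"],
    "A": [1.223777, 0.160730, 0.238281, 1.575317],
    "B": [0.356887, 1.888415, 0.468102, 0.811892],
})

# Step 1: wide -> long. Every column not listed in id_vars becomes a label
# in the "condition" column, and its cells go into the "value" column.
molten = pd.melt(df, id_vars=["date"], var_name="condition", value_name="value")

# Step 2: give each distinct date an integer ID in order of first appearance.
dates = molten["date"].unique()
mapper = pd.Series(np.arange(dates.size), index=dates)
molten["unit"] = molten["date"].map(mapper)

print(molten)
```

melt stacks the value columns one after another, so the result has (number of rows) x (number of value columns) rows: all of A first, then all of B.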
In [31]: df
Out[31]:
date A B C D
0 2014-03-18 1.2238 0.3569 1.2016 1.9686
1 2014-03-18 0.1607 1.8884 0.3063 0.2039
2 2014-03-18 -0.2031 -0.1613 2.4265 0.0568
3 2014-03-18 -1.3501 0.9901 0.4954 0.0362
4 2014-03-18 -1.8630 2.6730 -0.5453 -0.9254
5 2014-03-19 0.2383 0.4681 -0.1509 0.9551
6 2014-03-20 1.5753 0.8119 0.1982 1.1178
7 2014-03-20 0.8227 -0.3988 -1.2775 0.8117
8 2014-03-20 2.1432 -0.8279 -0.9892 1.0883
9 2014-03-20 0.2993 1.1443 -0.3879 0.2096
10 2014-03-20 1.2841 -0.4703 -0.1729 -0.7920
11 2014-03-22 1.0320 1.0594 0.0376 0.1012
12 2014-03-22 0.8891 0.7246 0.4594 1.0231
13 2014-03-23 -1.1363 -0.3963 -1.8337 1.4787
14 2014-03-23 -0.7404 -0.6444 -1.2213 0.3218
15 2014-03-23 -0.4430 -0.1720 0.0204 -2.3685
[16 rows x 5 columns]
In [32]: molten = pd.melt(df, id_vars=['date'], var_name='condition')
In [33]: molten
Out[33]:
date condition value
0 2014-03-18 A 1.2238
1 2014-03-18 A 0.1607
2 2014-03-18 A -0.2031
3 2014-03-18 A -1.3501
4 2014-03-18 A -1.8630
5 2014-03-19 A 0.2383
6 2014-03-20 A 1.5753
7 2014-03-20 A 0.8227
8 2014-03-20 A 2.1432
9 2014-03-20 A 0.2993
10 2014-03-20 A 1.2841
11 2014-03-22 A 1.0320
12 2014-03-22 A 0.8891
13 2014-03-23 A -1.1363
14 2014-03-23 A -0.7404
15 2014-03-23 A -0.4430
16 2014-03-18 B 0.3569
17 2014-03-18 B 1.8884
18 2014-03-18 B -0.1613
19 2014-03-18 B 0.9901
20 2014-03-18 B 2.6730
21 2014-03-19 B 0.4681
22 2014-03-20 B 0.8119
23 2014-03-20 B -0.3988
24 2014-03-20 B -0.8279
... ... ...
[64 rows x 3 columns]
In [35]: dates = molten.date.unique()
In [36]: mapper = pd.Series(np.arange(dates.size), index=dates)  # assumes import numpy as np
In [38]: molten['unit'] = mapper.loc[molten.date].values
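The lookup Series is only one way to number the dates. As a hedged sketch, pd.factorize and groupby(..., sort=False).ngroup() are two swapped-in alternatives that should give the same 0-based IDs, because all three number dates in order of first appearance. The small molten frame below is built from the question's dates purely for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative long-format frame: the same dates repeated for conditions A and B
molten = pd.DataFrame({
    "date": ["2014-03-18", "2014-03-18", "2014-03-19", "2014-03-20",
             "2014-03-18", "2014-03-18", "2014-03-19", "2014-03-20"],
    "condition": ["A"] * 4 + ["B"] * 4,
})

# The answer's approach: a lookup table from each unique date to an integer
dates = molten["date"].unique()
mapper = pd.Series(np.arange(dates.size), index=dates)
unit_mapper = mapper.loc[molten["date"]].to_numpy()

# pd.factorize assigns codes in order of first appearance, like unique()
codes, uniques = pd.factorize(molten["date"])

# With sort=False, ngroup() also numbers groups by first appearance
unit_ngroup = molten.groupby("date", sort=False).ngroup().to_numpy()
```

groupby's default sort=True would number the dates in sorted order instead; that coincides with the other two here only because the input is already sorted by date.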
In [39]: molten
Out[39]:
date condition value unit
0 2014-03-18 A 1.2238 0
1 2014-03-18 A 0.1607 0
2 2014-03-18 A -0.2031 0
3 2014-03-18 A -1.3501 0
4 2014-03-18 A -1.8630 0
5 2014-03-19 A 0.2383 1
6 2014-03-20 A 1.5753 2
7 2014-03-20 A 0.8227 2
8 2014-03-20 A 2.1432 2
9 2014-03-20 A 0.2993 2
10 2014-03-20 A 1.2841 2
11 2014-03-22 A 1.0320 3
12 2014-03-22 A 0.8891 3
13 2014-03-23 A -1.1363 4
14 2014-03-23 A -0.7404 4
15 2014-03-23 A -0.4430 4
16 2014-03-18 B 0.3569 0
17 2014-03-18 B 1.8884 0
18 2014-03-18 B -0.1613 0
19 2014-03-18 B 0.9901 0
20 2014-03-18 B 2.6730 0
21 2014-03-19 B 0.4681 1
22 2014-03-20 B 0.8119 2
23 2014-03-20 B -0.3988 2
24 2014-03-20 B -0.8279 2
... ... ... ...
[64 rows x 4 columns]
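The answer's unit column starts at 0, whereas the question's desired output starts at 1 and lists the columns as date, value, unit, condition. Assuming a frame shaped like the output above (only a few of its rows are reproduced here), a sketch of that final adjustment:

```python
import pandas as pd

# A few rows shaped like the answer's final output (0-based "unit")
molten = pd.DataFrame({
    "date": ["2014-03-18", "2014-03-19", "2014-03-20", "2014-03-18"],
    "condition": ["A", "A", "A", "B"],
    "value": [1.223777, 0.238281, 1.575317, 0.356887],
    "unit": [0, 1, 2, 0],
})

# Shift the IDs to start at 1 (assign returns a copy, molten is unchanged),
# then select the columns in the order the question lists them.
result = molten.assign(unit=molten["unit"] + 1)[["date", "value", "unit", "condition"]]
print(result)
```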