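I want to pivot the data produced by a Django queryset while keeping the original (non-alphabetical) sort order on the index column. The pivoted data will then be used for a Google Visualization line chart.
I have cobbled together my own code to do this, but it is a bit ugly, and I would like to know whether it can be done with a pandas DataFrame instead.
I have never used pandas before, and after reading the docs this is what I came up with.
Here is my unpivoted DataFrame, sorted by date and tenor, where the tenor suffixes stand for: D = day, M = month, Y = year.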
from pandas import DataFrame
df = DataFrame(data)
date tenor value
0 2014-01-01 1D 0.517125
1 2014-01-01 1M 0.5175
2 2014-01-01 2M 0.518159
3 2014-01-01 3M 0.5187
4 2014-01-01 4M 0.51912
5 2014-01-01 5M 0.51949
6 2014-01-01 6M 0.5197
7 2014-01-01 9M 0.519511
8 2014-01-01 1Y 0.5198
9 2014-01-01 18M 0.521228
10 2014-01-01 2Y 0.523097
11 2014-01-01 3Y 0.525054
12 2014-01-01 4Y 0.527055
13 2014-01-01 5Y 0.529054
14 2014-01-01 6Y 0.531099
15 2014-01-01 7Y 0.532852
16 2014-01-01 8Y 0.534207
17 2014-01-01 9Y 0.535314
18 2014-01-02 1D 0.517874
19 2014-01-02 1M 0.5181
20 2014-01-02 2M 0.518451
21 2014-01-02 3M 0.5188
22 2014-01-02 4M 0.519113
23 2014-01-02 5M 0.519418
24 2014-01-02 6M 0.5196
25 2014-01-02 9M 0.519377
26 2014-01-02 1Y 0.5197
27 2014-01-02 18M 0.521406
28 2014-01-02 2Y 0.523405
29 2014-01-02 3Y 0.525254
30 2014-01-02 4Y 0.527151
31 2014-01-02 5Y 0.529256
32 2014-01-02 6Y 0.531543
33 2014-01-02 7Y 0.533457
34 2014-01-02 8Y 0.534802
35 2014-01-02 9Y 0.535847
36 2014-01-03 1D 0.518552
37 2014-01-03 1M 0.5186
38 2014-01-03 2M 0.518536
39 2014-01-03 3M 0.5186
40 2014-01-03 4M 0.518865
41 2014-01-03 5M 0.51916
42 2014-01-03 6M 0.5193
43 2014-01-03 9M 0.519024
44 2014-01-03 1Y 0.5193
45 2014-01-03 18M 0.520882
46 2014-01-03 2Y 0.5228
47 2014-01-03 3Y 0.524647
48 2014-01-03 4Y 0.526752
49 2014-01-03 5Y 0.528957
50 2014-01-03 6Y 0.531065
51 2014-01-03 7Y 0.532856
52 2014-01-03 8Y 0.534325
53 2014-01-03 9Y 0.535558
Using pandas pivot produces the following result. The pivot works, but the rows are in the wrong order.
df_pivot = df.pivot(index='tenor', columns='date', values='value')
tenor 2014-01-01 2014-01-02 2014-01-03
18M 0.521228 0.521406 0.520882
1D 0.517125 0.517874 0.518552
1M 0.5175 0.5181 0.5186
1Y 0.5198 0.5197 0.5193
2M 0.518159 0.518451 0.518536
2Y 0.523097 0.523405 0.5228
3M 0.5187 0.5188 0.5186
3Y 0.525054 0.525254 0.524647
4M 0.51912 0.519113 0.518865
4Y 0.527055 0.527151 0.526752
5M 0.51949 0.519418 0.51916
5Y 0.529054 0.529256 0.528957
6M 0.5197 0.5196 0.5193
6Y 0.531099 0.531543 0.531065
7Y 0.532852 0.533457 0.532856
8Y 0.534207 0.534802 0.534325
9M 0.519511 0.519377 0.519024
9Y 0.535314 0.535847 0.535558
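The row order above is ordinary lexicographic string ordering: the tenor labels are strings, so they are compared character by character rather than by the time span they represent. A minimal sketch using only the built-in sorted(), with a handful of labels taken from the table:

```python
# Tenor labels are plain strings, so sorting compares them character by
# character: the digit '8' sorts before the letters 'D', 'M' and 'Y'.
tenors = ["1D", "1M", "9M", "1Y", "18M", "2Y"]

lexicographic = sorted(tenors)
print(lexicographic)  # ['18M', '1D', '1M', '1Y', '2Y', '9M']
```

This is why 18M ends up ahead of 1D even though it is the longer period.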
I would like the result sorted by the tenor column, like this:
tenor 2014-01-01 2014-01-02 2014-01-03
1D 0.517125 0.517874 0.518552
1M 0.5175 0.5181 0.5186
2M 0.518159 0.518451 0.518536
3M 0.5187 0.5188 0.5186
4M 0.51912 0.519113 0.518865
5M 0.51949 0.519418 0.51916
6M 0.5197 0.5196 0.5193
9M 0.519511 0.519377 0.519024
1Y 0.5198 0.5197 0.5193
18M 0.521228 0.521406 0.520882
2Y 0.523097 0.523405 0.5228
3Y 0.525054 0.525254 0.524647
4Y 0.527055 0.527151 0.526752
5Y 0.529054 0.529256 0.528957
6Y 0.531099 0.531543 0.531065
7Y 0.532852 0.533457 0.532856
8Y 0.534207 0.534802 0.534325
9Y 0.535314 0.535847 0.535558
I have considered writing a custom sort function that converts the tenor values to days for comparison and then using it with pandas (I am not sure how).
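One way such a custom key might be plugged into pandas is the key argument of sort_index(), which is available in pandas 1.1 and later. The sketch below is illustrative only: tenor_to_days and DAYS_PER_UNIT are hypothetical names, the day counts assume a 365-day year with equal-length months, and the small frame stands in for the real pivoted data.

```python
import pandas as pd

# Hypothetical helper: approximate day counts, assuming a 365-day year
# and equal-length months (an assumption, not a market convention).
DAYS_PER_UNIT = {"D": 1.0, "M": 365.0 / 12, "Y": 365.0}

def tenor_to_days(tenor):
    """Convert a label such as '18M' to an approximate number of days."""
    return int(tenor[:-1]) * DAYS_PER_UNIT[tenor[-1]]

# A tiny stand-in for the pivoted frame, in lexicographic row order.
df_pivot = pd.DataFrame(
    {"2014-01-01": [0.521228, 0.517125, 0.5175, 0.5198]},
    index=pd.Index(["18M", "1D", "1M", "1Y"], name="tenor"),
)

# The key receives the whole Index and must return an equally long
# array-like of sort keys; Index.map applies the helper to each label.
df_sorted = df_pivot.sort_index(key=lambda idx: idx.map(tenor_to_days))
print(df_sorted)  # rows now run 1D, 1M, 1Y, 18M
```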
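I also looked into pivoting with Google Visualization, but that appears to be available only in queries, not on an existing DataTable.
Any other suggestions would be much appreciated.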
Answer 0: (score: 2)
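Comparing day units with month units is ambiguous: for example, which is larger, 30D or 1M? If that is acceptable, you can use the reindex() method to reorder the DataFrame: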
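import pandas as pd
df_pivot = df.pivot(index='tenor', columns='date', values='value')
# Approximate number of days per unit (a month is taken as 365/12 days)
DayCounts = {"D": 1, "M": 365.0 / 12, "Y": 365}
# Order the tenor labels by their approximate length in days
index = sorted(df_pivot.index, key=lambda v: int(v[:-1]) * DayCounts[v[-1]])
# reindex returns a new DataFrame; assign it if you want to keep the result
df_pivot.reindex(index)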
Output:
date 2014-01-01 2014-01-02 2014-01-03
1D 0.517125 0.517874 0.518552
1M 0.517500 0.518100 0.518600
2M 0.518159 0.518451 0.518536
3M 0.518700 0.518800 0.518600
4M 0.519120 0.519113 0.518865
5M 0.519490 0.519418 0.519160
6M 0.519700 0.519600 0.519300
9M 0.519511 0.519377 0.519024
1Y 0.519800 0.519700 0.519300
18M 0.521228 0.521406 0.520882
2Y 0.523097 0.523405 0.522800
3Y 0.525054 0.525254 0.524647
4Y 0.527055 0.527151 0.526752
5Y 0.529054 0.529256 0.528957
6Y 0.531099 0.531543 0.531065
7Y 0.532852 0.533457 0.532856
8Y 0.534207 0.534802 0.534325
9Y 0.535314 0.535847 0.535558
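If the long-format data already arrives in the desired tenor order, as the question says it does, the unit comparison can be avoided altogether by recording the order in which the labels first appear and reindexing with that. This is a sketch under that assumption, using a reduced sample of the question's data; original_order is an illustrative name.

```python
import pandas as pd

# A small sample in the same long format as the question's data,
# already sorted by date and then by tenor in the desired order.
df = pd.DataFrame({
    "date": ["2014-01-01"] * 4 + ["2014-01-02"] * 4,
    "tenor": ["1D", "9M", "1Y", "18M"] * 2,
    "value": [0.517125, 0.519511, 0.5198, 0.521228,
              0.517874, 0.519377, 0.5197, 0.521406],
})

# Record the tenor labels in order of first appearance before pivoting.
original_order = df["tenor"].unique()

df_pivot = df.pivot(index="tenor", columns="date", values="value")
# Restore the original row order instead of the sorted one.
df_pivot = df_pivot.reindex(original_order)
print(df_pivot)  # rows run 1D, 9M, 1Y, 18M
```

This only works when every tenor appears in the intended order somewhere in the input; if the source order is not reliable, a day-count key like the one above is the safer choice.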