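I have a pandas time series DataFrame, df.
The date is the index. There are three columns: cusip, ticker, and factor.
I want to decile the data for each date. There are about 100 factors per date, and each date should be deciled from 1 through 10.
As a first attempt, I tried to decile the whole DataFrame regardless of date. I used: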
factor = pd.cut(df.factor, 10) #This gave an error:
adj = (mx - mn) * 0.001 # 0.1% of the range
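Sybase.Error: ('Layer: 2, Origin: 4\ncs_calc: cslib user api layer: common library error: The conversion/operation resulted in overflow.')
The DataFrame has 1 million rows. Is this a size issue? A NaN issue?
Three questions.
Thanks for your help. I'm new to pandas and Python.
Sample data: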
df: cusip ticker factor
date
2012-01-05 XXXXX ABC 4.26
2012-01-05 YYYYY BCD -1.25
...(100 more stocks on this date)
2012-01-06 XXXXX ABC 3.25
2012-01-06 YYYYY BCD -1.55
...(100 more stocks on this date)
Desired output:
#column with the deciles, lined up with the df.
decile
10
2
...
10
3
...
I can then append this to my DataFrame as a new column. Each date is deciled, and each data point gets its corresponding decile within that date. Thanks.
Stack trace:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/misc/apps/linux/python-2.6.1/lib/python2.6/site-packages/pandas-0.10.0-py2.6-linux-x86_64.egg/pandas/core/groupby.py", line 1817, in transform
    res = wrapper(group)
  File "/misc/apps/linux/python-2.6.1/lib/python2.6/site-packages/pandas-0.10.0-py2.6-linux-x86_64.egg/pandas/core/groupby.py", line 1807, in <lambda>
    wrapper = lambda x: func(x, *args, **kwargs)
  File "<stdin>", line 1, in <lambda>
  File "/misc/apps/linux/python-2.6.1/lib/python2.6/site-packages/pandas-0.10.0-py2.6-linux-x86_64.egg/pandas/tools/tile.py", line 138, in qcut
    bins = algos.quantile(x, quantiles)
  File "/misc/apps/linux/python-2.6.1/lib/python2.6/site-packages/pandas-0.10.0-py2.6-linux-x86_64.egg/pandas/core/algorithms.py", line 272, in quantile
    return algos.arrmap_float64(q, _get_score)
  File "generated.pyx", line 1841, in pandas.algos.arrmap_float64 (pandas/algos.c:71156)
  File "/misc/apps/linux/python-2.6.1/lib/python2.6/site-packages/pandas-0.10.0-py2.6-linux-x86_64.egg/pandas/core/algorithms.py", line 257, in _get_score
    idx % 1)
  File "/misc/apps/linux/python-2.6.1/lib/python2.6/site-packages/pandas-0.10.0-py2.6-linux-x86_64.egg/pandas/core/algorithms.py", line 279, in _interpolate
    return a + (b - a) * fraction
  File "build/bdist.linux-x86_64/egg/Sybase.py", line 246, in _cslib_cb
Sybase.Error: ('Layer: 2, Origin: 4\ncs_calc: cslib user api layer: common library error: The conversion/operation resulted in overflow.', <ClientMsgType object at 0x1c4da730>)
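Answer 0 (score: 2)
Here is a toy example. First, make a datetime index. Here I build the index from two days, each repeated 10 times. Then I use randn to make some dummy data.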
In [1]: date_index = [datetime(2012, 1, 1)] * 10 + [datetime(2013, 1, 1)] * 10  # datetime comes from the datetime module
In [2]: df = DataFrame({'A': randn(20), 'B': randn(20)}, index=date_index)  # DataFrame from pandas, randn from numpy.random
In [3]: df
Out[3]:
A B
2012-01-01 -1.155124 1.018059
2012-01-01 -0.312090 -1.083568
2012-01-01 0.688247 -1.296995
2012-01-01 -0.205218 0.837194
2012-01-01 0.700611 -0.001015
2012-01-01 1.996796 -0.914564
2012-01-01 -2.268237 0.517232
2012-01-01 -0.170778 -0.143245
2012-01-01 -0.826039 0.581035
2012-01-01 -0.351097 -0.013259
2013-01-01 -0.767911 -0.009232
2013-01-01 -0.322831 -1.384785
2013-01-01 0.300160 0.334018
2013-01-01 -1.406878 -2.275123
2013-01-01 1.722454 0.873262
2013-01-01 0.635711 -1.763352
2013-01-01 -0.816891 -0.451424
2013-01-01 -0.808629 -0.092290
2013-01-01 0.386046 -1.297096
2013-01-01 0.261837 0.562373
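If I understand your question correctly, you want to decile within each date. To do that, first move the index into the DataFrame as a column. Then group by the new column (called index here) and use transform with a lambda function. The lambda below applies pandas.qcut to each grouped Series and returns its labels attribute.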
In [4]: df.reset_index().groupby('index').transform(lambda x: qcut(x,10).labels)
Out[4]:
A B
0 1 9
1 4 1
2 7 0
3 5 8
4 8 5
5 9 2
6 0 6
7 6 3
8 2 7
9 3 4
10 3 6
11 4 2
12 6 7
13 0 0
14 9 9
15 8 1
16 1 4
17 2 5
18 7 3
19 5 8
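The session above targets the pandas 0.10 shown in the traceback, where qcut returned an object with a labels attribute. As far as I know, current pandas no longer has that attribute, and qcut(..., labels=False) returns the 0-based bin numbers directly. Below is a minimal sketch of the same per-date idea for a frame shaped like the one in the question. The helper name add_date_deciles and the sample values are made up for illustration. The float conversion rests on an assumption: the traceback ends inside Sybase.py during a + (b - a) * fraction, which suggests the factor column held numeric objects from the Sybase driver rather than plain floats, but the thread does not confirm this.

```python
import numpy as np
import pandas as pd


def add_date_deciles(df, col="factor", n_bins=10):
    """Return a copy of df with a 'decile' column (1..n_bins) per index date.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    out = df.copy()
    # Assumption: the overflow came from non-float numeric objects in the
    # column (e.g. values handed back by a database driver), so force floats.
    out[col] = out[col].astype(float)
    # Group on the date index; qcut(..., labels=False) gives 0-based bin
    # numbers, so add 1 to get deciles numbered 1..n_bins.
    out["decile"] = out.groupby(level=0)[col].transform(
        lambda s: pd.qcut(s, n_bins, labels=False) + 1
    )
    return out


# Made-up sample: two dates with 20 stocks each (random values, not real data).
rng = np.random.default_rng(0)
dates = pd.to_datetime(["2012-01-05"] * 20 + ["2012-01-06"] * 20)
sample = pd.DataFrame(
    {
        "cusip": ["C%04d" % i for i in range(40)],
        "ticker": ["T%02d" % i for i in range(40)],
        "factor": rng.normal(size=40),
    },
    index=pd.Index(dates, name="date"),
)
result = add_date_deciles(sample)
print(result.head())
```

If a date contains many tied factor values, qcut can fail because its bin edges are no longer unique; passing duplicates="drop" to qcut is one way around that, at the cost of fewer than ten bins on that date.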