I want to plot this DataFrame:
Date TradeSize
0 2013-04-17 0.780000
1 2013-04-20 0.034000
2 2013-04-23 21.972500
3 2013-05-28 0.021000
4 2013-06-16 11.000000
5 2013-06-19 0.013000
6 2013-07-01 9.021000
7 2013-07-13 0.150000
8 2013-09-01 6.000000
9 2013-09-04 0.008000
10 2013-09-16 0.082000
11 2013-09-17 0.010000
12 2013-09-21 0.161000
13 2013-09-22 1.000000
14 2013-09-23 1.000000
15 2013-09-24 1.119000
16 2013-09-28 1.000000
17 2013-12-17 3.000000
18 2013-12-18 1.500000
19 2014-01-11 1.170000
20 2014-01-14 0.000100
21 2014-01-25 4.000000
22 2014-01-26 0.060000
23 2014-01-28 2.029900
24 2014-02-22 0.089900
25 2014-03-02 8.000000
26 2014-03-18 0.008000
27 2014-03-31 0.000100
28 2014-04-05 0.052000
29 2014-04-19 0.122000
30 2014-04-20 0.027000
31 2014-04-21 0.000100
32 2014-04-22 0.001100
33 2014-04-27 0.100000
34 2014-04-29 0.039000
35 2014-05-05 3.521000
36 2014-05-07 0.000105
37 2014-05-11 0.000100
38 2014-05-14 0.000100
39 2014-06-15 0.000800
40 2014-06-21 0.000500
41 2014-06-24 0.000600
42 2014-06-28 0.000400
43 2014-07-14 0.135000
44 2014-07-15 0.002300
45 2014-07-21 300.000000
46 2014-07-22 10.000000
47 2014-08-09 2.000000
48 2014-08-23 19.000000
49 2014-09-13 2.000000
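However, I need to apply a constraint to the data first, to make the plot look better:
if the next row's TradeSize value is not within ±10% of today's TradeSize value, it should be replaced with the average of today's TradeSize value and the next row's TradeSize value. To clarify, consider this example: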
Date TradeSize
1 2013-04-20 0.034000
2 2013-04-23 21.972500
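The value at index 2 is more than 10% above the value at index 1, so the value at index 2 should be replaced with the average of the two values, and so on. The same should happen when the value is more than 10% below.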
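To make the rule concrete, here is a minimal sketch of the check for just those two rows. The variable names are made up for illustration, and treating the ±10% bounds as inclusive is an assumption.

```python
prev_size = 0.034     # TradeSize at index 1
next_size = 21.9725   # TradeSize at index 2

# Allowed band: within 10% below or above the previous value
lower, upper = prev_size * 0.9, prev_size * 1.1
in_range = lower <= next_size <= upper

# Keep the value if it is inside the band, otherwise use the mean of the two
replacement = next_size if in_range else (prev_size + next_size) / 2
print(in_range, replacement)
```

Here the value at index 2 falls outside the band, so it would be replaced by the mean of 0.034 and 21.9725.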
Answer 0 (score: 1)
If I understand correctly, 'tomorrow' means the next row?
First, compute the ±10% bounds:
min_v = (df['TradeSize'] * 0.9).shift()  # shift down one row so each row sees the previous row's bound
max_v = (df['TradeSize'] * 1.1).shift()
df = df.assign(min_v=min_v, max_v=max_v)
Compute the average of each row and the previous one:
df = df.assign(avg=(df['TradeSize']+df['TradeSize'].shift())/2.)
Make a copy of the column to hold the result (for plotting):
df = df.assign(res=df['TradeSize'].copy())
Find the values outside the ±10% range and replace them with the average:
not_in_range_bool = (df['TradeSize'] < df['min_v']) | (df['TradeSize'] > df['max_v'])
not_in_range_bool.iloc[0] = False  # the first row has no previous row to compare against, so never replace it
df.loc[not_in_range_bool, 'res'] = df.loc[not_in_range_bool, 'avg']
Now you can use df['res'] to make the plot look nicer.
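The steps above can also be condensed into one short sketch. This is only one possible way to write it, under a few assumptions: it uses rows 12 to 16 of the question's data as a small sample, it compares each row against the original (not the already replaced) previous value, just as the code above does, and it treats the ±10% bounds as inclusive.

```python
import pandas as pd

# Rows 12-16 of the question's data, used here as a small sample
df = pd.DataFrame({
    "Date": pd.to_datetime(["2013-09-21", "2013-09-22", "2013-09-23",
                            "2013-09-24", "2013-09-28"]),
    "TradeSize": [0.161, 1.0, 1.0, 1.119, 1.0],
})

# Previous row's original value (NaN for the first row)
prev = df["TradeSize"].shift()

# True where the value lies within ±10% of the previous row's value
in_range = df["TradeSize"].between(prev * 0.9, prev * 1.1)
in_range.iloc[0] = True  # first row has nothing to compare against, keep it

# Keep in-range values; replace the others with the mean of previous and current
df["res"] = df["TradeSize"].where(in_range, (prev + df["TradeSize"]) / 2)
print(df)
```

With matplotlib installed, something like df.plot(x="Date", y="res") should then draw the smoothed series. If each row should instead be compared against the already replaced previous value, a vectorized approach like this will not do it and an explicit loop over the rows would be needed.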