Correcting data and replacing it with new values in pandas

Time: 2017-11-19 14:21:25

Tags: python pandas

I want to plot this DataFrame:

          Date   TradeSize
 0  2013-04-17    0.780000
 1  2013-04-20    0.034000
 2  2013-04-23   21.972500
 3  2013-05-28    0.021000
 4  2013-06-16   11.000000
 5  2013-06-19    0.013000
 6  2013-07-01    9.021000
 7  2013-07-13    0.150000
 8  2013-09-01    6.000000
 9  2013-09-04    0.008000
 10 2013-09-16    0.082000
 11 2013-09-17    0.010000
 12 2013-09-21    0.161000
 13 2013-09-22    1.000000
 14 2013-09-23    1.000000
 15 2013-09-24    1.119000
 16 2013-09-28    1.000000
 17 2013-12-17    3.000000
 18 2013-12-18    1.500000
 19 2014-01-11    1.170000
 20 2014-01-14    0.000100
 21 2014-01-25    4.000000
 22 2014-01-26    0.060000
 23 2014-01-28    2.029900
 24 2014-02-22    0.089900
 25 2014-03-02    8.000000
 26 2014-03-18    0.008000
 27 2014-03-31    0.000100
 28 2014-04-05    0.052000
 29 2014-04-19    0.122000
 30 2014-04-20    0.027000
 31 2014-04-21    0.000100
 32 2014-04-22    0.001100
 33 2014-04-27    0.100000
 34 2014-04-29    0.039000
 35 2014-05-05    3.521000
 36 2014-05-07    0.000105
 37 2014-05-11    0.000100
 38 2014-05-14    0.000100
 39 2014-06-15    0.000800
 40 2014-06-21    0.000500
 41 2014-06-24    0.000600
 42 2014-06-28    0.000400
 43 2014-07-14    0.135000
 44 2014-07-15    0.002300
 45 2014-07-21  300.000000
 46 2014-07-22   10.000000
 47 2014-08-09    2.000000
 48 2014-08-23   19.000000
 49 2014-09-13    2.000000

But first I need to apply a constraint to the data, to make the plot look nicer.

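If the next row's TradeSize value is not within ±10% of the current row's TradeSize value, it should be replaced with the average of the current row's value and the next row's value. To clarify, look at this example: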

     Date         TradeSize
 1  2013-04-20    0.034000
 2  2013-04-23   21.972500

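The value at index 2 is more than 10% above the value at index 1, so the value at index 2 should be replaced with the average of the two values, and so on. The same should happen if the value is more than 10% below!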
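As a quick check of the rule on the two rows above, here is a minimal sketch in plain Python. The variable names are made up for illustration, and it assumes "within ±10%" means between 0.9 and 1.1 times the previous value:

```python
# Worked example of the rule for the two rows shown above.
prev_value = 0.034      # TradeSize at index 1
next_value = 21.9725    # TradeSize at index 2

lower = prev_value * 0.9   # -10% bound (assumed interpretation)
upper = prev_value * 1.1   # +10% bound (assumed interpretation)

if lower <= next_value <= upper:
    result = next_value                      # within range: keep the value
else:
    result = (prev_value + next_value) / 2   # out of range: use the average

print(result)
```

Here 21.9725 is far above the upper bound, so the value at index 2 would become the average of the two, roughly 11.00325.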

1 Answer:

Answer 0: (score: 1)

If I understand correctly, 'tomorrow' means the next row?

First compute the ±10% bounds:

min_v = (df['TradeSize'] * 0.9).shift()  # shift down one row so each row sees the previous row's bound
max_v = (df['TradeSize'] * 1.1).shift()
df = df.assign(min_v=min_v, max_v=max_v)
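To see what shift() does here, this small sketch uses three values taken from the question's data. The names `s` and `shifted` are only for illustration:

```python
import pandas as pd

# Three TradeSize values from the top of the question's data.
s = pd.Series([0.78, 0.034, 21.9725], name="TradeSize")

# shift() moves every value down one row, so each row now holds the
# previous row's value; the first row has no predecessor and becomes NaN.
shifted = s.shift()

print(pd.DataFrame({"TradeSize": s, "previous": shifted}))
```

This is why min_v and max_v on a given row describe the allowed range derived from the row before it.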

Compute the averages:

df = df.assign(avg=(df['TradeSize']+df['TradeSize'].shift())/2.)

Make a copy of the column to hold the result (for plotting):

df = df.assign(res=df['TradeSize'].copy())

Find the values outside ±10% and replace them with the average:

not_in_range_bool = (df['TradeSize'] < df['min_v']) | (df['TradeSize'] > df['max_v'])
not_in_range_bool.iloc[0] = False  # the first row has no previous row to compare against, so never replace it
df.loc[not_in_range_bool, 'res'] = df.loc[not_in_range_bool, 'avg']

Now you can use df['res'] to make the plot look nicer.
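Putting the steps together, a self-contained sketch might look like the following. It uses only the first five rows of the question's data as sample input, and `smooth_trade_size` and its `tolerance` parameter are hypothetical names introduced here, not part of pandas:

```python
import pandas as pd

# First five rows of the question's data, used only as sample input.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2013-04-17", "2013-04-20", "2013-04-23", "2013-05-28", "2013-06-16"]
    ),
    "TradeSize": [0.78, 0.034, 21.9725, 0.021, 11.0],
})


def smooth_trade_size(frame, column="TradeSize", tolerance=0.10):
    """Hypothetical helper: where a value lies outside +/-tolerance of the
    previous row's value, replace it with the average of the two."""
    prev = frame[column].shift()  # previous row's value; NaN for the first row
    # Comparisons against NaN evaluate to False, so the first row is never flagged.
    out_of_range = (frame[column] < prev * (1 - tolerance)) | (
        frame[column] > prev * (1 + tolerance)
    )
    avg = (frame[column] + prev) / 2
    # Keep the original value where it is in range, otherwise take the average.
    return frame[column].where(~out_of_range, avg)


df["res"] = smooth_trade_size(df)
print(df)
```

With matplotlib installed, something like df.plot(x='Date', y='res') should then draw the smoothed column. Note that, as in the steps above, each row is compared against the original previous value rather than an already replaced one. If the "and so on" in the question is meant to cascade, a row-by-row loop would be needed instead.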