Pandas - How to compare two consecutive rows of a column

Time: 2019-01-09 19:44:18

Tags: python python-3.x pandas dataframe

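I want to find the percentage difference between two consecutive rows of a column, and if that difference is greater than 10%, I want to return the first value. For example, in the data below I want to find the percentage difference between df.close[0] and df.close[1]. If the difference is greater than 10, I want to set the value of df.close[0] to df.close[1]; if the difference is less than 10, I want to keep the same values for df.close[0] and df.close[1]. How can I do this?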

           1. open  2. high   3. low  4. close  5. volume
date                                                      
2000-01-03  41.7917  42.5000  40.8333   41.2500  2006460.0
2000-01-04  41.0833  41.0833  38.2500   39.2917  3392856.0
2000-01-05  37.2083  37.2083  34.0000   34.5500  4344624.0
2000-01-06  34.5000  36.3333  34.5000   35.6708  2219904.0
2000-01-07  39.1667  43.2500  38.6667   43.2500  7155936.0

I tried the following code, but it does not seem to work:

def percentage_diff(x):
  if (abs((x[0]-x[1]/x[0])*100)>10):
    return x[0]
  else:
    return x[1]

df.close = pd.rolling_apply(df['close'], 2, percentage_diff)

2 answers:

Answer 0: (score: 0)

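It seems that you want to replace the value of x[0] with x[1] when the percentage difference between the two values, ((x[0]-x[1])/x[0])*100, is less than 10%. It is not clear whether you want to return x or only an element of x.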

def percentage_diff(x):
    if (abs((x[0]-x[1])/x[0])*100) > 10:
        return x #or return x[0] if that is what you really want.
    else:
        x[0] = x[1]
        return x #or return x[1] if that is what you really want.
print(percentage_diff([1,1,3,4,54,9])) #the percentage difference between 1 and 1 is less than 10%
print(percentage_diff([1,2,3,4,54,9])) #the percentage difference between 1 and 2 is more than 10%

Here is the output of the code above:

>>> [1, 1, 3, 4, 54, 9]
>>> [1, 2, 3, 4, 54, 9]

To apply the function to a pandas DataFrame, you can do this:

df['close'] = df.close.apply(percentage_diff)
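
Note that `Series.apply` passes one scalar at a time, so a function that indexes `x[0]` and `x[1]` needs a two-row window instead. The following is a minimal sketch rather than a definitive fix. It assumes the column is named `close` and a pandas version where the old `pd.rolling_apply` has been replaced by `Series.rolling(...).apply`. The helper returns a single float, because rolling `apply` expects a scalar result.

```python
import pandas as pd

# Sample closing prices taken from the question.
close = pd.Series(
    [41.2500, 39.2917, 34.5500, 35.6708, 43.2500],
    index=pd.to_datetime(
        ["2000-01-03", "2000-01-04", "2000-01-05", "2000-01-06", "2000-01-07"]
    ),
    name="close",
)

def percentage_diff(x):
    # x is a NumPy array holding [previous value, current value].
    change = abs((x[0] - x[1]) / x[0]) * 100
    # Keep the previous value when the change exceeds 10%, else the current one.
    return x[0] if change > 10 else x[1]

# raw=True hands each two-row window to the function as a NumPy array.
result = close.rolling(2).apply(percentage_diff, raw=True)
print(result)
```

The first row of the result is NaN because it has no predecessor. Each window is built from the original values, so a replaced value does not carry over into the next comparison.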

Answer 1: (score: 0)

I was able to solve this problem by using the following function.

def percentage_diff(x):
  per = (abs((x[0] - x[1]))/x[1] *100)
  if (per > 30):
    return min(x[0], x[1])
  else:
    return x[0]

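In my original question, I returned x[0] or x[1] depending on whether the percentage difference was greater than 10. That only shifted the value to the next row and did not really remove the anomaly.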

def percentage_diff(x):
  if (abs((x[0]-x[1]/x[0])*100)>10):
    return x[0]
  else:
    return x[1]

df.close = pd.rolling_apply(df['close'], 2, percentage_diff)
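
For reference, the 30% rule from the function at the top of this answer can also be written without a rolling callback. This is a sketch under assumptions: the answer does not say how the function is applied, so two-row windows of [previous, current] are assumed, and the prices below are hypothetical values chosen to include one spike.

```python
import numpy as np
import pandas as pd

# Hypothetical prices with a spike at position 2 (not data from the question).
close = pd.Series([100.0, 102.0, 150.0, 104.0], name="close")

prev = close.shift(1)                      # x[0]: the previous row
per = (prev - close).abs() / close * 100   # percentage difference relative to x[1]

# Where the difference is at most 30%, keep the previous value (x[0]);
# otherwise take the smaller of the two neighbours.
result = prev.where(per <= 30, np.minimum(prev, close))
print(result)
```

With these values the spike of 150.0 never appears in the result, since both windows that contain it exceed the 30% threshold and fall back to the smaller neighbour.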