需要根据数据框中的行号应用不同的公式

时间:2018-07-16 06:56:51

标签: python python-3.x pandas dataframe

我正在努力寻找数据帧中的某种移动平均值。该公式将根据要为其计算的行数而变化。实际情况是我需要计算Z列的地方。

编辑2:

以下是我正在使用的实际数据

          Date     Open     High      Low    Close
0   01-01-2018  1763.95  1763.95  1725.00  1731.35
1   02-01-2018  1736.20  1745.80  1725.00  1743.20
2   03-01-2018  1741.10  1780.00  1740.10  1774.60
3   04-01-2018  1779.95  1808.00  1770.00  1801.35
4   05-01-2018  1801.10  1820.40  1795.60  1809.95
5   08-01-2018  1816.00  1827.95  1800.00  1825.00
6   09-01-2018  1823.00  1835.00  1793.90  1812.05
7   10-01-2018  1812.05  1823.00  1801.40  1816.55
8   11-01-2018  1825.00  1825.05  1798.55  1802.10
9   12-01-2018  1805.00  1820.00  1794.00  1804.95
10  15-01-2018  1809.90  1834.45  1792.45  1830.00
11  16-01-2018  1835.00  1857.45  1826.10  1850.25
12  17-01-2018  1850.00  1852.45  1826.20  1840.50
13  18-01-2018  1840.50  1852.00  1823.50  1839.00
14  19-01-2018  1828.25  1836.35  1811.00  1829.50
15  22-01-2018  1816.50  1832.55  1805.50  1827.20
16  23-01-2018  1825.00  1825.00  1782.25  1790.15
17  24-01-2018  1787.80  1792.70  1732.15  1737.50
18  25-01-2018  1739.90  1753.40  1720.00  1726.40
19  29-01-2018  1735.15  1754.95  1729.80  1738.70

我正在使用的代码段如下:

from datetime import date
from nsepy import get_history
import csv
import pandas as pd
import numpy as np
import requests
from datetime import timedelta
import datetime as dt
import pandas_datareader.data as web
import io

df = pd.read_csv('ACC.CSV')

idx = df.reset_index().index

df['Change'] = df['Close'].diff()
df['Advance'] = np.where(df.Change > 0, df.Change,0)
df['Decline'] = np.where(df.Change < 0, df.Change*-1, 0)
conditions = [idx < 14, idx == 14, idx > 14]
values = [0, (df.Advance.rolling(14).sum())/14, (df.Avg_Gain.shift(1) * 13 + df.Advance)/14]
df['Avg_Gain'] = np.select(conditions, values)
df['Avg_Loss'] = (df.Decline.rolling(14).sum())/14
df['RS'] = df.Avg_Gain / df.Avg_Loss
df['RSI'] = np.where(df['Avg_Loss'] == 0, 100, 100-(100/(1+df.RS)))
df.drop(['Change', 'Advance', 'Decline', 'Avg_Gain', 'Avg_Loss', 'RS'],     axis=1)

print(df.head(20))

以下是我得到的错误:

Traceback (most recent call last):
  File "C:/Users/Lenovo/Desktop/Python/0.Chart Patterns/Z.Sample Code.py", line 20, in <module>
    values = [0, (df.Advance.rolling(14).sum())/14, (df.Avg_Gain.shift(1) * 13 + df.Advance)/14]
  File "C:\Users\Lenovo\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\generic.py", line 3614, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'Avg_Gain'

编辑3:以下是预期的输出,然后我还将写下公式。原始DF由“日期开盘高低和收盘价”列组成。

enter image description here

Advance and Decline – 
•   If the difference between current & previous is +ve then Advance = difference and Decline = 0
•   If the difference between current & previous is –ve then Advance = 0 and     Decline = -1 * difference
Avg_Gain:
•   If index is < 13 then Avg_Gain = 0
•   If index = 13 then Avg_Gain = Average of 14 periods
•   If index > 13, then Avg_Gain = (Avg_Gain(previous-row) * 13 +     Advance(current-row) )/14
Avg_Loss:
•   If index is < 13 then Avg_Loss = 0
•   If index = 13 then Avg_Loss = Average of Advance of 14 periods
•   If index > 13, then Avg_Loss = (Avg_Loss(previous-row) * 13 + Decline(current-row) )/14
RS:
•   If index < 13 then RS = 0
•   If Index >= 13 then RS = Avg_Gain/Avg_Loss
RSI = 100-(100/(1 + RS))

我希望这会有所帮助。

2 个答案:

答案 0 :(得分:2)

您的代码有误,因为您在创建df.Avg_Gain时使用了df.Avg_Gain。是

values = [0, (df.Advance.rolling(14).sum())/14, (df.Avg_Gain.shift(1) * 13 + df.Advance)/14]
df['Avg_Gain'] = np.select(conditions, values)

我将代码的这一部分更改为以下内容:

up = df.Advance.rolling(14).sum()/14
values = [0, up, (up.shift(1) * 13 + df.Advance)/14]

输出(idx> = 14):

    Date        Open    High    Low     Close   RSI
14  2018-01-19  1828.25 1836.35 1811.00 1829.50 75.237850
15  2018-01-22  1816.50 1832.55 1805.50 1827.20 72.920021
16  2018-01-23  1825.00 1825.00 1782.25 1790.15 58.793750
17  2018-01-24  1787.80 1792.70 1732.15 1737.50 40.573938
18  2018-01-25  1739.90 1753.40 1720.00 1726.40 31.900045
19  2018-01-29  1735.15 1754.95 1729.80 1738.70 33.197678

尽管如此,应该有更好的方法。如果我找到一个更好的解决方案,我将对其进行更新。让我知道此数据是否正确。

更新: 您还需要更正“ Avg_loss”的计算::

down = df.Decline.rolling(14).sum()/14
down_values = [0, down, (down.shift(1) * 13 + df.Decline)/14]
df['Avg_Loss'] = np.select(conditions, down_values)

http://stockcharts.com/school/doku.php?id=chart_school:technical_indicators:relative_strength_index_rsi#calculation

更新2 :在提供了预期数据之后。 因此,我唯一可以做到这一点的方法就是循环-不确定是否可以这样做-也许我不知道某些熊猫功能。

因此,首先要像以前一样设置Avg_GainAvg_Loss: 您只需要稍作更改即可:

conditions = [idx<13, idx==13, idx>13]
up = df.Advance.rolling(14).sum()/14
values = [0, up, 0]
df['Avg_Gain'] = np.select(conditions, values)

down = df.Decline.rolling(14).sum()/14
down_values = [0, down, 0]
df['Avg_Loss'] = np.select(conditions, d_values)

我已更改您的条件以对索引13进行分割-因为这是基于预期输出的结果。

运行此代码后,将使用先前的Avg_GainAvg_Loss的值,从索引14填充Agv_GainAvg_Loss的值。 / p>

p=14
for i in range(p, len(df)):
    df.at[i, 'Avg_Gain'] = ((df.loc[i-1, 'Avg_Gain'] * (p-1)) + df.loc[i, 'Advance']) / p
    df.at[i, 'Avg_Loss'] = ((df.loc[i-1, 'Avg_Loss'] * (p-1)) + df.loc[i, 'Decline']) / p

输出:

df[13:][['Date','Avg_Gain', 'Avg_Loss', 'RS', 'RSI']]

    Date        Avg_Gain    Avg_Loss    RS          RSI
13  2018-01-18  10.450000   2.760714    3.785252    79.102460
14  2018-01-19  9.703571    3.242092    2.992997    74.956155
15  2018-01-22  9.010459    3.174800    2.838119    73.945571
16  2018-01-23  8.366855    5.594457    1.495562    59.928860
17  2018-01-24  7.769222    8.955567    0.867530    46.453335
18  2018-01-25  7.214278    9.108741    0.792017    44.196960
19  2018-01-29  7.577544    8.458116    0.895890    47.254330

答案 1 :(得分:2)

您可以使用ewm进行操作,也可以查看there以获得更多说明。您会发现ewm可用于计算 y(row)=α* x(row)+(1-α)* y(row-1),例如 y 是列Avg_Gain, x 是Advance列的值,α是赋予 x(row)的权重

# define the number for the window
win_n = 14
# create a dataframe df_avg with the non null value of the two columns 
# Advance and Decline such as the first is the mean of the 14 first values and the rest as normal
df_avg = (pd.DataFrame({'Avg_Gain': np.append(df.Advance[:win_n].mean(), df.Advance[win_n:]), 
                        'Avg_Loss': np.append(df.Decline[:win_n].mean(), df.Decline[win_n:])},   
                        df.index[win_n-1:])
               .ewm(adjust=False, alpha=1./win_n).mean()) # what you need to calculate with your formula

# create the two other columns RS and RSI
df_avg['RS'] = df_avg.Avg_Gain / df_avg.Avg_Loss
df_avg['RSI'] = 100.-(100./(1. + df_avg['RS']))

df_avg如下:

     Avg_Gain  Avg_Loss        RS        RSI
13  10.450000  2.760714  3.785252  79.102460
14   9.703571  3.242092  2.992997  74.956155
15   9.010459  3.174800  2.838119  73.945571
16   8.366855  5.594457  1.495562  59.928860
17   7.769222  8.955567  0.867530  46.453335
18   7.214278  9.108741  0.792017  44.196960
19   7.577544  8.458116  0.895890  47.254330

您可以join将其保存为原始数据,并将fillna设置为0:

df = df.join(df_avg).fillna(0)