Computing the EWMA of a DataFrame by time

Asked: 2013-06-19 00:32:26

Tags: python pandas

I have this DataFrame:

    avg                date    high  low      qty
0 16.92 2013-05-27 00:00:00   19.00 1.22 71151.00
1 14.84 2013-05-30 00:00:00   19.00 1.22 42939.00
2  9.19 2013-06-02 00:00:00   17.20 1.23  5607.00
3 23.63 2013-06-05 00:00:00 5000.00 1.22  5850.00
4 13.82 2013-06-10 00:00:00   19.36 1.22  5644.00
5 17.76 2013-06-15 00:00:00   24.00 2.02 16969.00

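Each row is an observation of the average, high, low, and quantity recorded on the given date.

I'm trying to compute an exponentially weighted moving average with a span of 60 days: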

df["emwa"] = pandas.ewma(df["avg"],span=60,freq="D")

But I get

TypeError: Only valid with DatetimeIndex or PeriodIndex

OK, maybe I need to add a DatetimeIndex to my DataFrame when constructing it. Let me change the constructor call from

df = pandas.DataFrame(records) #records is just a list of dictionaries

to

rng = pandas.date_range(firstDate,lastDate, freq='D')
df = pandas.DataFrame(records,index=rng)

But now I get

ValueError: Shape of passed values is (5,), indices imply (5, 1641601)

Any suggestions on how to calculate my EWMA?

2 answers:

Answer 0: (score: 10)

You need two things: make sure the date column holds dates (not strings), and set the index to those dates.
You can do both in one step with to_datetime:
In [11]: df.index = pd.to_datetime(df.pop('date'))

In [12]: df
Out[12]:
              avg     high   low    qty
date
2013-05-27  16.92    19.00  1.22  71151
2013-05-30  14.84    19.00  1.22  42939
2013-06-02   9.19    17.20  1.23   5607
2013-06-05  23.63  5000.00  1.22   5850
2013-06-10  13.82    19.36  1.22   5644
2013-06-15  17.76    24.00  2.02  16969
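
For anyone who wants to reproduce this step from scratch, here is a minimal sketch. The `records` literal is reconstructed from the table in the question, and storing the dates as strings is an assumption about how the original data arrived.

```python
import pandas as pd

# Reconstructed from the question's table; dates assumed to be strings.
records = [
    {"avg": 16.92, "date": "2013-05-27", "high": 19.00, "low": 1.22, "qty": 71151.0},
    {"avg": 14.84, "date": "2013-05-30", "high": 19.00, "low": 1.22, "qty": 42939.0},
    {"avg": 9.19, "date": "2013-06-02", "high": 17.20, "low": 1.23, "qty": 5607.0},
    {"avg": 23.63, "date": "2013-06-05", "high": 5000.00, "low": 1.22, "qty": 5850.0},
    {"avg": 13.82, "date": "2013-06-10", "high": 19.36, "low": 1.22, "qty": 5644.0},
    {"avg": 17.76, "date": "2013-06-15", "high": 24.00, "low": 2.02, "qty": 16969.0},
]

df = pd.DataFrame(records)

# pop() removes the 'date' column and returns it; to_datetime() parses the
# strings, and assigning the result to df.index yields a DatetimeIndex.
df.index = pd.to_datetime(df.pop("date"))

print(type(df.index).__name__)
print(df)
```

Note that the index here contains only the six observation dates, which avoids the length mismatch that passing a full daily date_range to the constructor runs into.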

Then you can call ewma as expected:

In [13]: pd.ewma(df["avg"], span=60, freq="D")
Out[13]:
date
2013-05-27    16.920000
2013-05-28    16.920000
2013-05-29    16.920000
2013-05-30    15.862667
2013-05-31    15.862667
2013-06-01    15.862667
2013-06-02    13.563899
2013-06-03    13.563899
2013-06-04    13.563899
2013-06-05    16.207625
2013-06-06    16.207625
2013-06-07    16.207625
2013-06-08    16.207625
2013-06-09    16.207625
2013-06-10    15.697743
2013-06-11    15.697743
2013-06-12    15.697743
2013-06-13    15.697743
2013-06-14    15.697743
2013-06-15    16.070721
Freq: D, dtype: float64
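
pd.ewma and its freq argument were removed from later pandas releases, so the call above will not run there. The following is a sketch of what I believe is a close modern equivalent: resample to daily frequency first, then apply ewm(...).mean(). Using ignore_na=True is my assumption for matching the output above, where the value is held constant across the gap days and the weights do not seem to decay over them.

```python
import pandas as pd

# The 'avg' column from the question, indexed by observation date.
avg = pd.Series(
    [16.92, 14.84, 9.19, 23.63, 13.82, 17.76],
    index=pd.to_datetime(
        ["2013-05-27", "2013-05-30", "2013-06-02",
         "2013-06-05", "2013-06-10", "2013-06-15"]
    ),
    name="avg",
)

# Upsample to one row per calendar day; days without an observation become NaN.
daily = avg.resample("D").mean()

# ignore_na=True (an assumption, see above) makes the weights depend only on
# the order of the real observations; NaN days repeat the last computed value.
ewma_daily = daily.ewm(span=60, ignore_na=True).mean()

print(ewma_daily)
```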

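And if you assign it as a column, the result is aligned on the index, so only the rows for the original observation dates are kept: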

In [14]: df['ewma'] = pd.ewma(df["avg"], span=60, freq="D")

In [15]: df
Out[15]:
              avg     high   low    qty       ewma
date
2013-05-27  16.92    19.00  1.22  71151  16.920000
2013-05-30  14.84    19.00  1.22  42939  15.862667
2013-06-02   9.19    17.20  1.23   5607  13.563899
2013-06-05  23.63  5000.00  1.22   5850  16.207625
2013-06-10  13.82    19.36  1.22   5644  15.697743
2013-06-15  17.76    24.00  2.02  16969  16.070721
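
One caveat: the numbers above appear to weight the observations by their position in the series, not by the number of days between them. If the irregular spacing should affect the weights, newer pandas versions (1.1 and later, as far as I know) accept a times argument together with a timedelta-like halflife. This is only a sketch; the 30-day half-life is an arbitrary illustrative value, not a conversion of span=60.

```python
import pandas as pd

# The 'avg' column from the question, indexed by observation date.
avg = pd.Series(
    [16.92, 14.84, 9.19, 23.63, 13.82, 17.76],
    index=pd.to_datetime(
        ["2013-05-27", "2013-05-30", "2013-06-02",
         "2013-06-05", "2013-06-10", "2013-06-15"]
    ),
    name="avg",
)

# Position-based weights: each earlier observation is discounted by a fixed
# factor per row, regardless of how many days separate the rows.
by_position = avg.ewm(span=60).mean()

# Time-based weights: an observation's weight halves for every 30 days of age.
# The half-life value is an assumption chosen purely for illustration.
by_time = avg.ewm(halflife="30 days", times=avg.index).mean()

print(pd.DataFrame({"avg": avg, "by_position": by_position, "by_time": by_time}))
```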

Answer 1: (score: 1)

In pandas > 0.17, ewma has been deprecated. The same functionality can be obtained by combining ewm() and mean().

Like this:

# Calculating a few means (averages) with exponential components (com = center of mass) 
# on the closing price of the Deutsche Bank stock.

import requests
import zipfile
import io # Python 2, use StringIO
import pandas as pd
import matplotlib

# Set the number of columns to be displayed when printing DataFrames
# (the fully qualified option name avoids ambiguous matches in newer pandas)
pd.set_option('display.max_columns', 7)

# Download file from ipfs
ipfs_file_url = "https://ipfs.io/ipfs/QmW7aSLjePW7S8uE5zbAneGAPdrzdA3MpFkTiFPrRsKS8t"
response = requests.get(ipfs_file_url, stream=True)

# The file is a zipfile, so let's read it and parse the csv inside
zf = zipfile.ZipFile(io.BytesIO(response.content)) # Python 2, use StringIO.StringIO
df = pd.read_csv(zf.open('DB_20170627_to_20180627.csv'))

# Oookay, let's begin!
print(df)

# New DataFrame to keep it clean
output = pd.DataFrame()
output['Date'] = df['Date']
output['ewma_com10'] = df['Close'].ewm(com=10).mean()
output['ewma_com50'] = df['Close'].ewm(com=50).mean()
output['ewma_com100'] = df['Close'].ewm(com=100).mean()
print(output)

output.index = pd.to_datetime(output['Date'], format='%Y-%m-%d')
output.plot()


The Jupyter notebook can be found here: pandas_exponential_average.ipynb
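
Since this answer uses com while the question uses span, here is a small sketch relating the two. In pandas the smoothing factor is alpha = 2 / (span + 1) for span and alpha = 1 / (1 + com) for com, so span=60 corresponds to com=29.5. The data below is the avg column from the question, used only so the example runs without downloading anything.

```python
import numpy as np
import pandas as pd

# The 'avg' column from the question, reused here as offline sample data.
avg = pd.Series([16.92, 14.84, 9.19, 23.63, 13.82, 17.76], name="avg")

span = 60
com = (span - 1) / 2  # 29.5, from 2 / (span + 1) == 1 / (1 + com)

by_span = avg.ewm(span=span).mean()
by_com = avg.ewm(com=com).mean()

# Both parameterizations describe the same smoothing factor,
# so the two results should agree up to floating-point error.
print(np.allclose(by_span, by_com))
```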