Question

I am trying to calculate the monthly returns of stocks from their daily prices, using a pandas DataFrame.

Data:

permno          date             prc
Firm A          1995-01-02       30
Firm A          1995-01-03       30.3
...
Firm B          1996-01-03       10.1

What I have tried so far:

df = DATA
# the 'date' column consists of timestamps
df['month'] = df['date'].dt.strftime('%Y%m')
# **<code1>** choose the first date of each month for each permno
df_ = df.sort_values('date').groupby(['permno', 'month']).first().reset_index()
# **<code2>** calculate monthly_return with pct_change()
df_['monthly_return'] = df_.sort_values('month').groupby('permno').prc.pct_change()

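However, I have just found out that some securities were not traded for certain periods.

This causes two problems:

1. Using <code1> chooses the wrong starting point for some securities. For example, suppose Firm B's security was not traded on 1997-01-03 (assume this is the first day in January 1997 on which the other securities traded). Then 1997-01-04 is chosen instead, so the monthly return for this security is computed from the wrong starting point.
2. Some securities were not traded for more than a month. Suppose Firm B was not traded from 1998.02 to 2001.12. Then, using <code2>, we get "monthly return for 2002.01" = (2002.01 price - 1998.01 price) / (1998.01 price).

Is there a simple way to handle this kind of data with jumps in the periods?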

Answer 1

I think the most convenient way is to drop the values that could produce misleading returns.

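First create a sample data series with a daily index:

import numpy as np
import pandas as pd

periods = 10000
my_index = pd.date_range('2016-07-01', periods=periods, freq='D')
data = np.random.randint(100, 1000, periods)
ts = pd.Series(data=data, index=my_index, name='Monthly Returns')
print(ts.head())

The sample series looks like this:

2016-07-01    348
2016-07-02    794
2016-07-03    650
2016-07-04    365
2016-07-05    291
Freq: D, Name: Monthly Returns, dtype: int64

Problem 1: assign a NaN value at the beginning,

ts = ts.astype(float)  # NaN needs a float dtype
ts.iloc[0] = np.nan

then resample the series. 'BMS' stands for business month start, i.e. the first business day of each month. bfill() (called backfill() in older pandas versions) is used here to avoid look-ahead bias, and fill_method=None keeps pct_change() from forward-filling over missing months.

ts = ts.resample('BMS').bfill().pct_change(fill_method=None).dropna()

In the resulting series there is no observation for either the first or the second month, because there is no data to calculate those returns:

2016-09-01    0.257343
2016-10-03   -0.296997
2016-11-01    0.433544
2016-12-01   -0.552980
2017-01-02   -0.390123
Freq: BMS, Name: Monthly Returns, dtype: float64

Problem 2: insert more NaN values and perform the same operation. The NaN months and the returns that depend on them are skipped.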
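
Problem 2 is only described in words above, so here is a minimal sketch of it, followed by one possible way to carry the same idea over to a per-firm frame shaped like the one in the question. Everything in it is an assumption made for illustration: the deterministic prices, the sample frame, and the helper column month_no are hypothetical, and the per-firm part is one way to apply the idea rather than the method used in this answer.

```python
import numpy as np
import pandas as pd

# --- Single series: a whole month without trades (Problem 2) ---
# Hypothetical deterministic prices so the result can be checked by hand.
idx = pd.date_range("2016-07-01", periods=200, freq="D")
ts = pd.Series(np.arange(100.0, 300.0), index=idx, name="prc")
ts.loc["2016-09-01":"2016-09-30"] = np.nan  # no trades in September 2016

# Price on the first business day of each month, then month-over-month change.
# fill_method=None keeps pct_change from forward-filling over the gap.
monthly = ts.resample("BMS").bfill()
returns = monthly.pct_change(fill_method=None).dropna()
print(returns)

# --- Per-firm frame shaped like the question's data (values are made up) ---
df = pd.DataFrame({
    "permno": ["Firm A"] * 3 + ["Firm B"] * 2,
    "date": pd.to_datetime([
        "1998-01-02", "1998-02-02", "1998-03-02",  # Firm A trades every month
        "1998-01-02", "2002-01-02",                # Firm B has a four-year gap
    ]),
    "prc": [30.0, 33.0, 36.3, 10.0, 20.0],
})

# One consecutive integer per calendar month, so a gap shows up as a diff > 1.
df["month_no"] = df["date"].dt.year * 12 + df["date"].dt.month

# First observation of each month for each firm.
first = (
    df.sort_values("date")
      .groupby(["permno", "month_no"], as_index=False)
      .first()
)

prev_prc = first.groupby("permno")["prc"].shift()
gap = first.groupby("permno")["month_no"].diff()

# Keep a return only when the previous observation is exactly one month back.
first["monthly_return"] = (first["prc"] / prev_prc - 1).where(gap == 1)
print(first)
```

In the first part, the returns for September and October 2016 drop out because each depends on the missing September price. In the second part, Firm B's 2002.01 row gets NaN instead of a return measured against its 1998.01 price.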
