Get the last price of the day from another DataFrame in pandas

Time: 2017-12-13 21:54:55

Tags: python pandas dataframe

Two DataFrames:

DataFrame 'prices' contains per-minute prices:

ts                          average
2017-12-13 15:55:00-05:00   339.389
2017-12-13 15:56:00-05:00   339.293
2017-12-13 15:57:00-05:00   339.172
2017-12-13 15:58:00-05:00   339.148
2017-12-13 15:59:00-05:00   339.144

DataFrame 'articles' contains articles:

ts                          title
2017-10-25 11:45:00-04:00   Your Evening Briefing
2017-11-24 14:15:00-05:00   Tesla's Grand Designs Distract From Model 3 Bo...
2017-10-26 11:09:00-04:00   UAW Files Claim That Tesla Fired Workers Who S...
2017-10-25 11:42:00-04:00   Forget the Grid of the Future, Puerto Ricans J...
2017-10-22 09:54:00-04:00   Tesla Reaches Deal for Shanghai Facility, WSJ ...

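For each article, I want the average stock price at the time the article was published (easy), plus the stock price at the end of that day (the problem).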
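The "easy" part can be sketched with `pd.merge_asof`, which matches each article to the most recent price at or before its timestamp. This is a minimal sketch, not the asker's code, assuming both frames have a tz-aware `DatetimeIndex` named `ts` and that `prices` has an `average` column as printed above; the two article rows and the `price_at_article` column name are hypothetical:

```python
import pandas as pd

tz = "America/New_York"

# Price rows taken from the question; article rows are hypothetical examples
prices = pd.DataFrame(
    {"average": [339.389, 339.293, 339.172, 339.148, 339.144]},
    index=pd.date_range("2017-12-13 15:55", periods=5, freq="min", tz=tz, name="ts"),
)
articles = pd.DataFrame(
    {"title": ["Article A", "Article B"]},
    index=pd.DatetimeIndex(["2017-12-13 15:56:30", "2017-12-13 15:58:00"], tz=tz, name="ts"),
)

# merge_asof needs both sides sorted on the key; direction="backward" matches each
# article to the last price whose timestamp is <= the article's timestamp
articles = pd.merge_asof(
    articles.sort_index(),
    prices.sort_index(),
    left_index=True,
    right_index=True,
    direction="backward",
).rename(columns={"average": "price_at_article"})
print(articles)
```

An article published before the first available price row would get NaN, since there is no earlier price to match.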

My current approach:

articles['t-eod'] = prices.loc[articles.index.strftime('%Y-%m-%d')[0]].between_time('15:30','15:31')

However, it raises a warning:

/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.

Reading the documentation didn't make it any clearer to me.
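In pandas versions that emit it, this warning usually means the frame being written to (`articles` here) was itself produced by filtering or slicing another DataFrame, so pandas cannot tell whether the new column should also reach the original. The question does not show how `articles` was built, so this is an assumption; under it, a minimal sketch of the usual fix is to take an explicit `.copy()` of the slice before adding columns. The `all_articles` frame, its `ticker` column, and the `title_length` column are hypothetical:

```python
import pandas as pd

# Hypothetical parent frame; assume `articles` was filtered out of something like it
all_articles = pd.DataFrame(
    {"title": ["Tesla news", "Other news", "More Tesla"], "ticker": ["TSLA", "F", "TSLA"]}
)

# The filtered result may still be flagged as a slice of all_articles;
# .copy() makes it an independent frame, so adding a column is unambiguous
articles = all_articles[all_articles["ticker"] == "TSLA"].copy()
articles["title_length"] = articles["title"].str.len()
print(articles)
```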

So the question: for each article, how can I get the last average price of that day?

Thanks!

/莫里斯

1 Answer:

Answer 0 (score: 1)

You can try using idxmax on ts to find the index of the latest timestamp for each date, then use loc to extract the average:
import pandas as pd

# Move the 'ts' index into a regular column on both frames
prices_df.reset_index(inplace=True)
articles_df.reset_index(inplace=True)

# Ensure the ts column is datetime (a no-op if it already is)
prices_df['ts'] = pd.to_datetime(prices_df['ts'])
articles_df['ts'] = pd.to_datetime(articles_df['ts'])

# For each date, select the row with the latest timestamp, i.e. the last price of the day.
# .copy() avoids a SettingWithCopyWarning when a column is added to df_max below
last_idx = prices_df.groupby(prices_df['ts'].dt.date)['ts'].idxmax()
df_max = prices_df.loc[last_idx].copy()

# We need to join df_max and articles on the date, so we give both a 'date' index
df_max['date'] = df_max['ts'].dt.date
articles_df['date'] = articles_df['ts'].dt.date
df_max.set_index('date', inplace=True)
articles_df.set_index('date', inplace=True)

# Assignment aligns on the shared 'date' index: 'max' holds each article's end-of-day average
articles_df['max'] = df_max['average']
articles_df.set_index('ts', inplace=True)
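A shorter variant of the same idea (not the answerer's code) avoids the index juggling: take the last `average` of each calendar day with `groupby(...).last()`, then look up each article's date in the result with `map`. This is a minimal sketch, assuming both frames keep their tz-aware `DatetimeIndex`; the sample rows and the `eod_average` column name are hypothetical:

```python
import pandas as pd

tz = "America/New_York"

# Hypothetical minute prices spanning two trading days
prices = pd.DataFrame(
    {"average": [339.1, 339.2, 339.3, 341.0, 340.5]},
    index=pd.DatetimeIndex(
        [
            "2017-12-12 15:58", "2017-12-12 15:59",
            "2017-12-13 09:31", "2017-12-13 15:58", "2017-12-13 15:59",
        ],
        tz=tz,
        name="ts",
    ),
)
articles = pd.DataFrame(
    {"title": ["Morning piece", "Afternoon piece", "Next-day piece"]},
    index=pd.DatetimeIndex(
        ["2017-12-12 10:00", "2017-12-12 14:00", "2017-12-13 11:00"], tz=tz, name="ts"
    ),
)

# .last() returns the last non-null value per group, so sort by time first;
# index.date gives calendar dates in the index's own time zone
prices = prices.sort_index()
eod = prices["average"].groupby(prices.index.date).last()

# Map each article's calendar date to that day's last average
articles["eod_average"] = pd.Series(articles.index.date, index=articles.index).map(eod)
print(articles)
```

Because `map` leaves unmatched keys as NaN, an article dated on a day with no price rows ends up with a missing `eod_average` instead of raising an error.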