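I have a pandas DataFrame with a half-hourly time series index, and a series of daily data that I need to match to it based on equality of the dates. The following code works using .get() in a loop, but it is slow and seems "unpythonic."
I tried converting the series to a DataFrame with a dummy column so that I could merge or look up values, but for various reasons I couldn't get that to work. Some data is missing, so some potential approaches may raise key errors.
Previously answered questions don't seem to apply. Someone who is good with lambda functions or the .asfreq method may be able to come up with something.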
import pandas as pd
import numpy as np
# Make a 2 day series
days = 2
dates = pd.date_range('20130102',periods=days)
ts_d = pd.Series(np.random.randn(days),index=dates)
ts_d
# Output
2013-01-02 -1.044139
2013-01-03 -1.061720
Freq: D, dtype: float64
# Make an overlapping 4 day dataframe with 60min index
datetimes = pd.date_range('20130101 00:00',periods=4*24, freq = '60min')
df_t = pd.DataFrame(np.random.randn(4*24,4),index=datetimes,columns=list('ABCD'))
# Begin clunkiness
df_t['date'] = df_t.index.date
for t in df_t.index:
    d = df_t.loc[t, 'date']
    df_t.loc[t, 'E'] = ts_d.get(d)
df_t
Some output:
A B C D date E
2013-01-01 20:00:00 -0.173764 -1.440833 -0.163796 0.479593 2013-01-01 None
2013-01-01 21:00:00 1.915522 2.308827 -0.849182 -1.478981 2013-01-01 None
2013-01-01 22:00:00 -0.013391 -1.534994 -2.365495 0.747692 2013-01-01 None
2013-01-01 23:00:00 0.739665 -0.566568 0.413195 0.665017 2013-01-01 None
2013-01-02 00:00:00 -0.358202 -1.625681 0.120250 -1.122430 2013-01-02 -1.044139
2013-01-02 01:00:00 1.048837 -0.328021 0.933473 -0.234328 2013-01-02 -1.044139
2013-01-02 02:00:00 1.178195 -1.389543 -0.144850 -2.430063 2013-01-02 -1.044139
2013-01-02 03:00:00 -0.420962 0.244130 1.819005 -0.982521 2013-01-02 -1.044139
.
.
.
2013-01-02 15:00:00 1.809403 -2.505042 -0.509833 -1.238630 2013-01-02 -1.044139
2013-01-02 16:00:00 0.740123 -0.205582 0.795701 0.459017 2013-01-02 -1.044139
2013-01-02 17:00:00 1.252692 1.025432 -0.235781 -0.506460 2013-01-02 -1.044139
2013-01-02 18:00:00 -1.456726 -1.983843 -1.623061 0.629214 2013-01-02 -1.044139
2013-01-02 19:00:00 1.126687 -0.253415 0.163900 0.059876 2013-01-02 -1.044139
2013-01-02 20:00:00 0.156657 0.066207 0.103946 -0.762910 2013-01-02 -1.044139
2013-01-02 21:00:00 -1.123818 0.314226 -0.281381 0.947381 2013-01-02 -1.044139
2013-01-02 22:00:00 -0.945620 0.538180 1.403452 -0.065406 2013-01-02 -1.044139
2013-01-02 23:00:00 0.059012 2.599817 -0.623826 0.796559 2013-01-02 -1.044139
2013-01-03 00:00:00 0.859748 1.476591 0.607554 -1.575007 2013-01-03 -1.06172
2013-01-03 01:00:00 0.678326 0.084930 0.762786 -1.139595 2013-01-03 -1.06172
2013-01-03 02:00:00 -0.034952 -1.224600 0.317359 -1.620755 2013-01-03 -1.06172
2013-01-03 03:00:00 -1.208597 -1.864493 -0.883250 -0.814249 2013-01-03 -1.06172
2013-01-03 04:00:00 -0.061918 0.461941 0.163563 0.532755 2013-01-03 -1.06172
.
.
.
Answer 0 (score: 3)
You can do this the pandas way, without a loop:
First, get a date-only field:
df_t['Date'] = pd.to_datetime(df_t.index.date)
Set it as the index:
df_t = df_t.reset_index().set_index('Date')
Assign the daily values (they align on the date index):
df_t['E'] = ts_d
Restore the old index:
df_t = df_t.reset_index().set_index('index')
Verify:
# .ix has been removed from pandas; .loc does the same label lookup here
df_t.loc[pd.to_datetime('20130102')]
*Edit: changed to incorporate Jeff's suggestion
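The steps above can also be collapsed into a single lookup. The following is only a sketch, not part of the original answer: it assumes the setup from the question and swaps the index juggling for DatetimeIndex.normalize() plus Series.reindex(). The name day_keys is introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

# Daily series and hourly frame, built as in the question
ts_d = pd.Series(np.random.randn(2), index=pd.date_range('20130102', periods=2))
datetimes = pd.date_range('20130101 00:00', periods=4 * 24, freq='60min')
df_t = pd.DataFrame(np.random.randn(4 * 24, 4), index=datetimes, columns=list('ABCD'))

# normalize() floors every timestamp to midnight, so each row carries its day as a key
# (day_keys is a hypothetical helper name, not from the original thread)
day_keys = df_t.index.normalize()

# reindex() looks up each key in ts_d; days absent from ts_d come back as NaN
# rather than raising a KeyError. to_numpy() drops the index so the values are
# assigned by position.
df_t['E'] = ts_d.reindex(day_keys).to_numpy()
```

Rows for days that ts_d does not cover end up as NaN here, where the loop in the question produced None.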
Answer 1 (score: 0)
After creating df_t, you can at least save yourself some work by grouping:
df_t.loc[:, 'E'] = None
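# pd.groupby() has been removed from pandas; call groupby on the frame itself
for k, group in df_t.groupby(df_t.index.date):
    # pd.Timestamp(k) so the key matches the DatetimeIndex of ts_d;
    # .loc avoids chained assignment
    df_t.loc[group.index, 'E'] = ts_d.get(pd.Timestamp(k))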
Since ts_d is relatively short, the number of groups should be small, so I would guess this is reasonably efficient.
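The question mentions trying a merge that did not work out. As a hedged sketch of how that route might look (an assumption on my part, not something either answer proposes), DataFrame.join() can match a key column against the index of a named series. The column name 'Date' and the result name joined are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

# Daily series and hourly frame, built as in the question
ts_d = pd.Series(np.random.randn(2), index=pd.date_range('20130102', periods=2))
datetimes = pd.date_range('20130101 00:00', periods=4 * 24, freq='60min')
df_t = pd.DataFrame(np.random.randn(4 * 24, 4), index=datetimes, columns=list('ABCD'))

# Key column holding each row's day as a midnight Timestamp
df_t['Date'] = df_t.index.normalize()

# Left join: the 'Date' column of df_t is matched against the index of ts_d.
# The series must be named; the name becomes the new column ('E').
joined = df_t.join(ts_d.rename('E'), on='Date')
```

Because this is a left join, every row of df_t is kept and days missing from ts_d simply get NaN in E, so the missing data should not cause key errors.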