pandas: look up daily Series values for a half-hourly DataFrame index

Time: 2014-07-16 18:33:19

Tags: python pandas time-series lookup

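I have a pandas DataFrame with a half-hourly time-series index, and a Series of daily data that I need to match to it by date equality. The code below uses .get() in a loop, but it is slow and seems "unpythonic."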

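I tried converting the Series into a DataFrame with a dummy column so that I could merge or do a lookup, but for various reasons I couldn't get it to work. Some data is missing, so some of the potential approaches may raise key errors.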

Previously answered questions don't seem to apply. Someone handy with lambda functions or the .asfreq method can probably come up with something.

import pandas as pd
import numpy as np

# Make a 2 day series
days = 2
dates = pd.date_range('20130102',periods=days)
ts_d = pd.Series(np.random.randn(days),index=dates)
ts_d

# Output

2013-01-02   -1.044139
2013-01-03   -1.061720
Freq: D, dtype: float64

# Make an overlapping 4 day dataframe with 60min index
datetimes = pd.date_range('20130101 00:00',periods=4*24, freq = '60min')
df_t = pd.DataFrame(np.random.randn(4*24,4),index=datetimes,columns=list('ABCD'))

# Begin clunkiness
df_t['date'] = df_t.index.date
for t in df_t.index:
    d = df_t.loc[t, 'date']
    df_t.loc[t, 'E'] = ts_d.get(d)
df_t

Some of the output:

                         A          B           C           D          date          E
2013-01-01 20:00:00 -0.173764   -1.440833   -0.163796    0.479593    2013-01-01  None
2013-01-01 21:00:00  1.915522    2.308827   -0.849182   -1.478981    2013-01-01  None
2013-01-01 22:00:00 -0.013391   -1.534994   -2.365495    0.747692    2013-01-01  None
2013-01-01 23:00:00  0.739665   -0.566568    0.413195    0.665017    2013-01-01  None
2013-01-02 00:00:00 -0.358202   -1.625681    0.120250   -1.122430    2013-01-02 -1.044139
2013-01-02 01:00:00  1.048837   -0.328021    0.933473   -0.234328    2013-01-02 -1.044139
2013-01-02 02:00:00  1.178195   -1.389543   -0.144850   -2.430063    2013-01-02 -1.044139
2013-01-02 03:00:00 -0.420962    0.244130    1.819005   -0.982521    2013-01-02 -1.044139
.
.
.
2013-01-02 15:00:00  1.809403   -2.505042   -0.509833   -1.238630    2013-01-02 -1.044139
2013-01-02 16:00:00  0.740123   -0.205582    0.795701    0.459017    2013-01-02 -1.044139
2013-01-02 17:00:00  1.252692    1.025432   -0.235781   -0.506460    2013-01-02 -1.044139
2013-01-02 18:00:00 -1.456726   -1.983843   -1.623061    0.629214    2013-01-02 -1.044139
2013-01-02 19:00:00  1.126687   -0.253415    0.163900    0.059876    2013-01-02 -1.044139
2013-01-02 20:00:00  0.156657    0.066207    0.103946   -0.762910    2013-01-02 -1.044139
2013-01-02 21:00:00 -1.123818    0.314226   -0.281381    0.947381    2013-01-02 -1.044139
2013-01-02 22:00:00 -0.945620    0.538180    1.403452   -0.065406    2013-01-02 -1.044139
2013-01-02 23:00:00  0.059012    2.599817   -0.623826    0.796559    2013-01-02 -1.044139
2013-01-03 00:00:00  0.859748    1.476591    0.607554   -1.575007    2013-01-03  -1.06172
2013-01-03 01:00:00  0.678326    0.084930    0.762786   -1.139595    2013-01-03  -1.06172
2013-01-03 02:00:00 -0.034952   -1.224600    0.317359   -1.620755    2013-01-03  -1.06172
2013-01-03 03:00:00 -1.208597   -1.864493   -0.883250   -0.814249    2013-01-03  -1.06172
2013-01-03 04:00:00 -0.061918    0.461941    0.163563    0.532755    2013-01-03  -1.06172
.
.
.

2 answers:

Answer 0 (score: 3)

You can do this in a more idiomatic pandas way:

First, get a date-only field:

df_t['Date'] = pd.to_datetime(df_t.index.date)

Set it as the index:

df_t = df_t.reset_index().set_index('Date')

Set the values (the assignment aligns ts_d on the Date index, so days missing from ts_d become NaN):

df_t['E'] = ts_d

Restore the old index:

df_t = df_t.reset_index().set_index('index')

Verify:

df_t.loc[pd.to_datetime('20130102')]

*Edit: changed to incorporate Jeff's suggestion
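
The steps above can also be collapsed into a single alignment, without swapping the index back and forth. The following is only a minimal sketch and not part of the original answer: it assumes ts_d has a unique DatetimeIndex with midnight timestamps, and it relies on DatetimeIndex.normalize() and Series.reindex. The seeded generator and the name daily_dates are illustrative choices.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible

# Daily series covering 2 of the 4 days in the frame below
ts_d = pd.Series(rng.standard_normal(2),
                 index=pd.date_range('20130102', periods=2))

# Hourly frame covering 4 days
datetimes = pd.date_range('20130101 00:00', periods=4 * 24, freq='60min')
df_t = pd.DataFrame(rng.standard_normal((4 * 24, 4)),
                    index=datetimes, columns=list('ABCD'))

# Truncate every timestamp to midnight so it matches the daily index
daily_dates = df_t.index.normalize()

# reindex looks up all dates at once; dates absent from ts_d become NaN
df_t['E'] = ts_d.reindex(daily_dates).to_numpy()
```

Because reindex fills missing dates with NaN instead of raising, the gaps in the daily data mentioned in the question should not cause key errors here.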

Answer 1 (score: 0)

After creating df_t, you can at least save yourself some work by grouping:

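df_t['E'] = np.nan
# Group the rows by calendar date and fill each group with that day's value
for k, group in df_t.groupby(df_t.index.date):
    # k is a datetime.date; convert it so it matches ts_d's DatetimeIndex
    df_t.loc[group.index, 'E'] = ts_d.get(pd.Timestamp(k), np.nan)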

Since ts_d is relatively short, the number of groups should be very small, so I would guess this is fairly efficient.
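
The loop over groups can also be avoided with a left join, which is close to the merge-style approach the question mentions trying. This is a hedged sketch rather than a tested recommendation from either answer: it assumes ts_d has a unique DatetimeIndex at midnight, and the names merged and date, as well as naming the series E, are choices made for this illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # seeded only so the sketch is reproducible

# The daily series must be named: the name becomes the joined column
ts_d = pd.Series(rng.standard_normal(2),
                 index=pd.date_range('20130102', periods=2), name='E')

datetimes = pd.date_range('20130101 00:00', periods=4 * 24, freq='60min')
df_t = pd.DataFrame(rng.standard_normal((4 * 24, 4)),
                    index=datetimes, columns=list('ABCD'))

# Key column: each timestamp truncated to midnight
df_t['date'] = df_t.index.normalize()

# Left join on the key column against ts_d's index; every hourly row is
# kept, and days missing from ts_d get NaN in E
merged = df_t.join(ts_d, on='date')
```

The join keeps the original hourly index of df_t, so no index reset is needed afterwards.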