Pandas DatetimeIndex in a MultiIndex transformation

Time: 2016-01-29 17:28:35

Tags: python pandas dataframe multi-index datetimeindex

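I have some Pandas (Python) data frames that were created by collecting data roughly every 8 milliseconds. The data is broken into blocks, and the sequence restarts with each block. Every block has a label, and there is a timestamp column indicating when each sample was collected (measured from the start of the file). To give an idea, the frame looks like this: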

|        | EXPINDEX | EXPTIMESTAMP | DATA1 | DATA2 |
-----------------------------------------------------
| BLOCK  | 0        |              |       |       |
| Block1 | 1        | 0            | .423  | .926  |
|        | 2        | 8.215        | .462  | .919  |
|        | 3        | 17.003       | .472  | .904  |
| Block2 | 4        | 55.821       | .243  | .720  |
|        | 5        | 63.521       | .237  | .794  |
| ...    | ...      | ...          | ...   | ...   |
------------------------------------------------------

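The EXPTIMESTAMP column is a DatetimeIndex. What I would like to do is keep that column around for later use, but create a different sub-index using a block-relative DatetimeIndex, like this: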

|        |                | EXPTIMESTAMP | DATA1 | DATA2 |
----------------------------------------------------------
| BLOCK  | BLOCKTIMESTAMP |              |       |       |
| Block1 | 0              | 0            | .423  | .926  |
|        | 8.215          | 8.215        | .462  | .919  |
|        | 17.003         | 17.003       | .472  | .904  |
| Block2 | 0              | 55.821       | .243  | .720  |
|        | 7.700          | 63.521       | .237  | .794  |
| ...    | ...            | ...          | ...   | ...   |
----------------------------------------------------------

I have accomplished this with:

blockreltimestamp = []
blocks = list(df.index.levels[0])
for block in blocks:
   dfblock = df.xs(block, level='BLOCK').copy()
   dfblock["InitialVal"] = dfblock.iloc[0]["EXPTIMESTAMP"]
   reltime = dfblock["EXPTIMESTAMP"] - dfblock["InitialVal"]
   blockreltimestamp.extend(list(reltime))
df["BLOCKTIMESTAMP"] = blockreltimestamp
df.set_index(["BLOCK","BLOCKTIMESTAMP"], drop=False, inplace=True)

But I am wondering whether there is a cleaner, more efficient, or more pandas-idiomatic way to do this transformation.

Thanks!

1 answer:

Answer 0 (score: 0)

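A cleaner solution ended up working on a non-MultiIndex data frame, where BLOCK is still a column holding the block ID and EXPTIMESTAMP is a regular column, which is what I ultimately wanted anyway. From there, I used pandas' groupby functionality: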

initialvalmatrix = df.groupby("BLOCK").min()[["EXPTIMESTAMP"]]

This creates a data frame indexed by "BLOCK" with a single column, "EXPTIMESTAMP", containing the minimum "EXPTIMESTAMP" value for each block.
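To make the shape of that result concrete, here is a minimal sketch built from the five sample rows shown in the question. The frame is reconstructed by hand for illustration, and storing EXPTIMESTAMP as plain millisecond floats is an assumption, not something the original data confirms.

```python
import pandas as pd

# Hand-built sample mirroring the question's table (assumed: timestamps as float ms).
df = pd.DataFrame({
    "BLOCK": ["Block1", "Block1", "Block1", "Block2", "Block2"],
    "EXPTIMESTAMP": [0.0, 8.215, 17.003, 55.821, 63.521],
    "DATA1": [0.423, 0.462, 0.472, 0.243, 0.237],
    "DATA2": [0.926, 0.919, 0.904, 0.720, 0.794],
})

# One row per block; the double brackets keep the result a DataFrame.
initialvalmatrix = df.groupby("BLOCK").min()[["EXPTIMESTAMP"]]
print(initialvalmatrix)
```

The result has "BLOCK" as its index and one row per block, so a block label can be used directly to look up that block's starting timestamp.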

For clarity, I renamed the "EXPTIMESTAMP" column to "INITIALBYBLOCK":

initialvalmatrix.columns = ["INITIALBYBLOCK"]

Then I used pandas' apply to run a function on each row (axis=1) and compute the "BLOCKTIMESTAMP" column:

df["BLOCKTIMESTAMP"] = df.apply(apply_zero_timestamp, axis=1, tslookup=initialvalmatrix)
#Keyword arguments, if not used in the apply method, are passed into the function specified.

...where the "apply_zero_timestamp" function is defined as:

def apply_zero_timestamp(series, tslookup):
    zeroval = series["EXPTIMESTAMP"] - tslookup["INITIALBYBLOCK"][series["BLOCK"]]
    return zeroval
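The pieces above can be put together in a small end-to-end sketch. The four-row frame and the float millisecond values are assumptions made for illustration; the function body is the one from this answer.

```python
import pandas as pd

# Assumed sample data: BLOCK is an ordinary column, timestamps are float ms.
df = pd.DataFrame({
    "BLOCK": ["Block1", "Block1", "Block2", "Block2"],
    "EXPTIMESTAMP": [0.0, 8.215, 55.821, 63.521],
})

# Lookup table: one row per block holding that block's first timestamp.
tslookup = df.groupby("BLOCK")[["EXPTIMESTAMP"]].min()
tslookup.columns = ["INITIALBYBLOCK"]

def apply_zero_timestamp(series, tslookup):
    # `series` is a single row of df; subtract the start time of its block.
    return series["EXPTIMESTAMP"] - tslookup["INITIALBYBLOCK"][series["BLOCK"]]

# axis=1 hands the function one row at a time; tslookup is forwarded as a keyword.
df["BLOCKTIMESTAMP"] = df.apply(apply_zero_timestamp, axis=1, tslookup=tslookup)
print(df)
```

Because the function is called once per row in Python, this is easy to read but may be slow on frames with many samples.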

Finally, I simply set the index the way I wanted:

df.set_index(["BLOCK","BLOCKTIMESTAMP"], drop=False, inplace=True)
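A possibly shorter variant, which is not what I used above, is to let groupby broadcast each block's minimum back onto its rows with transform, avoiding both the lookup table and the row-wise apply. This is a sketch under the same assumptions as before: BLOCK is an ordinary column and the sample timestamps are float milliseconds made up to match the question's table.

```python
import pandas as pd

# Assumed sample data mirroring the question's table.
df = pd.DataFrame({
    "BLOCK": ["Block1", "Block1", "Block1", "Block2", "Block2"],
    "EXPTIMESTAMP": [0.0, 8.215, 17.003, 55.821, 63.521],
    "DATA1": [0.423, 0.462, 0.472, 0.243, 0.237],
    "DATA2": [0.926, 0.919, 0.904, 0.720, 0.794],
})

# First timestamp of each block, repeated for every row of that block.
block_start = df.groupby("BLOCK")["EXPTIMESTAMP"].transform("min")

# Block-relative time is a single vectorized subtraction.
df["BLOCKTIMESTAMP"] = df["EXPTIMESTAMP"] - block_start

# drop=False keeps BLOCK and BLOCKTIMESTAMP available as columns too.
df = df.set_index(["BLOCK", "BLOCKTIMESTAMP"], drop=False)
print(df)
```

If EXPTIMESTAMP holds real datetime values rather than floats, the same subtraction should yield timedeltas, so the resulting index level would be a timedelta-based one rather than a DatetimeIndex.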

Hope it helps!