Question

我有两个pandas数据帧。一个包含我通常的测量值（时间索引）。来自不同源的第二帧包含系统状态。它也是时间索引的，但状态数据帧中的时间与我的数据帧的时间与测量值不匹配。我想要实现的是，现在测量数据帧中的每一行还包含在测量时间之前出现在状态数据帧中的最后一个状态。

举个例子，我有一个像这样的状态框架：

                                          state
time                                           
2013-02-14 12:29:37.101000          SystemReset
2013-02-14 12:29:39.103000             WaitFace
2013-02-14 12:29:39.103000      NormalExecution
2013-02-14 12:29:39.166000        GreetVisitors
2013-02-14 12:29:46.879000  AskForParticipation
2013-02-14 12:29:56.807000  IntroduceVernissage
2013-02-14 12:30:07.275000      PictureQuestion

我的测量结果如下：

                            utime
time
2013-02-14 12:29:38.697038      0
2013-02-14 12:29:38.710432      1
2013-02-14 12:29:39.106475      2
2013-02-14 12:29:39.200701      3
2013-02-14 12:29:40.197014      0
2013-02-14 12:29:42.217976      5
2013-02-14 12:29:57.460601      7

我想最终得到这样的数据框：

                            utime                 state
time
2013-02-14 12:29:38.697038      0           SystemReset
2013-02-14 12:29:38.710432      1           SystemReset
2013-02-14 12:29:39.106475      2       NormalExecution
2013-02-14 12:29:39.200701      3         GreetVisitors
2013-02-14 12:29:40.197014      0         GreetVisitors
2013-02-14 12:29:42.217976      5         GreetVisitors
2013-02-14 12:29:57.460601      7   Introducevernissage

我找到了一个非常低效的解决方案：

result = measurements.copy()
stateList = []
for timestamp, _ in measurements.iterrows():
    candidateStates = states.truncate(after=timestamp).tail(1)
    if len(candidateStates) > 0:
        stateList.append(candidateStates['state'].values[0])
    else:
        stateList.append("unknown")

result['state'] = stateList

你认为有什么方法可以优化它吗？

Answer 1

也许像

df = df1.join(df2, how='outer')
df['state'].fillna(method='ffill',inplace=True)
df.dropna()

会奏效吗？ join生成：

>>> df
                                          state  utime
time                                                  
2013-02-14 12:29:37.101000          SystemReset    NaN
2013-02-14 12:29:38.697038                  NaN      0
2013-02-14 12:29:38.710432                  NaN      1
2013-02-14 12:29:39.103000             WaitFace    NaN
2013-02-14 12:29:39.103000      NormalExecution    NaN
2013-02-14 12:29:39.106475                  NaN      2
2013-02-14 12:29:39.166000        GreetVisitors    NaN
2013-02-14 12:29:39.200701                  NaN      3
2013-02-14 12:29:40.197014                  NaN      0
2013-02-14 12:29:42.217976                  NaN      5
2013-02-14 12:29:46.879000  AskForParticipation    NaN
2013-02-14 12:29:56.807000  IntroduceVernissage    NaN
2013-02-14 12:29:57.460601                  NaN      7
2013-02-14 12:30:07.275000      PictureQuestion    NaN

然后我们可以向前填写州列：

>>> df['state'].fillna(method='ffill',inplace=True)
time
2013-02-14 12:29:37.101000            SystemReset
2013-02-14 12:29:38.697038            SystemReset
2013-02-14 12:29:38.710432            SystemReset
2013-02-14 12:29:39.103000               WaitFace
2013-02-14 12:29:39.103000        NormalExecution
2013-02-14 12:29:39.106475        NormalExecution
2013-02-14 12:29:39.166000          GreetVisitors
2013-02-14 12:29:39.200701          GreetVisitors
2013-02-14 12:29:40.197014          GreetVisitors
2013-02-14 12:29:42.217976          GreetVisitors
2013-02-14 12:29:46.879000    AskForParticipation
2013-02-14 12:29:56.807000    IntroduceVernissage
2013-02-14 12:29:57.460601    IntroduceVernissage
2013-02-14 12:30:07.275000        PictureQuestion
Name: state

然后在没有时间的情况下放下行：

>>> df.dropna()
                                          state  utime
time                                                  
2013-02-14 12:29:38.697038          SystemReset      0
2013-02-14 12:29:38.710432          SystemReset      1
2013-02-14 12:29:39.106475      NormalExecution      2
2013-02-14 12:29:39.200701        GreetVisitors      3
2013-02-14 12:29:40.197014        GreetVisitors      0
2013-02-14 12:29:42.217976        GreetVisitors      5
2013-02-14 12:29:57.460601  IntroduceVernissage      7

您可能需要对其进行调整以处理与（可能的多个）状态同时具有utime的情况。可能会drop_duplicates与take_last=True进行此操作。在<与<=问题的早晨咖啡之前，您还必须比我能做的更难思考。

根据时间将来自一个pandas数据帧的条目与第二个pandas数据帧相关联

1 个答案: