Question

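I have a DataFrame df in which one column is a timestamp and another column is A. Column A contains decimals.

I want to add a new column B and fill it with the current value of A divided by the value of A one minute earlier. That is:

df['B'] = df['A']_current / df['A']_(current - 1 min)

Note: the data does not arrive exactly every minute, so "one minute earlier" means the row whose timestamp is closest to (current - 1 minute).

This is how I do it:

First, I use the timestamp as the index so that I can use get_loc, and I create a new DataFrame new_df that starts one minute after the beginning of df. That way I am sure there is data to look back on, because the rows in the first minute of the data are skipped.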

new_df = df.loc[df['timestamp'] > df.timestamp[0] + delta] # delta = 1 min timedelta

values = []
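for index, row in new_df.iterrows():
  # get_loc(..., method='nearest') was removed in pandas 2.0; get_indexer replaces it
  pos = df.index.get_indexer([row.timestamp - delta], method='nearest')[0]
  v = row.A / df.iloc[pos]['A']
  values.append(v)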

v_ser = pd.Series(values)
new_df['B'] = v_ser.values

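I am afraid this is not very good. It takes a long time on large DataFrames. Also, I am not 100% sure the above is entirely correct. Sometimes I get this message:

A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

What is the best / most efficient way to do the above task? Thanks.

PS. If anyone can think of a better title, please let me know. Writing the title took me longer than writing the post, and I still don't like it.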

Answer 1

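If the DataFrame is already properly indexed by the timestamp, you can try using .asof() (if it is not, use .set_index() first). Note that .asof() returns the last value at or before the requested time, not the nearest one in either direction, and it expects a sorted index.

A simple example: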

import pandas as pd
import numpy as np
n_vals = 50

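# Create a DataFrame with random values at irregular ('unusual') times.
# pd.date_range replaces the DatetimeIndex(start=..., freq=..., periods=...)
# constructor form, which was removed in pandas 1.0.
df = pd.DataFrame(data=np.random.randint(low=1, high=6, size=n_vals),
                  index=pd.date_range(start=pd.Timestamp.now(),
                                      freq='23s', periods=n_vals),
                  columns=['value'])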

# Demonstrate how to use .asof() to get the value that was the 'state'
# one minute before each index timestamp. Note the .values call: it assigns
# by position instead of aligning on the shifted index.
df['value_one_min_ago'] = df['value'].asof(df.index - pd.Timedelta('1min')).values

# Rows in the first minute have no earlier data and come back as NaN;
# consider .fillna() or dropping them.
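
Since the question asks for the row nearest to (current - 1 minute) rather than the last row before it, one possible variant is pd.merge_asof with direction='nearest'. The following is only a sketch under assumptions: the sample data and the helper names lookup, A_prev, targets, past and matched are made up for illustration, and the frame is assumed to be sorted by timestamp. It assigns B on the original frame instead of on a slice, which should also avoid the "copy of a slice" warning.

```python
import pandas as pd

# Hypothetical sample data: irregular timestamps and a decimal column A.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-01 00:00:00",
        "2024-01-01 00:00:25",
        "2024-01-01 00:01:05",
        "2024-01-01 00:01:40",
        "2024-01-01 00:02:30",
    ]),
    "A": [1.0, 2.0, 4.0, 5.0, 10.0],
}).sort_values("timestamp").reset_index(drop=True)

delta = pd.Timedelta("1min")

# Left side: for every row, the time to look up (current - 1 min).
targets = pd.DataFrame({"lookup": df["timestamp"] - delta})

# Right side: the original observations, renamed so the keys match.
past = df.rename(columns={"timestamp": "lookup", "A": "A_prev"})

# For each lookup time, pick the observation with the closest timestamp.
matched = pd.merge_asof(targets, past, on="lookup", direction="nearest")

# Assign by position; merge_asof keeps the row order of the left frame.
df["B"] = df["A"] / matched["A_prev"].to_numpy()

# As in the question, rows in the first minute have nothing meaningful
# to look back on, so blank them out.
first = df["timestamp"].iloc[0]
df.loc[df["timestamp"] <= first + delta, "B"] = float("nan")

print(df)
```

This avoids the Python-level loop over iterrows(), so it should scale much better on large frames. If "last value at or before" is acceptable instead of "nearest", direction='backward' (the default) gives the same semantics as .asof().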

Python Pandas: use earlier timestamp
