I have the following imported pandas DataFrame, indexed by datetime:
VAL
DATETIME
2012-01-02 02:00:00 3.375000
2012-01-02 02:01:00 3.281667
2012-01-02 02:02:00 3.426667
2012-01-02 02:03:00 3.378333
2012-01-02 02:04:00 3.381667
2012-01-02 02:05:00 3.831667
....
I need to transform the DataFrame as follows:
VAL VAL1 VAL2
DATETIME
2012-01-02 02:00:00 3.375000 3.281667 3.426667
2012-01-02 02:01:00 3.281667 3.426667 3.378333
2012-01-02 02:02:00 3.426667 3.378333 3.381667
2012-01-02 02:03:00 3.378333 3.381667 3.831667
...
Is there a built-in function or an efficient way to achieve this?
Answer 0 (score: 4)
Use Series.shift and a loop to assign multiple new columns:
for x in range(1, 3):
    df['VAL{}'.format(x)] = df['VAL'].shift(-x)
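If you need to shift by minutes (using the index timestamps rather than row positions):
for x in range(1, 3):
    # 'min' is the minute alias; the older alias 'T' is deprecated in recent pandas
    df['VAL{}'.format(x)] = df['VAL'].shift(-x, freq='min')
print(df)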
VAL VAL1 VAL2
DATETIME
2012-01-02 02:00:00 3.375000 3.281667 3.426667
2012-01-02 02:01:00 3.281667 3.426667 3.378333
2012-01-02 02:02:00 3.426667 3.378333 3.381667
2012-01-02 02:03:00 3.378333 3.381667 3.831667
2012-01-02 02:04:00 3.381667 3.831667 NaN
2012-01-02 02:05:00 3.831667 NaN NaN
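The two loops above give the same result only when the index has no gaps. A minimal sketch of the difference, assuming a made-up three-row frame with the 02:02 row missing (the column names POS1 and TIME1 are illustrative only):

```python
import pandas as pd

# Hypothetical data: the 02:02 timestamp is deliberately missing.
idx = pd.DatetimeIndex(
    ["2012-01-02 02:00:00", "2012-01-02 02:01:00", "2012-01-02 02:03:00"],
    name="DATETIME",
)
out = pd.DataFrame({"VAL": [1.0, 2.0, 3.0]}, index=idx)

# Positional shift: each row receives the value of the next row, whatever its timestamp.
out["POS1"] = out["VAL"].shift(-1)

# Time-based shift: the index is moved back one minute, then the assignment
# aligns on timestamps, so a row only gets a value recorded exactly one minute later.
out["TIME1"] = out["VAL"].shift(-1, freq="min")

print(out)
```

At 02:01 the positional variant picks up the 02:03 value, while the time-based variant leaves NaN because there is no 02:02 row.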
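Finally, if necessary, remove the last rows, which contain NaN:
# N must be > 1
N = 3
for x in range(1, N):
    df['VAL{}'.format(x)] = df['VAL'].shift(-x, freq='min')
# drop the last N - 1 rows, which have no complete look-ahead window
df = df.iloc[:-N + 1]
print(df)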
VAL VAL1 VAL2
DATETIME
2012-01-02 02:00:00 3.375000 3.281667 3.426667
2012-01-02 02:01:00 3.281667 3.426667 3.378333
2012-01-02 02:02:00 3.426667 3.378333 3.381667
2012-01-02 02:03:00 3.378333 3.381667 3.831667
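For reference, here is a self-contained sketch of the positional variant that rebuilds the sample data from the question so it can be run as is. The helper name add_lead_columns and its parameters are hypothetical, not part of pandas.

```python
import pandas as pd

def add_lead_columns(df, col="VAL", n=3):
    """Return a new frame with col, col1, ..., col{n-1}, where col{x} holds
    the value x rows ahead. Hypothetical helper for illustration only."""
    out = df[[col]].copy()
    for x in range(1, n):
        out["{}{}".format(col, x)] = out[col].shift(-x)
    # The last n - 1 rows have no complete look-ahead window, so drop them.
    keep = max(len(out) - (n - 1), 0)
    return out.iloc[:keep]

idx = pd.date_range("2012-01-02 02:00:00", periods=6, freq="min", name="DATETIME")
df = pd.DataFrame(
    {"VAL": [3.375, 3.281667, 3.426667, 3.378333, 3.381667, 3.831667]},
    index=idx,
)

result = add_lead_columns(df, n=3)
print(result)
```

Slicing by position rather than calling dropna() keeps rows whose NaN values come from the source data itself.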
Answer 1 (score: 2)
You can use NumPy stride_tricks:
import pandas as pd
from numpy.lib.stride_tricks import as_strided

df = pd.DataFrame({'DATETIME': ['2012-01-02 02:00:00', '2012-01-02 02:01:00', '2012-01-02 02:02:00', '2012-01-02 02:03:00', '2012-01-02 02:04:00', '2012-01-02 02:05:00'], 'VAL': [3.375, 3.2816669999999997, 3.4266669999999997, 3.378333, 3.3816669999999998, 3.831667]})
df['DATETIME'] = pd.to_datetime(df['DATETIME'])
df = df.set_index('DATETIME')

# number of bytes between consecutive elements of the underlying array
stride = df['VAL'].values.strides[0]
ncols = 3
nrows = len(df)-ncols+1
# pass the underlying NumPy array, not the Series, to as_strided
arr = as_strided(df['VAL'].values, shape=(nrows, ncols), strides=(stride, stride))
result = pd.DataFrame(arr.copy(), columns=['VAL{}'.format(i) for i in range(1, ncols+1)],
                      index=df.index[:nrows])
print(result)

yields
VAL1 VAL2 VAL3
DATETIME
2012-01-02 02:00:00 3.375000 3.281667 3.426667
2012-01-02 02:01:00 3.281667 3.426667 3.378333
2012-01-02 02:02:00 3.426667 3.378333 3.381667
2012-01-02 02:03:00 3.378333 3.381667 3.831667
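strides=(stride, stride) is the key to making the sliding window. It says that in arr, the next value to the right (i.e. in the next column) is just stride bytes away, and the next value down (i.e. in the next row) is also just stride bytes away. stride is defined by
stride = df['VAL'].values.strides[0]
where the value is taken from the underlying array, df['VAL'].values.

Although stride_tricks can produce the desired array very quickly, there are some caveats associated with its use. See below and the Notes on the doc page. These caveats can be completely mitigated by copying the array, i.e. by using arr.copy() instead of arr itself. On the other hand, copying the array (especially a large one) hurts performance.

Note that if you use pd.DataFrame(arr) instead of pd.DataFrame(arr.copy()), then the values in the DataFrame are a view of df['VAL']. While this saves memory, it also means that modifying one value in result can change the value in multiple places. For example,
result = pd.DataFrame(arr, columns=['VAL{}'.format(i) for i in range(1, ncols+1)],
                      index=df.index[:nrows])
In [30]: result.iloc[1,1] = 100
In [27]: result
Out[27]:
                           VAL1        VAL2        VAL3
DATETIME
2012-01-02 02:00:00    3.375000    3.281667  100.000000
2012-01-02 02:01:00    3.281667  100.000000    3.378333
2012-01-02 02:02:00  100.000000    3.378333    3.381667
2012-01-02 02:03:00    3.378333    3.381667    3.831667
If you want each value to be independent, use pd.DataFrame(arr.copy()) instead.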
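NumPy 1.20 and later also provide sliding_window_view in the same stride_tricks module. It computes the strides itself and returns a read-only view, which sidesteps the accidental-write pitfall. This is a different function from the as_strided call used in this answer. A minimal sketch, rebuilding the sample data from the question so the block runs on its own:

```python
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20

idx = pd.date_range("2012-01-02 02:00:00", periods=6, freq="min", name="DATETIME")
df = pd.DataFrame(
    {"VAL": [3.375, 3.281667, 3.426667, 3.378333, 3.381667, 3.831667]},
    index=idx,
)

ncols = 3
# Read-only view of shape (len(df) - ncols + 1, ncols); no data is copied here.
windows = sliding_window_view(df["VAL"].to_numpy(), ncols)

result = pd.DataFrame(
    windows.copy(),  # copy so the DataFrame owns independent, writable data
    columns=["VAL{}".format(i) for i in range(1, ncols + 1)],
    index=df.index[: windows.shape[0]],
)
print(result)
```

As with arr.copy(), the copy trades some memory and time for values that can be modified independently.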