Pandas sliding window

Time: 2018-06-24 11:49:04

Tags: python pandas

I have the following imported pandas DataFrame, indexed by datetime:

                            VAL
DATETIME
2012-01-02 02:00:00    3.375000
2012-01-02 02:01:00    3.281667
2012-01-02 02:02:00    3.426667
2012-01-02 02:03:00    3.378333
2012-01-02 02:04:00    3.381667
2012-01-02 02:05:00    3.831667
....

I need to transform the DataFrame as follows:

                            VAL        VAL1        VAL2
DATETIME
2012-01-02 02:00:00    3.375000    3.281667    3.426667
2012-01-02 02:01:00    3.281667    3.426667    3.378333
2012-01-02 02:02:00    3.426667    3.378333    3.381667
2012-01-02 02:03:00    3.378333    3.381667    3.831667
...

Is there a built-in function, or an efficient way, to achieve this?

2 answers:

Answer 0 (score: 4)

Use Series.shift in a loop to assign multiple new columns:

for x in range(1, 3):
    df['VAL{}'.format(x)] = df['VAL'].shift(-x)
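The positional version can be run end to end with the following minimal sketch. It rebuilds only the six sample rows from the question, assumes they are exactly one minute apart, and uses illustrative variable names:

```python
import pandas as pd

# Six sample rows from the question, one per minute (assumed regular spacing).
idx = pd.date_range("2012-01-02 02:00", periods=6, freq="min", name="DATETIME")
df = pd.DataFrame(
    {"VAL": [3.375, 3.281667, 3.426667, 3.378333, 3.381667, 3.831667]},
    index=idx,
)

# shift(-x) moves values x rows up, so row i of VALx holds VAL at row i + x;
# the last x rows of VALx have no successor and become NaN.
for x in range(1, 3):
    df["VAL{}".format(x)] = df["VAL"].shift(-x)

print(df)
```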

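If you need to shift along the DatetimeIndex in one-minute steps, so that values are matched by timestamp rather than by row position:

for x in range(1, 3):
    df['VAL{}'.format(x)] = df['VAL'].shift(-x, freq='min')

print(df)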
                          VAL      VAL1      VAL2
DATETIME                                         
2012-01-02 02:00:00  3.375000  3.281667  3.426667
2012-01-02 02:01:00  3.281667  3.426667  3.378333
2012-01-02 02:02:00  3.426667  3.378333  3.381667
2012-01-02 02:03:00  3.378333  3.381667  3.831667
2012-01-02 02:04:00  3.381667  3.831667       NaN
2012-01-02 02:05:00  3.831667       NaN       NaN
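The two variants only give different results when the index has gaps. The sketch below uses a hypothetical three-row series with 02:02 missing. It uses the `'min'` alias, which names the same one-minute offset as `'T'`; recent pandas versions deprecate `'T'` in favour of `'min'`.

```python
import pandas as pd

# Hypothetical data with a gap: the 02:02 reading is missing.
idx = pd.DatetimeIndex(
    ["2012-01-02 02:00", "2012-01-02 02:01", "2012-01-02 02:03"], name="DATETIME"
)
s = pd.Series([3.375, 3.281667, 3.378333], index=idx, name="VAL")

by_row = s.shift(-1)               # next *row*, whatever its timestamp
by_time = s.shift(-1, freq="min")  # relabels every index entry 1 minute earlier

# Assigning into a DataFrame aligns on the index, so NEXT_MIN holds the value
# recorded exactly one minute later, or NaN when that minute is absent.
df = pd.DataFrame({"VAL": s, "NEXT_ROW": by_row})
df["NEXT_MIN"] = by_time
print(df)
```

At 02:01, `NEXT_ROW` borrows the 02:03 value, while `NEXT_MIN` is NaN because there is no 02:02 row.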

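Finally, if necessary, remove the last rows, which contain NaN:

# N = number of window columns (VAL, VAL1, ..., VAL{N-1}); N must be > 1,
# because for N = 1 the slice df.iloc[:0] would return no rows
N = 3
for x in range(1, N):
    df['VAL{}'.format(x)] = df['VAL'].shift(-x, freq='min')

# drop the last N - 1 rows, whose windows run past the end of the data
df = df.iloc[:-N + 1]
print(df)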
                          VAL      VAL1      VAL2
DATETIME                                         
2012-01-02 02:00:00  3.375000  3.281667  3.426667
2012-01-02 02:01:00  3.281667  3.426667  3.378333
2012-01-02 02:02:00  3.426667  3.378333  3.381667
2012-01-02 02:03:00  3.378333  3.381667  3.831667
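Another option is to build all the shifted columns at once with `pd.concat` and then drop the incomplete rows with `dropna()`, instead of using a loop and `iloc` slicing. The sketch below wraps this approach in a hypothetical helper named `window_frame`. One difference: `dropna()` also drops rows where VAL itself is NaN, so it matches `df.iloc[:-N + 1]` only when VAL has no missing values.

```python
import pandas as pd


def window_frame(s: pd.Series, n: int) -> pd.DataFrame:
    """Hypothetical helper: column k holds s shifted up by k rows (k = 0..n-1).

    Rows whose window would run past the end of s are dropped.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    cols = {
        (s.name if k == 0 else "{}{}".format(s.name, k)): s.shift(-k)
        for k in range(n)
    }
    # dropna() removes the trailing incomplete windows; it would also remove
    # rows where s was already NaN.
    return pd.concat(cols, axis=1).dropna()


idx = pd.date_range("2012-01-02 02:00", periods=6, freq="min", name="DATETIME")
vals = pd.Series([3.375, 3.281667, 3.426667, 3.378333, 3.381667, 3.831667],
                 index=idx, name="VAL")
out = window_frame(vals, 3)
print(out)
```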

Answer 1 (score: 2)

You could use NumPy stride_tricks:

import numpy as np
import numpy.lib.stride_tricks as stride_tricks
import pandas as pd
df = pd.DataFrame({'DATETIME': ['2012-01-02 02:00:00', '2012-01-02 02:01:00', '2012-01-02 02:02:00', '2012-01-02 02:03:00', '2012-01-02 02:04:00', '2012-01-02 02:05:00'], 'VAL': [3.375, 3.2816669999999997, 3.4266669999999997, 3.378333, 3.3816669999999998, 3.831667]})
df['DATETIME'] = pd.to_datetime(df['DATETIME'])
df = df.set_index('DATETIME')

vals = df['VAL'].values
stride = vals.strides[0]         # bytes between consecutive values (8 for float64)
ncols = 3
nrows = len(df) - ncols + 1      # number of complete windows
arr = stride_tricks.as_strided(vals, shape=(nrows, ncols), strides=(stride, stride))

result = pd.DataFrame(arr.copy(), columns=['VAL{}'.format(i) for i in range(1, ncols+1)],
                      index=df.index[:nrows])

yields

                         VAL1      VAL2      VAL3
DATETIME
2012-01-02 02:00:00  3.375000  3.281667  3.426667
2012-01-02 02:01:00  3.281667  3.426667  3.378333
2012-01-02 02:02:00  3.426667  3.378333  3.381667
2012-01-02 02:03:00  3.378333  3.381667  3.831667

stride_tricks.as_strided is the key to making the sliding window. Passing strides=(stride, stride) says that the next value to the right (i.e. in the next column) is stride bytes away, and the next value down (i.e. in the next row) is also just stride bytes away. So row i of arr is vals[i:i + ncols], and the values in arr are read directly from the underlying array vals (that is, df['VAL'].values) without copying.
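If your NumPy is 1.20 or later (an assumption about your environment), you can use `numpy.lib.stride_tricks.sliding_window_view` instead. It builds the same kind of strided window but computes the shape and strides for you, and it returns a read-only view by default. Here is a sketch of the same result using this swapped-in function:

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

idx = pd.date_range("2012-01-02 02:00", periods=6, freq="min", name="DATETIME")
df = pd.DataFrame(
    {"VAL": [3.375, 3.281667, 3.426667, 3.378333, 3.381667, 3.831667]},
    index=idx,
)

ncols = 3
vals = df["VAL"].to_numpy()
# Row i of the view is vals[i:i + ncols]; the shape (len(vals) - ncols + 1, ncols)
# is derived from the input, so the window cannot read past the end of vals.
win = sliding_window_view(vals, ncols)
print(np.shares_memory(win, vals))  # True: win is a view, not a copy

result = pd.DataFrame(
    win.copy(),  # copy so that result does not alias vals
    columns=["VAL{}".format(i) for i in range(1, ncols + 1)],
    index=df.index[: len(win)],
)
print(result)
```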


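Although stride_tricks can generate the desired array very quickly, using it comes with some caveats; see below and the Notes section of the as_strided documentation page. These caveats can be fully mitigated by copying the array, i.e. by using arr.copy() rather than arr itself. On the other hand, copying the array (especially a large one) costs time and memory.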
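To see what the copy changes, here is a small NumPy-only sketch. The `as_strided` view re-reads the original buffer, while `.copy()` allocates an independent, contiguous array. The sample values come from the question; the variable names are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

vals = np.array([3.375, 3.281667, 3.426667, 3.378333, 3.381667, 3.831667])
step = vals.strides[0]  # bytes between consecutive elements (8 for float64)
ncols = 3
nrows = len(vals) - ncols + 1

view = as_strided(vals, shape=(nrows, ncols), strides=(step, step))
copied = view.copy()

print(np.shares_memory(view, vals))    # True: overlapping windows of vals
print(np.shares_memory(copied, vals))  # False: a separate 4x3 buffer
print(view.flags.c_contiguous, copied.flags.c_contiguous)  # False True
```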


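Note that if you use pd.DataFrame(arr) instead of pd.DataFrame(arr.copy()), the values in the DataFrame can be a view of df['VAL'].values. Although this saves memory, it also means that modifying one value in result can change the value in several places. For example,

result = pd.DataFrame(arr, columns=['VAL{}'.format(i) for i in range(1, ncols+1)], index=df.index[:nrows])
result.iloc[1, 1] = 100
print(result)
                           VAL1        VAL2        VAL3
DATETIME
2012-01-02 02:00:00    3.375000    3.281667  100.000000
2012-01-02 02:01:00    3.281667  100.000000    3.378333
2012-01-02 02:02:00  100.000000    3.378333    3.381667
2012-01-02 02:03:00    3.378333    3.381667    3.831667

This output comes from the pandas version the answer was written with. With Copy-on-Write, newer pandas versions may copy the array in the constructor, in which case the change stays in a single cell. If you want each value to be independent, use pd.DataFrame(arr.copy(), ...).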
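Whether `pd.DataFrame(arr)` keeps or copies the array depends on the pandas version. A NumPy-only sketch shows the aliasing itself more reliably: one write into the `as_strided` view changes several window cells and the source array. The NumPy documentation advises against writing to such views, so this is for illustration only.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

vals = np.array([3.375, 3.281667, 3.426667, 3.378333, 3.381667, 3.831667])
step = vals.strides[0]
arr = as_strided(vals, shape=(4, 3), strides=(step, step))

arr[1, 1] = 100.0  # one write, landing at byte offset 2 * step, i.e. vals[2]

# Every window position that maps to vals[2] now shows the new value.
print(arr[0, 2], arr[1, 1], arr[2, 0])  # 100.0 100.0 100.0
print(vals[2])                          # 100.0: the source array changed too
```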