Looping through a Pandas DataFrame in chunks

Posted: 2017-01-28 14:03:46

Tags: python pandas dataframe

Given the following DataFrame:

      open    high     low   close    volume
0     74.090  74.144  74.089  74.136  0.000012
1     74.110  74.143  74.009  74.072  0.000419
2     74.074  74.190  74.063  74.081  0.000223
3     74.100  74.244  74.085  74.182  0.000429
4     74.194  74.222  74.164  74.199  0.000090
5     74.198  74.265  74.181  74.213  0.000071
6     74.223  74.244  74.120  74.174  0.000124
7     74.181  74.229  74.132  74.161  0.000087
8     74.164  74.337  74.126  74.324  0.000299
9     74.303  74.407  74.302  74.400  0.000185
10    74.408  74.440  74.373  74.409  0.000163
11    74.437  74.438  74.399  74.418  0.000208
12    74.428  74.464  74.385  74.385  0.000231

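How can I efficiently loop through the whole DataFrame and, for each row, get the previous 5 rows (including the current row) as a new DataFrame?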
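For comparison, the straightforward (but slower, one slice per row) way to get these windows is a plain loop over `iloc` slices. This is a minimal sketch on a toy frame holding the first seven `close` values from the table above; rows 0 to 3 get shorter windows because fewer than four rows precede them.

```python
import pandas as pd

# Toy frame: the first seven 'close' values from the question's table
df = pd.DataFrame({"close": [74.136, 74.072, 74.081, 74.182, 74.199, 74.213, 74.174]})

window = 5
# One sub-DataFrame per row: the current row plus up to window - 1 rows before it.
# Early rows get shorter windows because fewer rows precede them.
windows = [df.iloc[max(0, i - window + 1): i + 1] for i in range(len(df))]

for i, w in enumerate(windows):
    print(i, w.index.tolist())
```

Each slice is a separate DataFrame, so the cost grows with the number of rows; that per-row overhead is what the strided approach avoids.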

1 answer:

Answer (score: 5)

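If you want efficiency, use NumPy's stride tricks to build all the 5-row windows as a single view over the underlying array instead of slicing row by row: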

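import pandas as pd
import numpy as np
from numpy.lib.stride_tricks import as_strided as stride

# df is the DataFrame from the Setup section below
v = df.values        # 2-D array, shape (n_rows, n_cols)
sr, sc = v.strides   # byte steps between rows and between columns

# data[c, i, j] == v[i + j, c]: for each column c, window i covers rows i..i+4
data = stride(v, (v.shape[1], v.shape[0] - 4, 5), (sc, sr, sr))

# The original built pd.Panel(data, df.columns, df.index[4:], pd.RangeIndex(5)).to_frame();
# pd.Panel was removed in pandas 1.0, so build the same (major, minor) frame directly
idx = pd.MultiIndex.from_product([df.index[4:], pd.RangeIndex(5)], names=['major', 'minor'])
df5 = pd.DataFrame(data.transpose(1, 2, 0).reshape(-1, v.shape[1]), index=idx, columns=df.columns)
df5.head(10)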

               open    high     low   close    volume
major minor                                          
4     0      74.090  74.144  74.089  74.136  0.000012
      1      74.110  74.143  74.009  74.072  0.000419
      2      74.074  74.190  74.063  74.081  0.000223
      3      74.100  74.244  74.085  74.182  0.000429
      4      74.194  74.222  74.164  74.199  0.000090
5     0      74.110  74.143  74.009  74.072  0.000419
      1      74.074  74.190  74.063  74.081  0.000223
      2      74.100  74.244  74.085  74.182  0.000429
      3      74.194  74.222  74.164  74.199  0.000090
      4      74.198  74.265  74.181  74.213  0.000071
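The `as_strided` call above trusts the shape and strides it is given; if either is wrong it silently reads memory outside the array, which the NumPy documentation warns about. Since NumPy 1.20, `numpy.lib.stride_tricks.sliding_window_view` builds the same kind of window view (read-only) with bounds checking. This sketch, on a hypothetical 13 x 2 frame of consecutive numbers, checks that the two views agree once the column axis is moved to the front.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import as_strided, sliding_window_view

# Hypothetical stand-in frame: 13 rows x 2 columns of consecutive numbers
df = pd.DataFrame(np.arange(26, dtype=float).reshape(13, 2), columns=["a", "b"])
v = df.to_numpy()
n_rows, n_cols = v.shape
w = 5

# Hand-built view, as in the answer: manual[c, i, j] == v[i + j, c]
sr, sc = v.strides
manual = as_strided(v, (n_cols, n_rows - w + 1, w), (sc, sr, sr))

# Bounds-checked equivalent: shape (n_rows - w + 1, n_cols, w) with
# safe[i, c, j] == v[i + j, c]; transpose puts the column axis first to match
safe = sliding_window_view(v, w, axis=0).transpose(1, 0, 2)

print(manual.shape, np.array_equal(manual, safe))
```

Both are views, so no window data is copied until something like `reshape` or a DataFrame constructor materializes it.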

Example of processing each window (here, keeping the last 2 rows of each):

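def process(df):
    # df is one 'major' group; df.name is its label, so df.loc[df.name]
    # drops the outer level and tail(2) keeps the last two rows of the window
    return df.loc[df.name].tail(2)

print(df5.groupby(level=0).apply(process))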

               open    high     low   close    volume
major minor                                          
4     3      74.100  74.244  74.085  74.182  0.000429
      4      74.194  74.222  74.164  74.199  0.000090
5     3      74.194  74.222  74.164  74.199  0.000090
      4      74.198  74.265  74.181  74.213  0.000071
6     3      74.198  74.265  74.181  74.213  0.000071
      4      74.223  74.244  74.120  74.174  0.000124
7     3      74.223  74.244  74.120  74.174  0.000124
      4      74.181  74.229  74.132  74.161  0.000087
8     3      74.181  74.229  74.132  74.161  0.000087
      4      74.164  74.337  74.126  74.324  0.000299
9     3      74.164  74.337  74.126  74.324  0.000299
      4      74.303  74.407  74.302  74.400  0.000185
10    3      74.303  74.407  74.302  74.400  0.000185
      4      74.408  74.440  74.373  74.409  0.000163
11    3      74.408  74.440  74.373  74.409  0.000163
      4      74.437  74.438  74.399  74.418  0.000208
12    3      74.437  74.438  74.399  74.418  0.000208
      4      74.428  74.464  74.385  74.385  0.000231
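When the per-window work is a reduction, the `groupby`/`apply` step can often be replaced by one vectorized call over the window axis. A minimal sketch, assuming a 5-bar high/low as the (hypothetical) per-window computation and using the first eight rows of the question's `high` and `low` columns; each result row is labelled by the window's last row, like the `major` level above, and is cross-checked against pandas' built-in `rolling` windows.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

# First eight rows of the question's 'high' and 'low' columns
df = pd.DataFrame({
    "high": [74.144, 74.143, 74.190, 74.244, 74.222, 74.265, 74.244, 74.229],
    "low":  [74.089, 74.009, 74.063, 74.085, 74.164, 74.181, 74.120, 74.132],
})
w = 5

# windows[i, c, j] == df.iloc[i + j, c]; window i ends at row i + w - 1
windows = sliding_window_view(df.to_numpy(), w, axis=0)

# Reduce over the last axis (the w rows of each window) in one call per column
out = pd.DataFrame({
    "high_max": windows[:, 0, :].max(axis=1),
    "low_min": windows[:, 1, :].min(axis=1),
}, index=df.index[w - 1:])

# Cross-check with pandas rolling windows (their first w - 1 rows are NaN)
check_high = df["high"].rolling(w).max().iloc[w - 1:]
check_low = df["low"].rolling(w).min().iloc[w - 1:]

print(out)
print(np.allclose(out["high_max"], check_high), np.allclose(out["low_min"], check_low))
```

For simple per-column statistics `rolling` alone is enough; the 3-D window array is useful when one computation needs several columns of the same window at once.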

Setup

df = pd.DataFrame([
        [74.09, 74.14399999999999, 74.089, 74.13600000000001, 1.2e-05],
        [74.11, 74.143, 74.009, 74.072, 0.00041900000000000005],
        [74.074, 74.19, 74.063, 74.081, 0.000223],
        [74.1, 74.244, 74.085, 74.182, 0.000429],
        [74.194, 74.222, 74.164, 74.199, 9e-05],
        [74.19800000000001, 74.265, 74.181, 74.21300000000001, 7.099999999999999e-05],
        [74.223, 74.244, 74.12, 74.17399999999999, 0.000124],
        [74.181, 74.229, 74.132, 74.161, 8.7e-05],
        [74.164, 74.337, 74.126, 74.324, 0.000299],
        [74.303, 74.407, 74.30199999999999, 74.4, 0.000185],
        [74.408, 74.44, 74.373, 74.40899999999999, 0.00016299999999999998],
        [74.437, 74.438, 74.399, 74.418, 0.00020800000000000001],
        [74.428, 74.464, 74.385, 74.385, 0.000231]
    ], columns=['open', 'high', 'low', 'close', 'volume'])