Question

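I have a DataFrame containing several years of electricity load data. I want to select the row with the maximum value for each year, along with the rows within +/- 5 days of that date.

Maximum value for each year: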

                 Max
2000-12-14    2009.347900
2001-02-22    1987.976074

Desired result:

                     Max
    2000-12-09    1949.279175
    2000-12-10    1901.194702
    2000-12-11    1827.509155
    2000-12-12    1579.835205
    2000-12-13    1780.223267
    2000-12-14    2009.347900
    2000-12-15    1845.129395
    2000-12-16    1795.377319
    2000-12-17    1741.817749
    2000-12-18    1747.508789
    2000-12-19    1800.817261
    2001-02-17    1703.080322
    2001-02-18    1792.888062
    2001-02-19    1777.731323
    2001-02-20    1700.863281
    2001-02-21    1624.189209
    2001-02-22    1987.976074
    2001-02-23    1898.503052
    2001-02-24    1809.863403
    2001-02-25    1660.542725
    2001-02-26    1792.182007
    2001-02-27    1770.865356

I am using df.loc[df.groupby("Year")['Max'].idxmax()].Max to get each year's maximum, but how do I select all the neighboring rows?

Answer 1

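import pandas as pd
import numpy as np

# Example data: one random value per day, 2001-01-01 through 2002-01-01 (366 rows)
df = pd.DataFrame(index=pd.date_range('2001-01-01', '2002-01-01', freq='D'),
                  data={'power': 100 * np.random.random(366)})

# Label-based slice from 5 days before the maximum to 5 days after it (both ends inclusive)
peak = df.power.idxmax()
df.loc[peak - np.timedelta64(5, 'D'):peak + np.timedelta64(5, 'D')]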

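The loc method accepts a range (slice) as its argument. NumPy timedelta64 is used to shift the index label of the maximum backward and forward by a number of days.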
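
The snippet above handles a single maximum. For the per-year case in the question, one possible sketch is to take each year's idxmax and collect the +/- 5 day slice around it. The column name Max comes from the question; the synthetic data and the helper name select_windows are assumptions made for illustration, and the index is assumed to be a sorted DatetimeIndex.

```python
import numpy as np
import pandas as pd


def select_windows(df, column="Max", days=5):
    """Return rows within +/- `days` of each calendar year's maximum of `column`.

    Hypothetical helper; assumes df has a sorted DatetimeIndex.
    """
    window = pd.Timedelta(days=days)
    # Index label (date) of the maximum within each calendar year
    peaks = df.groupby(df.index.year)[column].idxmax()
    # Label-based slices on a sorted DatetimeIndex include both endpoints
    parts = [df.loc[peak - window:peak + window] for peak in peaks]
    result = pd.concat(parts)
    # Windows of adjacent years can overlap (peaks in late December and early January)
    return result[~result.index.duplicated()]


# Synthetic two-year example (made-up data, not the asker's)
rng = np.random.default_rng(0)
idx = pd.date_range("2000-01-01", "2001-12-31", freq="D")
df = pd.DataFrame({"Max": rng.uniform(1500, 2000, len(idx))}, index=idx)
result = select_windows(df)
print(result)
```

Grouping by df.index.year avoids needing a separate Year column. If the frame already has one, as in the question, df.groupby("Year") should work the same way.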

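Edit: If you want the 5 rows on either side of the maximum row regardless of the time step, reset the DataFrame's index. The index will then consist of integers, and you can get the 5 rows on each side of the maximum like this: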

# reset the index to integers (the dates move into a regular column)
df = df.reset_index()
# label-based slice on the integer index; both ends are inclusive
df.loc[df.power.idxmax() - 5:df.power.idxmax() + 5]
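
An alternative sketch that keeps the date index: look up the integer position of the maximum with Index.get_loc and slice with iloc. Unlike loc, iloc excludes the end of the slice, hence the + 6, and the start is clamped at 0 because a negative start would count from the end of the frame. The data here is synthetic, and the index is assumed to have unique labels.

```python
import numpy as np
import pandas as pd

# Synthetic daily data (an assumption for illustration)
rng = np.random.default_rng(1)
idx = pd.date_range("2001-01-01", "2001-12-31", freq="D")
df = pd.DataFrame({"power": rng.random(len(idx)) * 100}, index=idx)

# Integer position of the maximum (get_loc returns an int when labels are unique)
pos = df.index.get_loc(df["power"].idxmax())
# iloc slices exclude the end, so +6 keeps 5 rows after the maximum;
# clamp the start at 0 so a peak near the top does not wrap around
neighbors = df.iloc[max(pos - 5, 0):pos + 6]
print(neighbors)
```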
