Question

Suppose I have a DataFrame with the following structure:

    observation
d1  1
d2  1
d3  -1
d4  -1
d5  -1
d6  -1
d7  1
d8  1
d9  1
d10 1
d11 -1
d12 -1
d13 -1  
d14 -1
d15 -1
d16 1
d17 1
d18 1
d19 1
d20 1

where d1:d20 is some datetime index (generalized here).

If I want to split d1:d2, d3:d6, d7:d10, etc. into their own respective "chunks", how would I do this pythonically?

Note:

df1 = df[(df.observation==1)]
df2 = df[(df.observation==-1)]

is not what I want.

I can think of brute-force ways that work, but they are not very elegant.

Answer 1

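You can create a group variable based on the cumsum() of the diff() of the observation column. Wherever diff() is not equal to zero, a True value is assigned, so each time a new value appears, cumsum() produces a new group id. You can then either apply standard analysis after groupby() with df.groupby((df.observation.diff() != 0).cumsum())...(other chained analysis here), or split the frame into smaller data frames with a list comprehension.

Build the list and index into the chunks: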

lst = [g for _, g in df.groupby((df.observation.diff() != 0).cumsum())]

lst[0]
# observation
#d1         1
#d2         1

lst[1]
# observation
#d3        -1
#d4        -1
#d5        -1
#d6        -1
...
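
To see why this works, it can help to look at the intermediate values one step at a time. The following is a minimal self-contained sketch, not part of the original answer: the daily date range is an assumed stand-in for d1:d20, and the names changed, group_id, and chunks are illustrative.

```python
import pandas as pd

# Rebuild the example frame; the daily date range is an assumed stand-in for d1:d20.
values = [1] * 2 + [-1] * 4 + [1] * 4 + [-1] * 5 + [1] * 5
df = pd.DataFrame(
    {"observation": values},
    index=pd.date_range("2016-01-01", periods=len(values), freq="D"),
)

# True wherever the value differs from the previous row.
# The first row's diff is NaN, and NaN != 0 is True, so it opens the first group.
changed = df["observation"].diff() != 0

# The running sum of the booleans gives every run of equal values its own id.
group_id = changed.cumsum()

# One sub-frame per run, in the original row order.
chunks = [g for _, g in df.groupby(group_id)]

print(group_id.tolist())
print([len(c) for c in chunks])
```

The group ids only increase when the value changes, so two separate runs of 1 end up in different chunks, which is what a plain filter on observation == 1 cannot do.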

Answer 2

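Here is an example using real datetime.date objects as the index.

import datetime
import random

import numpy as np
import pandas as pd

# start_date and end_date were not defined in the original answer;
# the 2016 range below is an assumption chosen to match the sample output.
start_date = datetime.date(2016, 1, 1).toordinal()
end_date = datetime.date(2016, 12, 31).toordinal()

df = pd.DataFrame({'x': np.random.randn(40)}, index=[datetime.date.fromordinal(random.randint(start_date, end_date)) for _ in range(40)])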

def filter_on_datetime(df, year = None, month = None, day = None):
    if all(d is not None for d in {year, month, day}):
        idxs = [idx for idx in df.index if idx.year == year and idx.month == month and idx.day == day]
    elif year is not None and month is not None and day is None:
        idxs = [idx for idx in df.index if idx.year == year and idx.month == month]
    elif year is not None and month is None and day is None:
        idxs = [idx for idx in df.index if idx.year == year]
    elif year is None and month is not None and day is not None:
        idxs = [idx for idx in df.index if idx.month == month and idx.day == day]
    elif year is None and month is None and day is not None:
        idxs = [idx for idx in df.index if idx.day == day]
    elif year is None and month is not None and day is None:
        idxs = [idx for idx in df.index if idx.month == month]
    elif year is not None and month is None and day is not None:
        idxs = [idx for idx in df.index if idx.year == year and idx.day == day] 
    else:
        idxs = df.index
    # .ix was removed from pandas; label-based selection uses .loc
    return df.loc[idxs]

Running this:

>>> print(filter_on_datetime(df = df, year = 2016, month = 2))
                   x
2016-02-01 -0.141557
2016-02-03  0.162429
2016-02-05  0.703794
2016-02-07 -0.184492
2016-02-09 -0.921793
2016-02-12  1.593838
2016-02-17  2.784899
2016-02-19  0.034721
2016-02-26 -0.142299
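
The chain of elif branches above can also be expressed as a single boolean mask that is narrowed once per supplied component. This is an alternative sketch rather than the answer author's method; the function name filter_on_datetime_mask and the small sample frame are hypothetical, and it assumes the index can be converted with pd.to_datetime.

```python
import numpy as np
import pandas as pd


def filter_on_datetime_mask(df, year=None, month=None, day=None):
    """Keep rows whose index matches every date component that was given."""
    # Assumption: the index holds dates that pd.to_datetime can convert.
    idx = pd.to_datetime(df.index)
    mask = np.ones(len(df), dtype=bool)
    if year is not None:
        mask &= idx.year == year
    if month is not None:
        mask &= idx.month == month
    if day is not None:
        mask &= idx.day == day
    return df[mask]


# Hypothetical sample data for illustration only.
df = pd.DataFrame(
    {"x": np.arange(5.0)},
    index=pd.to_datetime(
        ["2016-01-15", "2016-02-01", "2016-02-17", "2016-03-02", "2017-02-09"]
    ),
)

feb_2016 = filter_on_datetime_mask(df, year=2016, month=2)
print(feb_2016)
```

Because the mask is positional, rows that share the same date are each kept exactly once, and calling the function with no components returns the frame unchanged.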

Splitting a pandas DataFrame into many chunks