Question

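I have a date-indexed dataframe (d-m-y). I want to create a binary feature column indicating whether each date is the second Saturday of its month. This is what I have so far: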

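def get_second_true(x):
    # Return the position of the second True value in x, or None if there is none.
    seen_first = False
    for index, is_true in enumerate(x):
        if is_true and seen_first:
            return index
        if is_true:
            seen_first = True
    return None

# pandas numbers weekdays from Monday == 0, so Saturday is 5 (6 is Sunday).
second_saturdays = df.groupby(['month', 'year']).apply(
    lambda x: x.index.weekday == 5
    ).apply(get_second_true)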

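I can't get this back into a series aligned with the original dataframe's index, with a label on every row saying whether it is a second Saturday.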

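This feels like a common scenario, but I can't find the terminology for this kind of operation. I've looked at unstack and reset_index, but I don't understand them well enough to know whether they apply here, or whether a multi-level index is needed at all.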

Answer 1

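pandas has a special frequency alias such as WOM-2SUN (week of month: second Sunday), so you can do the following. For the second Saturday the corresponding alias is WOM-2SAT.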

In [88]: df = pd.DataFrame({'date':pd.date_range('2000-01-01', periods=365)})

In [89]: df
Out[89]:
          date
0   2000-01-01
1   2000-01-02
2   2000-01-03
3   2000-01-04
4   2000-01-05
5   2000-01-06
6   2000-01-07
7   2000-01-08
8   2000-01-09
9   2000-01-10
..         ...
355 2000-12-21
356 2000-12-22
357 2000-12-23
358 2000-12-24
359 2000-12-25
360 2000-12-26
361 2000-12-27
362 2000-12-28
363 2000-12-29
364 2000-12-30

[365 rows x 1 columns]

In [90]: df.loc[df.date.isin(pd.date_range(start=df.date.min(), end=df.date.max(), freq='WOM-2SUN'))]
Out[90]:
          date
8   2000-01-09
43  2000-02-13
71  2000-03-12
99  2000-04-09
134 2000-05-14
162 2000-06-11
190 2000-07-09
225 2000-08-13
253 2000-09-10
281 2000-10-08
316 2000-11-12
344 2000-12-10

UPDATE: starting from Pandas 0.20.1 the .ix indexer is deprecated in favor of the more strict .iloc and .loc indexers.
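
The frequency-based approach above filters a `date` column. As a minimal sketch of how the same idea could produce the binary column the question asks for on a date-indexed frame, assuming a daily `DatetimeIndex` (the frame, the `value` column, and the `is_second_saturday` name are made up for illustration):

```python
import pandas as pd

# Hypothetical date-indexed frame standing in for the asker's data.
df = pd.DataFrame(
    {"value": range(366)},
    index=pd.date_range("2000-01-01", periods=366, freq="D"),
)

# All second Saturdays within the span covered by the index.
second_sats = pd.date_range(df.index.min(), df.index.max(), freq="WOM-2SAT")

# Binary feature aligned with the original index: 1 on a second Saturday, else 0.
df["is_second_saturday"] = df.index.isin(second_sats).astype(int)
```

Because the flag is computed directly from the index, no groupby, unstack, or reset_index is needed to align it with the original rows.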

Answer 2

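A date is the second Saturday of the month if it falls on a Saturday and the day of the month is > 7 and <= 14. Note that pandas numbers weekdays from Monday == 0, so Saturday is weekday == 5; weekday == 6 selects Sunday.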
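
A minimal sketch of this rule as a vectorized mask, assuming a daily `DatetimeIndex` (the frame and the column names are hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical date-indexed frame covering the year 2000.
idx = pd.date_range("2000-01-01", periods=366, freq="D")
df = pd.DataFrame({"value": range(len(idx))}, index=idx)

# Saturday is weekday 5 in pandas (Monday == 0).
is_saturday = df.index.weekday == 5

# The second occurrence of any weekday always falls on days 8 through 14.
in_second_week = (df.index.day > 7) & (df.index.day <= 14)

# Binary feature: 1 when both conditions hold, else 0.
df["is_second_saturday"] = (is_saturday & in_second_week).astype(int)
```

This needs no auxiliary date range and works even when the index has gaps, since each row is tested on its own.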

Marking certain days in date-indexed pandas dataframes