Question

I want to create a pandas Series that indicates whether a date (which should be the index of the Series) is the third Friday of a month.

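My idea was to first create the Series with zeros as values, and then change those zeros to ones wherever the index is the third Friday of a month. This is my approach, which seems to work.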

import pandas as pd
import numpy as np

# 8 years of data
dates = pd.date_range("2010-01-01","2017-12-31")

# create series filled with zeroes over those 8 years
mySeries = pd.Series(np.zeros(len(dates)),index=dates)

# iterate over series and change value to 1 if index is a third friday in a month
for index,value in mySeries.items():
    if index.weekday() == 4 and 14 < index.day < 22:
        mySeries[index] = 1

# if the sum of the series is 96 (1 third friday per month * 12 months * 8 years) then the solution should be correct
print(sum(mySeries))

I am of course also interested in seeing other, simpler solutions.

Answer 1

A faster, non-loop solution using weekday and day: build a boolean mask, convert it to int with astype, and pass it to the Series constructor:

dates = pd.date_range("2010-01-01","2017-12-31")

days = dates.day
s1 = pd.Series(((dates.weekday == 4) & (days > 14) & (days < 22)).astype(int), index=dates)
print (s1.sum())
96

print (s1.head())
2010-01-01    0
2010-01-02    0
2010-01-03    0
2010-01-04    0
2010-01-05    0
Freq: D, dtype: int32
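
The mask works because the third Friday of any month always falls on day 15 through 21. A minimal sketch that checks this per calendar month rather than only by the overall total; the names is_third_friday and per_month are illustrative and not part of the answer above:

```python
import pandas as pd

dates = pd.date_range("2010-01-01", "2017-12-31")

# Third Friday: weekday 4 (Monday is 0) and day of month between 15 and 21.
is_third_friday = (dates.weekday == 4) & (dates.day >= 15) & (dates.day <= 21)
s1 = pd.Series(is_third_friday.astype(int), index=dates)

# Count flagged days per (year, month); every month should have exactly one.
per_month = s1.groupby([s1.index.year, s1.index.month]).sum()
print(len(per_month), (per_month == 1).all())
```

A total of 96 alone could hide a month with two flagged days and another with none; the per-month count rules that out.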

Timings:

In [260]: %%timeit
     ...: mySeries = pd.Series(np.zeros(len(dates)),index=dates)
     ...: 
     ...: # iterate over series and change value to 1 if index is a third friday in a month
     ...: for index,value in mySeries.iteritems():
     ...:     if index.weekday() == 4 and 14 < index.day < 22:
     ...:         mySeries[index] = 1
     ...: 
The slowest run took 5.18 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 2.68 ms per loop
Compiler time: 0.31 s

In [261]: %%timeit
     ...: days = dates.day
     ...: s1 = pd.Series(((dates.weekday == 4) & (days > 14) & (days < 22)).astype(int), index=dates)
     ...: 
1000 loops, best of 3: 603 µs per loop
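
Another possible approach, not from the answer above, is to let pandas generate the third Fridays directly with its week-of-month frequency alias "WOM-3FRI" and then test membership. This is a sketch under the assumption that the installed pandas version supports that alias; third_fridays and s2 are illustrative names:

```python
import pandas as pd

dates = pd.date_range("2010-01-01", "2017-12-31")

# "WOM-3FRI" yields the third Friday of each month in the range.
third_fridays = pd.date_range("2010-01-01", "2017-12-31", freq="WOM-3FRI")

# Flag each daily date that appears among the generated third Fridays.
s2 = pd.Series(dates.isin(third_fridays).astype(int), index=dates)
print(s2.sum())
```

This trades the explicit day-range arithmetic for a named calendar rule, which may be easier to read; its speed relative to the mask was not measured here.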
