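I want to create a pandas Series that indicates whether a given date (the dates should be the index of the Series) is the third Friday of a month.
My idea is to first create the Series filled with zeros, then change the zeros to ones wherever the index is the third Friday of a month. This is my approach, and it seems to work.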
import pandas as pd
import numpy as np
# 8 years of data
dates = pd.date_range("2010-01-01","2017-12-31")
# create series filled with zeroes over those 8 years
mySeries = pd.Series(np.zeros(len(dates)),index=dates)
# iterate over series and change value to 1 if index is a third friday in a month
# Series.items() is used here because iteritems() was removed in pandas 2.0
for index, value in mySeries.items():
    if index.weekday() == 4 and 14 < index.day < 22:
        mySeries[index] = 1
# if the sum of the series is 96 (1 third friday per month * 12 months * 8 years) then the solution should be correct
print(sum(mySeries))
I would of course also be interested in seeing other, simpler solutions.
Answer 0 (score: 1)
A faster, loop-free solution uses weekday and day to build a boolean mask, converts the mask to int, and passes it to the Series constructor:
dates = pd.date_range("2010-01-01","2017-12-31")
days = dates.day
s1 = pd.Series(((dates.weekday == 4) & (days > 14) & (days < 22)).astype(int), index=dates)
print (s1.sum())
96
print (s1.head())
2010-01-01 0
2010-01-02 0
2010-01-03 0
2010-01-04 0
2010-01-05 0
Freq: D, dtype: int32
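The mask works because the third Friday of any month always falls on a day between the 15th and the 21st. As an independent cross-check, here is a minimal sketch that assumes pandas' "WOM-3FRI" offset alias (week of month, third Friday) to generate the third Fridays directly and then marks them in the daily index. The names third_fridays, s2 and mask are only illustrative and do not come from the answer above.

```python
import pandas as pd

dates = pd.date_range("2010-01-01", "2017-12-31")

# Generate the third Friday of every month in the range directly
third_fridays = pd.date_range("2010-01-01", "2017-12-31", freq="WOM-3FRI")

# Mark the daily index: 1 where the date is a third Friday, 0 otherwise
s2 = pd.Series(dates.isin(third_fridays).astype(int), index=dates)

# The weekday/day mask from the answer, rebuilt here for comparison
mask = (dates.weekday == 4) & (dates.day > 14) & (dates.day < 22)

print(s2.sum())
print((s2.values == mask.astype(int)).all())
```

If both approaches agree, the sum should again be one third Friday per month over the 8 years, and the final comparison should be True.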
Timings:
In [260]: %%timeit
...: mySeries = pd.Series(np.zeros(len(dates)),index=dates)
...:
...: # iterate over series and change value to 1 if index is a third friday in a month
...: for index,value in mySeries.iteritems():
...:     if index.weekday() == 4 and 14 < index.day < 22:
...:         mySeries[index] = 1
...:
The slowest run took 5.18 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 2.68 ms per loop
Compiler time: 0.31 s
In [261]: %%timeit
...: days = dates.day
...: s1 = pd.Series(((dates.weekday == 4) & (days > 14) & (days < 22)).astype(int), index=dates)
...:
1000 loops, best of 3: 603 µs per loop
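The comparison above can also be reproduced outside IPython. The following is a rough sketch using the standard-library timeit module. The function names loop_version and vectorized_version are hypothetical wrappers around the two approaches, and the loop uses Series.items(), which is an assumption for recent pandas versions where iteritems() is no longer available. Absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

dates = pd.date_range("2010-01-01", "2017-12-31")


def loop_version():
    # Approach from the question: start with zeros, set third Fridays to 1
    s = pd.Series(np.zeros(len(dates)), index=dates)
    for index, value in s.items():
        if index.weekday() == 4 and 14 < index.day < 22:
            s[index] = 1
    return s


def vectorized_version():
    # Approach from the answer: boolean mask converted to int
    days = dates.day
    mask = (dates.weekday == 4) & (days > 14) & (days < 22)
    return pd.Series(mask.astype(int), index=dates)


# Average wall-clock time per call over 10 runs of each version
loop_time = timeit.timeit(loop_version, number=10) / 10
vec_time = timeit.timeit(vectorized_version, number=10) / 10
print(f"loop: {loop_time * 1e3:.2f} ms per call")
print(f"vectorized: {vec_time * 1e3:.2f} ms per call")
```

Wrapping each approach in a function keeps the timed work identical to what is compared above and makes it easy to check that both return the same values before trusting the timings.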