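I have a DataFrame in Pandas.
One of its columns holds timestamps. I am using the following to remove all weekends from the data:
df = df[df['TIMESTAMP'].apply(pd.datetime.weekday)<5]
The code takes 9 seconds to run. Is there a faster way?
Thanks in advance.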
Answer 0 (score: 2)
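A faster alternative is to first convert the Series to a DatetimeIndex (which has a weekday attribute), so the weekday is computed for the whole column at once instead of one row at a time: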
df[pd.DatetimeIndex(df['TIMESTAMP']).weekday < 5]
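As a minimal sketch of the same idea on a small frame: the sample data and the `value` column below are made up for illustration, and only the `TIMESTAMP` column name comes from the question. It also shows the `.dt` accessor, which, assuming the column already has a datetime dtype, should give the same mask without building a DatetimeIndex explicitly.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 14 consecutive days starting on a Monday.
df = pd.DataFrame({
    "TIMESTAMP": pd.date_range("2024-01-01", periods=14, freq="D"),
    "value": np.arange(14),
})

# weekday is 0 for Monday through 6 for Sunday, so < 5 keeps Monday to Friday.
mask_index = pd.DatetimeIndex(df["TIMESTAMP"]).weekday < 5

# Equivalent mask through the .dt accessor (requires a datetime64 column).
mask_dt = df["TIMESTAMP"].dt.weekday < 5

weekdays_only = df[mask_index]
```

If the column holds strings rather than datetimes, converting it once with `pd.to_datetime` before filtering is probably the simplest route.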
Answer 1 (score: 2)
For completeness...
In [1]: df = pd.DataFrame(np.random.randn(100000,2),columns=list('AB'))
In [6]: df['time'] = pd.date_range('19700101',periods=100000)
In [7]: df.tail()
Out[7]:
A B time
99995 0.481596 -0.622861 2243-10-12 00:00:00
99996 -1.000646 0.415413 2243-10-13 00:00:00
99997 0.054219 -0.669477 2243-10-14 00:00:00
99998 -1.246848 0.690656 2243-10-15 00:00:00
99999 -2.186820 -0.597221 2243-10-16 00:00:00
In [8]: df.head()
Out[8]:
A B time
0 -0.011530 -0.609354 1970-01-01 00:00:00
1 0.652302 -0.229030 1970-01-02 00:00:00
2 -1.703967 0.880957 1970-01-03 00:00:00
3 2.000682 -1.250603 1970-01-04 00:00:00
4 0.483412 2.233786 1970-01-05 00:00:00
In [10]: pd.DatetimeIndex(df.time).weekday
Out[10]: array([3, 4, 5, ..., 5, 6, 0], dtype=int32)
In [11]: df[pd.DatetimeIndex(df.time).weekday<5]
Out[11]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 71428 entries, 0 to 99999
Data columns (total 3 columns):
A 71428 non-null values
B 71428 non-null values
time 71428 non-null values
dtypes: datetime64[ns](1), float64(2)
In [12]: df[pd.DatetimeIndex(df.time).weekday<5].head()
Out[12]:
A B time
0 -0.011530 -0.609354 1970-01-01 00:00:00
1 0.652302 -0.229030 1970-01-02 00:00:00
4 0.483412 2.233786 1970-01-05 00:00:00
5 0.264460 -0.135544 1970-01-06 00:00:00
6 0.037285 0.592312 1970-01-07 00:00:00
In [13]: %timeit df[pd.DatetimeIndex(df.time).weekday<5]
10 loops, best of 3: 41.4 ms per loop
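To compare the two approaches outside IPython, a rough sketch along these lines could be used. It rebuilds a frame of the same shape as above and times each filter once with `time.perf_counter`. Two assumptions: a lambda calling `Timestamp.weekday()` stands in for the question's `pd.datetime.weekday`, since `pd.datetime` is not available in recent pandas releases, and a single run is only a coarse measurement compared with `%timeit`.

```python
import time

import numpy as np
import pandas as pd

# Same shape as the session above: 100000 daily timestamps from 1970-01-01.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100_000, 2)), columns=list("AB"))
df["time"] = pd.date_range("1970-01-01", periods=100_000)

# Row-by-row approach: one Python-level call per timestamp.
start = time.perf_counter()
slow = df[df["time"].apply(lambda ts: ts.weekday()) < 5]
slow_seconds = time.perf_counter() - start

# Vectorised approach: weekday computed for the whole column at once.
start = time.perf_counter()
fast = df[pd.DatetimeIndex(df["time"]).weekday < 5]
fast_seconds = time.perf_counter() - start

print(f"apply: {slow_seconds:.4f}s  DatetimeIndex: {fast_seconds:.4f}s")
```

Both filters should select the same rows; the actual timings depend on the machine and the pandas version.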