DataFrame filter by datetime column is off

Time: 2019-12-30 13:47:07

Tags: python pandas dataframe

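I have set up two DataFrames and am trying to filter each one by moving a datetime column to the index and using .last('7D') to extract the entries whose timestamp falls within the last 7 days. It works for the first DataFrame but not for the second. I have tried several variations of filtering the df to get what I need, but cannot get accurate output. I'm at a loss! This was also built iteratively, so if you see any opportunities for refactoring, please let me know.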

Original DataFrame: engagements

<class 'pandas.core.frame.DataFrame'> 
RangeIndex: 2572 entries, 0 to 2571 
Data columns (total 15 columns):
REQ_NAME                2572 non-null object
REQ_ID                  2572 non-null object
STATUS                  2572 non-null object
full_name               2572 non-null object
BIZ_UNIT                2572 non-null object
COMPLEXITY              2378 non-null object
PRIORITY                2390 non-null object
OPEN_DATE               2572 non-null datetime64[ns]
REQ_DATE                2572 non-null object
REQ_CAT                 2572 non-null object
REQ_NOTE                2572 non-null object
CostCenter              2572 non-null int64 
TargetCompletionDate    2572 non-null object 
UpdateDTTM              2514 non-null datetime64[ns] 
age                     2572 non-null timedelta64[ns] 
dtypes: datetime64[ns](2), int64(1), object(11), timedelta64[ns](1) 
memory usage: 301.5+ KB 

Splitting the DataFrame

active_engagements = engagements[engagements['STATUS'].isin(active_status)]
comp_engagements = engagements[engagements['STATUS'].isin(comp_status)]

First filter

act_eng_open_lw = active_engagements.set_index('OPEN_DATE')
act_eng_open_lw = act_eng_open_lw.last('7D')

The output is the 10 rows I expect to see.

Problem sub-DataFrame

act_eng_comp_lw = comp_engagements.set_index('UpdateDTTM')
act_eng_comp_lw = act_eng_comp_lw.last('7D')

The output is 105 rows; I expected only 32 of them.

.info() output for the two filtered DataFrames

act_eng_open_lw

<class 'pandas.core.frame.DataFrame'> 
DatetimeIndex: 10 entries, 2019-12-20 to 2019-12-26
Data columns (total 14 columns): 
REQ_NAME                10 non-null object 
REQ_ID                  10 non-null object 
STATUS                  10 non-null object 
full_name               10 non-null object 
BIZ_UNIT                10 non-null object 
COMPLEXITY              5 non-null object 
PRIORITY                5 non-null object 
REQ_DATE                10 non-null object 
REQ_CAT                 10 non-null object 
REQ_NOTE                10 non-null object 
CostCenter              10 non-null int64 
TargetCompletionDate    10 non-null object 
UpdateDTTM              5 non-null datetime64[ns] 
age                     10 non-null timedelta64[ns] 
dtypes: datetime64[ns](1), int64(1), object(11), timedelta64[ns](1) 
memory usage: 1.2+ KB  

act_eng_comp_lw

<class 'pandas.core.frame.DataFrame'> 
DatetimeIndex: 105 entries, 2019-12-26 to 2019-11-27
Data columns (total 14 columns): 
REQ_NAME                105 non-null object 
REQ_ID                  105 non-null object 
STATUS                  105 non-null object 
full_name               105 non-null object 
BIZ_UNIT                105 non-null object 
COMPLEXITY              102 non-null object 
PRIORITY                104 non-null object 
OPEN_DATE               105 non-null datetime64[ns] 
REQ_DATE                105 non-null object 
REQ_CAT                 105 non-null object 
REQ_NOTE                105 non-null object 
CostCenter              105 non-null int64 
TargetCompletionDate    105 non-null object 
age                     105 non-null int64 
dtypes: datetime64[ns](1), int64(2), object(11) 
memory usage: 12.3+ KB 

Question: with the same filter, why does .last filter one datetime column correctly but not the other?
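
One possible explanation, offered as a hedged sketch rather than a confirmed diagnosis: the .info() output for act_eng_comp_lw shows the index running from 2019-12-26 to 2019-11-27, which suggests the UpdateDTTM index is not in ascending order. The pandas documentation describes .last as operating on a sorted DatetimeIndex, so an unsorted one may return unexpected rows. The toy frame below uses invented data (not the original engagements) to show how the ordering could be checked and fixed before any offset-based selection.

```python
import pandas as pd

# Hypothetical toy data standing in for comp_engagements; all values are invented.
toy = pd.DataFrame(
    {
        "REQ_ID": ["A", "B", "C", "D"],
        "UpdateDTTM": pd.to_datetime(
            ["2019-12-26", "2019-12-01", "2019-12-24", "2019-11-27"]
        ),
    }
)

indexed = toy.set_index("UpdateDTTM")

# An index that runs newest-to-oldest, or in arbitrary order, is not ascending.
index_is_sorted = indexed.index.is_monotonic_increasing

# Sorting the index gives offset-based selection a well-defined "last" timestamp.
sorted_indexed = indexed.sort_index()
sorted_ok = sorted_indexed.index.is_monotonic_increasing
```

If index_is_sorted comes back False on the real data, calling sort_index() before the offset-based filter would be the first thing to try.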

1 answer:

Answer 0 (score: 0)

Rather than .last, I ended up changing the method I use to capture the last 7 days:

act_eng_open_lw = act_eng_open_lw[act_eng_open_lw.index > dt.datetime.now() - pd.to_timedelta("7day")]

This approach worked for both of my DataFrames.
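
The same idea can be wrapped in a small helper. This is a sketch under assumptions: filter_last_days and its now parameter are hypothetical names introduced here, the toy data is invented, and passing a fixed reference time is only a choice to make the example reproducible. Because a boolean mask compares each index value individually, it does not depend on the index being sorted.

```python
import pandas as pd


def filter_last_days(df, days=7, now=None):
    """Keep rows whose DatetimeIndex value is within the last `days` days.

    `filter_last_days` and `now` are hypothetical names used for illustration.
    """
    reference = pd.Timestamp.now() if now is None else pd.Timestamp(now)
    cutoff = reference - pd.Timedelta(days=days)
    # Element-wise comparison, so the order of the index does not matter.
    return df[df.index > cutoff]


# Invented toy data with a deliberately unsorted index.
toy = pd.DataFrame(
    {"REQ_ID": ["A", "B", "C", "D"]},
    index=pd.to_datetime(["2019-12-26", "2019-12-01", "2019-12-24", "2019-11-27"]),
)

# A fixed reference time keeps the result stable between runs.
recent = filter_last_days(toy, days=7, now="2019-12-30 13:47:07")
```

Note that this measures the 7-day window back from the current (or supplied) time, whereas .last('7D') measures it back from the final index value, so the two approaches can select different rows on the same data.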