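I'm puzzled by the result of this filtering step on a pandas DataFrame. I'm trying to select the rows that fall between certain dates, but the resulting DataFrame is empty. I'm sure there is data for that period.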
df.info()
reports the type of 'opendate' as: opendate 440383 non-null datetime64[ns, UTC]
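Because the column is timezone-aware, any bound it is compared against has to be timezone-aware as well. A minimal sketch of the difference, using a made-up two-row series (`s`, `naive_bound` and `aware_bound` are illustrative names, not part of the original code):

```python
import pandas as pd

# Hypothetical tz-aware column, similar to the dtype reported by df.info().
s = pd.Series(pd.to_datetime(["2020-03-01T08:00:00", "2020-03-09T09:30:00"], utc=True))

naive_bound = pd.Timestamp("2020-03-05")       # no timezone attached
aware_bound = naive_bound.tz_localize("UTC")   # same wall-clock time, marked as UTC

try:
    s > naive_bound
    raised = False
except TypeError:
    raised = True  # comparing tz-aware values with a tz-naive bound is rejected

print(raised)
print((s > aware_bound).tolist())
```

This is why the snippet in the question calls tz_localize('utc') on every bound before building the mask.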
Code snippet:
from datetime import timedelta
from datetime import datetime
import pandas as pd
current_date = pd.Timestamp.now()  # replaces pd.datetime.now(), which recent pandas no longer provides
t_delta_week = timedelta(days=7)
t_delta_year = timedelta(days=365)
#CurrentDate
date_start2020 = pd.Timestamp(current_date - t_delta_week, unit='ms')
date_end2020 = pd.Timestamp(current_date, unit='ms')
date_start2020 = date_start2020.tz_localize('utc')
date_end2020 = date_end2020.tz_localize('utc')
#LastYearDate
date_start2019 = pd.Timestamp(current_date - t_delta_year - t_delta_week, unit='ms')
date_end2019 = pd.Timestamp(current_date - t_delta_year, unit='ms')
date_start2019 = date_start2019.tz_localize('utc')
date_end2019 = date_end2019.tz_localize('utc')
df2020_2019['opendate'] = pd.to_datetime(df2020_2019['opendate'], unit='ms')
mask = (df2020_2019['opendate'] > date_start2020) & (df2020_2019['opendate'] <= date_end2020)
df_currYear = df2020_2019.loc[mask]
df_currYear
The returned DataFrame is empty.
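One way to check whether the period really contains data is to compare the range covered by the column with the filter window before applying the mask. A rough sketch, assuming a small made-up frame `df` in place of df2020_2019 (all values below are invented for illustration):

```python
import pandas as pd

# Hypothetical stand-in for df2020_2019: every timestamp predates the window.
df = pd.DataFrame({
    "opendate": pd.to_datetime(
        ["2019-01-05T10:00:00", "2019-03-01T12:30:00", "2019-06-15T08:45:00"],
        utc=True,
    )
})

window_end = pd.Timestamp.now(tz="UTC")
window_start = window_end - pd.Timedelta(days=7)

# Range actually covered by the column.
data_min = df["opendate"].min()
data_max = df["opendate"].max()

# An empty result is expected whenever the window and the data do not overlap.
overlaps = (data_max > window_start) and (data_min <= window_end)

mask = (df["opendate"] > window_start) & (df["opendate"] <= window_end)
print(f"data covers {data_min} .. {data_max}")
print(f"window is   {window_start} .. {window_end}")
print(f"overlap: {overlaps}, rows selected: {mask.sum()}")
```

If the two ranges do not overlap, the mask is working as intended and the problem lies in the data being loaded rather than in the filter.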
Thanks for your help! :)
Edit:
Maybe this helps: 'opendate' is a generated column, created with the following snippet:
import pandas as pd
fmt = '%Y-%m-%dT%H:%M:%S'
df2020_2019.dropna(subset=['opentime_TS'], inplace=True)
df2020_2019['opendate'] = pd.to_datetime(df2020_2019['opentime_TS'], utc=True, format=fmt, errors='ignore')
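A side note on errors='ignore': when any value fails to parse, it hands back the input unchanged, so the column can silently stay as strings (newer pandas releases also deprecate this option). A small sketch of the errors='coerce' alternative, on a made-up series `raw` rather than the real 'opentime_TS' data:

```python
import pandas as pd

fmt = "%Y-%m-%dT%H:%M:%S"

# Hypothetical raw values; the last one does not match fmt.
raw = pd.Series(["2020-03-01T08:00:00", "2020-03-02T09:30:00", "not a date"])

# errors="coerce" turns unparseable entries into NaT instead of leaving them as strings.
parsed = pd.to_datetime(raw, format=fmt, utc=True, errors="coerce")

print(parsed.dtype)         # a tz-aware datetime dtype
print(parsed.isna().sum())  # number of values that failed to parse
```

Counting the NaT values afterwards shows how many rows did not match the format, which errors='ignore' would hide.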
In addition, I printed head():
[image: data sample]
For privacy reasons, I can't share records from the df :)
Answer 0 (score: 0)
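OK, my bad. I was so focused on the tz TypeErrors that I was simply blind... I had picked the wrong data source, one that was already outdated :) Final solution, which works on the correct data: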
from datetime import timedelta
from datetime import datetime
import pandas as pd
fmt = '%Y-%m-%dT%H:%M:%S'
df2020_2019.dropna(subset=['opentime_TS'], inplace=True)
df2020_2019['opendate'] = pd.to_datetime(df2020_2019['opentime_TS'], utc=True, format=fmt, errors='ignore')
df2020_2019.info()
current_date = pd.Timestamp.now()
t_delta_week = timedelta(days=7)
t_delta_year = timedelta(days=365)
#CurrentDate
date_start2020 = pd.Timestamp(current_date - t_delta_week, unit='ms')
date_end2020 = pd.Timestamp(current_date, unit='ms')
date_start2020 = date_start2020.tz_localize('utc')
date_end2020 = date_end2020.tz_localize('utc')
#LastYearDate
date_start2019 = pd.Timestamp(current_date - t_delta_year - t_delta_week, unit='ms')
date_end2019 = pd.Timestamp(current_date - t_delta_year, unit='ms')
date_start2019 = date_start2019.tz_localize('utc')
date_end2019 = date_end2019.tz_localize('utc')
df2020_2019['opendate'] = pd.to_datetime(df2020_2019['opendate'], unit='ms')
df_currYear = df2020_2019[df2020_2019["opendate"] > date_start2020]
df_lastYear = df2020_2019[df2020_2019["opendate"].between(date_start2019, date_end2019)]
df_currYear
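The same two windows can also be written more compactly. The following is only a sketch on synthetic data (the frame `df` and the names `now`, `week`, `year`, `curr` and `prev` are made up for illustration). It takes the current time directly in UTC with pd.Timestamp.now(tz="UTC"), which avoids labeling local wall-clock time as UTC via tz_localize, and uses pd.Timedelta for the offsets:

```python
import pandas as pd

# Hypothetical data: one row per day for the last 400 days, in UTC.
now = pd.Timestamp.now(tz="UTC")
df = pd.DataFrame({"opendate": pd.date_range(end=now, periods=400, freq="D")})

week = pd.Timedelta(days=7)
year = pd.Timedelta(days=365)

# Current week: (now - 7 days, now]
curr = df[(df["opendate"] > now - week) & (df["opendate"] <= now)]

# Same week one year earlier: (now - 372 days, now - 365 days]
start_prev, end_prev = now - year - week, now - year
prev = df[df["opendate"].between(start_prev, end_prev, inclusive="right")]

print(len(curr), len(prev))
```

Note that between() includes both endpoints by default; inclusive="right" is used here so that both windows are half-open in the same way as the mask in the question.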