I have a CSV file containing 1600 dates, and I am trying to find all the missing dates. For example:
03-10-2019
01-10-2019
29-09-2019
28-09-2019
should return: 02-10-2019, 30-09-2019.
This is what I wrote:
with open('measurements.csv','r') as csvfile:
    df = pd.read_csv(csvfile, delimiter=',')
    timestamps = df['observation_time'] #Getting only the date
for line in timestamps:
    date_str = line
    try: # convert string to time
        date = date_time_obj = datetime.datetime.strptime(date_str, '%Y-%m-%d %H:%M:%S')
        dates.append(date)
    except:
        print("Date parsing failed")
dates = pd.DataFrame(dates,columns =['actual_date'])
pd.date_range(start = dates.min(), end = dates.max()).difference(dates.index)
This raises an error:
"Cannot convert input [actual_date 2018-09-17 22:00:00 dtype: datetime64[ns]] of type <class 'pandas.core.series.Series'> to Timestamp"
Answer 0 (score: 1)
The idea is to use DataFrame.asfreq to add all the missing dates to the DatetimeIndex, so the new rows can then be picked out by boolean indexing with Series.isna:
df['observation_time'] = pd.to_datetime(df['observation_time'], dayfirst=True)
df1 = df.set_index(df['observation_time']).sort_index().asfreq('d')
print(df1)
                 observation_time
observation_time
2019-09-28             2019-09-28
2019-09-29             2019-09-29
2019-09-30                    NaT
2019-10-01             2019-10-01
2019-10-02                    NaT
2019-10-03             2019-10-03
dates = df1.index[df1['observation_time'].isna()]
print(dates)
DatetimeIndex(['2019-09-30', '2019-10-02'], dtype='datetime64[ns]',
              name='observation_time', freq=None)
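For comparison, the date_range(...).difference(...) approach from the question can probably be made to work as well. The sketch below is only an illustration under a few assumptions: it uses the four sample dates from the question in place of measurements.csv, and the names observed, full_range and missing are made up for the example. The likely cause of the error is that min() and max() were called on a DataFrame, which returns a Series rather than a single Timestamp, and that the comparison was made against dates.index (a default integer index) instead of the date values.

```python
import pandas as pd

# Sample data from the question (day-first strings). In the real script this
# column is assumed to come from pd.read_csv('measurements.csv').
df = pd.DataFrame(
    {"observation_time": ["03-10-2019", "01-10-2019", "29-09-2019", "28-09-2019"]}
)

# Parse the strings and drop any time-of-day part so only dates are compared.
observed = pd.to_datetime(df["observation_time"], dayfirst=True).dt.normalize()

# Series.min()/max() return single Timestamps, which date_range accepts.
# DataFrame.min() would return a Series instead, hence the original error.
full_range = pd.date_range(start=observed.min(), end=observed.max(), freq="D")

# Compare against the observed values, not against the default integer index.
missing = full_range.difference(pd.DatetimeIndex(observed))

print(missing.strftime("%d-%m-%Y").tolist())
# ['30-09-2019', '02-10-2019']
```

Since this is a set difference, repeated dates in the column should not get in the way, and normalize() keeps timestamps such as 22:00:00 from being treated as distinct from their calendar day.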