Return missing dates in Python

Asked: 2019-10-30 07:21:04

Tags: python-3.x pandas dataframe datetime

I have a CSV file containing 1600 dates, and I am trying to find all the dates that are missing. For example:
03-10-2019
01-10-2019
29-09-2019
28-09-2019
should return: 02-10-2019, 30-09-2019.

This is what I wrote:

import datetime
import pandas as pd

with open('measurements.csv', 'r') as csvfile:
    df = pd.read_csv(csvfile, delimiter=',')

timestamps = df['observation_time']  # Getting only the date column

dates = []
for line in timestamps:
    date_str = line
    try:  # convert string to datetime
        date = datetime.datetime.strptime(date_str, '%Y-%m-%d %H:%M:%S')
        dates.append(date)
    except (ValueError, TypeError):
        print("Date parsing failed")

dates = pd.DataFrame(dates, columns=['actual_date'])

pd.date_range(start=dates.min(), end=dates.max()).difference(dates.index)

This returns an error:

"Cannot convert input [actual_date 2018-09-17 22:00:00 dtype: datetime64[ns]] of type <class 'pandas.core.series.Series'> to Timestamp"

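1 Answer:

Answer 0 (score: 1)

The idea is to use DataFrame.asfreq to add every missing date to the DatetimeIndex. The added rows contain NaT, so they can be selected with boolean indexing and Series.isna: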

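# Parse day-first strings such as 03-10-2019 into datetimes
df['observation_time'] = pd.to_datetime(df['observation_time'], dayfirst=True)
# Index by date, sort, and reindex to daily frequency; absent days become NaT rows
df1 = df.set_index(df['observation_time']).sort_index().asfreq('D')
print(df1)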
                 observation_time
observation_time                 
2019-09-28             2019-09-28
2019-09-29             2019-09-29
2019-09-30                    NaT
2019-10-01             2019-10-01
2019-10-02                    NaT
2019-10-03             2019-10-03

# Index labels whose value is NaT are the dates absent from the original data
dates = df1.index[df1['observation_time'].isna()]
print(dates)
DatetimeIndex(['2019-09-30', '2019-10-02'], dtype='datetime64[ns]', name='observation_time', freq=None)
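
The date_range(...).difference(...) idea from the question can also be made to work. The error there arises because dates is a DataFrame, so dates.min() returns a Series rather than a single Timestamp, and dates.index is the default integer index rather than the dates themselves. The following is only a sketch under assumptions: the helper name missing_dates and the sample list are hypothetical, and the input is assumed to be day-first strings like those in the question.

```python
import pandas as pd


def missing_dates(values, dayfirst=True):
    """Return the calendar days absent between the earliest and latest value."""
    # Parse the strings; dayfirst=True reads "03-10-2019" as 3 October 2019.
    parsed = pd.to_datetime(pd.Series(values), dayfirst=dayfirst)
    # Drop any time-of-day component so every value sits at midnight.
    days = pd.DatetimeIndex(parsed).normalize()
    # min() and max() on a DatetimeIndex are scalar Timestamps,
    # which is what date_range expects for start and end.
    full_range = pd.date_range(start=days.min(), end=days.max(), freq="D")
    # Keep only the days of the full range that never occur in the data.
    return full_range.difference(days)


# Hypothetical sample taken from the example in the question.
sample = ["03-10-2019", "01-10-2019", "29-09-2019", "28-09-2019"]
missing = missing_dates(sample)
print(missing.strftime("%d-%m-%Y").tolist())  # ['30-09-2019', '02-10-2019']
```

One practical difference from asfreq: asfreq reindexes, so it raises an error when the index contains duplicate dates, whereas the set difference above tolerates repeated dates in the input.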