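I have the following dataset. I am trying to keep only the entries that fall within a date range I supply. The problem is that when the start date and end date are not among the dates in the column I set as the index, I get a KeyError.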
Duration Film Deadline
1777 a 02/04/2018
1777 b 02/04/2018
1777 b 02/04/2018
942 b 03/04/2018
941 c 03/04/2018
start_date = sys.argv[1]
end_date = sys.argv[2]
df_filtered = df_filtered.set_index([5])
df_filtered = df_filtered.dropna(axis=0, how='all')
df_range = df_filtered[start_date:end_date]
df_groupby = df_range.groupby([4])[3].sum()
film = df_groupby.index.values.tolist()
footage = df_groupby.values.astype(int).tolist()
That is the code so far. Any ideas?
Answer 0 (score: 2)
I think you need to convert the Deadline column to datetimes so that it becomes a DatetimeIndex:
print (df)
Duration Film Deadline
0 1777 a 01/04/2018
1 1777 b 02/04/2018
2 1777 b 03/04/2018
3 942 b 04/04/2018
4 941 c 05/04/2018
df['Deadline'] = pd.to_datetime(df['Deadline'], dayfirst=True)
start_date= '2018-03-25'
end_date = '2018-04-04'
df = df.set_index('Deadline')[start_date:end_date]
print (df)
Duration Film
Deadline
2018-04-01 1777 a
2018-04-02 1777 b
2018-04-03 1777 b
2018-04-04 942 b
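The steps above can be put together as a self-contained sketch. It rebuilds the sample frame by hand and adds a `sort_index()` call, which is my own addition rather than part of the answer: as far as I know, slicing by labels that are absent from the index relies on the index being sorted, so sorting first is a cheap safeguard. The names `by_deadline` and `in_range` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the answer's printed frame.
df = pd.DataFrame({
    "Duration": [1777, 1777, 1777, 942, 941],
    "Film": ["a", "b", "b", "b", "c"],
    "Deadline": ["01/04/2018", "02/04/2018", "03/04/2018",
                 "04/04/2018", "05/04/2018"],
})

# Parse the day-first strings into real datetimes.
df["Deadline"] = pd.to_datetime(df["Deadline"], dayfirst=True)

# Assumption: sort the index so that slicing with endpoints that are
# not present in the data (here 2018-03-25) is well defined.
by_deadline = df.set_index("Deadline").sort_index()

# Label slicing on a DatetimeIndex includes both endpoints.
in_range = by_deadline.loc["2018-03-25":"2018-04-04"]
print(in_range)
```

Here the start date lies before every row, yet the slice simply starts at the first available date instead of raising a KeyError.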
Another solution uses between and filters by boolean indexing:
df['Deadline'] = pd.to_datetime(df['Deadline'], dayfirst=True)
start_date= '2018-03-25'
end_date = '2018-04-04'
df = df[df['Deadline'].between(start_date, end_date)]
print (df)
Duration Film Deadline
0 1777 a 2018-04-01
1 1777 b 2018-04-02
2 1777 b 2018-04-03
3 942 b 2018-04-04
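To connect this back to the question, here is a minimal sketch that applies the `between` filter and then the per-film sum from the original code. It assumes the columns are named `Duration`, `Film` and `Deadline` (the question uses positional labels 3, 4 and 5) and that the dates arrive as ISO strings such as `2018-03-25`. The helper name `sum_duration_by_film` is hypothetical.

```python
import pandas as pd


def sum_duration_by_film(df, start_date, end_date):
    """Hypothetical helper: total Duration per Film within a date range."""
    # Parse the day-first strings; the original frame is left unchanged.
    deadlines = pd.to_datetime(df["Deadline"], dayfirst=True)
    # between() is inclusive on both ends by default and does not
    # require the endpoints to exist in the data.
    mask = deadlines.between(pd.Timestamp(start_date), pd.Timestamp(end_date))
    return df.loc[mask].groupby("Film")["Duration"].sum()


# Sample data copied from the answer's printed frame.
df = pd.DataFrame({
    "Duration": [1777, 1777, 1777, 942, 941],
    "Film": ["a", "b", "b", "b", "c"],
    "Deadline": ["01/04/2018", "02/04/2018", "03/04/2018",
                 "04/04/2018", "05/04/2018"],
})

# In a script these two strings could come from sys.argv[1] and sys.argv[2].
totals = sum_duration_by_film(df, "2018-03-25", "2018-04-04")
film = totals.index.tolist()
footage = totals.astype(int).tolist()
print(film, footage)
```

Because the filter is a boolean mask rather than a label lookup, it does not matter whether the index is sorted or whether the requested dates appear in the column.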