Pandas - Python date range

Time: 2018-05-03 11:02:20

Tags: python pandas

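I have the following dataset. I am trying to keep only the entries that fall within a date range I supply. The problem is that when the start date and end date are not among the dates present in my data, I get a KeyError.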

Duration    Film    Deadline
1777         a      02/04/2018
1777         b      02/04/2018
1777         b      02/04/2018
942          b      03/04/2018
941          c      03/04/2018


  start_date = sys.argv[1]
  end_date = sys.argv[2]
  df_filtered = df_filtered.set_index([5])
  df_filtered = df_filtered.dropna(axis=0, how='all')
  df_range = df_filtered[start_date:end_date]
  df_groupby = df_range.groupby([4])[3].sum()
  film = df_groupby.index.values.tolist()
  footage = df_groupby.values.astype(int).tolist()

The code is shown above. Any ideas?

1 answer:

Answer 0 (score: 2)

I think you need to convert the Deadline column to datetimes and use it as a DatetimeIndex:

print (df)
   Duration Film    Deadline
0      1777    a  01/04/2018
1      1777    b  02/04/2018
2      1777    b  03/04/2018
3       942    b  04/04/2018
4       941    c  05/04/2018
df['Deadline'] = pd.to_datetime(df['Deadline'], dayfirst=True)

start_date= '2018-03-25'
end_date = '2018-04-04'

df = df.set_index('Deadline')[start_date:end_date]
print (df)
            Duration Film
Deadline                 
2018-04-01      1777    a
2018-04-02      1777    b
2018-04-03      1777    b
2018-04-04       942    b
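
Combined with the grouping step from the question, a minimal sketch might look like the following. The column names Duration, Film and Deadline come from the sample frame above. The sort_index() call is an added assumption: label slicing with dates that are missing from the index is only dependable on a sorted DatetimeIndex, so sorting first is a cheap safeguard.

```python
import pandas as pd

# Sample data mirroring the frame printed above.
df = pd.DataFrame({
    "Duration": [1777, 1777, 1777, 942, 941],
    "Film": ["a", "b", "b", "b", "c"],
    "Deadline": ["01/04/2018", "02/04/2018", "03/04/2018",
                 "04/04/2018", "05/04/2018"],
})

# Parse day-first strings into real datetimes, then index by them.
df["Deadline"] = pd.to_datetime(df["Deadline"], dayfirst=True)
indexed = df.set_index("Deadline").sort_index()  # sorting is an added safeguard

# The start date is not present in the data; slicing still works
# because the index holds sorted datetimes rather than strings.
start_date = "2018-03-25"
end_date = "2018-04-04"
df_range = indexed.loc[start_date:end_date]

# Total duration per film inside the range, as in the question.
totals = df_range.groupby("Film")["Duration"].sum()
film = totals.index.tolist()
footage = totals.astype(int).tolist()
```

Both ends of the slice are inclusive, so the row dated 2018-04-04 is kept.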

Another solution is to use between and filter by boolean indexing:

df['Deadline'] = pd.to_datetime(df['Deadline'], dayfirst=True)

start_date= '2018-03-25'
end_date = '2018-04-04'

df = df[df['Deadline'].between(start_date, end_date)]

print (df)
   Duration Film   Deadline
0      1777    a 2018-04-01
1      1777    b 2018-04-02
2      1777    b 2018-04-03
3       942    b 2018-04-04
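
The between approach can also be wrapped in a small helper. This is only a sketch: filter_by_deadline is a hypothetical name, not part of pandas, and it assumes the date column has already been converted with to_datetime. Because the filter is a boolean mask, the frame does not need to be sorted or re-indexed, and a range that matches nothing simply gives an empty frame instead of raising an error.

```python
import pandas as pd

def filter_by_deadline(df, start_date, end_date, column="Deadline"):
    """Return rows whose `column` lies between the two dates, ends included.

    Hypothetical helper for illustration; `column` must hold datetimes.
    """
    start = pd.Timestamp(start_date)
    end = pd.Timestamp(end_date)
    mask = df[column].between(start, end)  # inclusive on both ends by default
    return df[mask]

df = pd.DataFrame({
    "Duration": [1777, 1777, 1777, 942, 941],
    "Film": ["a", "b", "b", "b", "c"],
    "Deadline": ["01/04/2018", "02/04/2018", "03/04/2018",
                 "04/04/2018", "05/04/2018"],
})
df["Deadline"] = pd.to_datetime(df["Deadline"], dayfirst=True)

# Range whose start date does not occur in the data.
in_range = filter_by_deadline(df, "2018-03-25", "2018-04-04")

# Range with no matching rows at all: the result is empty, no exception.
empty = filter_by_deadline(df, "2019-01-01", "2019-12-31")
```

The start and end strings could equally come from sys.argv, as in the question.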