Selecting an arbitrary subset of dates in pandas

Time: 2018-07-14 10:44:07

Tags: python pandas date

Is there a way to select an arbitrary subset of dates from a pandas DataFrame? For example, if I have the following:

import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
df
                   A         B         C         D
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-04  0.721555 -0.706771 -1.039575  0.271860
2013-01-05 -0.424972  0.567020  0.276232 -1.087401
2013-01-06 -0.673690  0.113648 -1.478427  0.524988

I would like to do something like this:

my_selection = ['2013-01-01', '2013-01-03', '2013-01-02', '2013-01-02', '2013-01-05']
my_df = df.loc[my_selection]
my_df
                   A         B         C         D
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-05 -0.424972  0.567020  0.276232 -1.087401

1 answer:

Answer 0 (score: 2)

You need to convert the date strings to datetimes so that they match the DatetimeIndex:

my_selection = ['2013-01-01', '2013-01-03', '2013-01-02', '2013-01-02', '2013-01-05']
my_df = df.loc[pd.to_datetime(my_selection)]
print (my_df)
                   A         B         C         D
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-05 -0.424972  0.567020  0.276232 -1.087401
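A self-contained way to check this, assuming deterministic data from np.arange in place of the random values (an illustration choice, so the result can be asserted): pd.to_datetime turns the list of strings into a DatetimeIndex, and .loc then returns the rows in the requested order, with the repeated date included twice.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the question's random data (assumption for illustration)
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))

my_selection = ['2013-01-01', '2013-01-03', '2013-01-02', '2013-01-02', '2013-01-05']
labels = pd.to_datetime(my_selection)  # list of strings -> DatetimeIndex

# Label-based selection keeps the requested order and the duplicate label
my_df = df.loc[labels]
print(my_df)
```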

If some of the dates might not be in the DatetimeIndex:

my_selection = ['2013-01-21', '2013-01-03', '2013-01-02', '2013-01-02', '2013-01-05']

my_df = df.loc[pd.to_datetime(my_selection)]
  

将类似列表的标签传递给.loc或[]且标签缺失   将来会出现KeyError,您可以使用.reindex()作为替代。

my_df = df.reindex(pd.to_datetime(my_selection))
print (my_df)
                   A         B         C         D
2013-01-21       NaN       NaN       NaN       NaN
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-05 -0.424972  0.567020  0.276232 -1.087401
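If you would rather drop the missing dates than get NaN rows, while still keeping the requested order and the repeated date, one option (a swapped-in alternative, not part of the original answer) is to filter the converted labels with Index.isin before calling .loc. The sketch again assumes np.arange data instead of random values.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the question's random data (assumption for illustration)
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))

my_selection = ['2013-01-21', '2013-01-03', '2013-01-02', '2013-01-02', '2013-01-05']
labels = pd.to_datetime(my_selection)

# Keep only the requested dates that exist in df.index; the selection's
# order and duplicates are preserved, so .loc never sees a missing label.
present = labels[labels.isin(df.index)]
my_df = df.loc[present]
print(my_df)
```

Because every label passed to .loc is known to exist, this does not depend on how a given pandas version treats missing labels.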

Or use intersection with the DatetimeIndex:

my_df = df.loc[df.index.intersection(pd.to_datetime(my_selection))]
print (my_df)
                   A         B         C         D
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-05 -0.424972  0.567020  0.276232 -1.087401
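In the intersection output above the rows come back in date order rather than in the requested order. If you only need each matching row once, in the frame's own order, a boolean mask built with Index.isin is another option (an alternative not taken from the original answer; the data is again a deterministic stand-in):

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the question's random data (assumption for illustration)
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))

my_selection = ['2013-01-21', '2013-01-03', '2013-01-02', '2013-01-02', '2013-01-05']

# Boolean mask over df's own rows: each existing row appears at most once,
# in df's index order; dates absent from df are simply ignored.
mask = df.index.isin(pd.to_datetime(my_selection))
my_df = df.loc[mask]
print(my_df)
```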