Slicing on specific months and days in pandas

Time: 2018-01-23 22:50:34

Tags: python pandas indexing time

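I'm working on a function that lets the user select a seasonal time period. It works, but I'd like to add some extra functionality.

Right now the user specifies a period in whole months, and the function returns a new dataframe containing only the data for those months (see the code and example). What I'd like is to let the user pick a start and end month and day (e.g. Jan 5 - Mar 18) and select only the dates within that range. Is that possible?

My code is as follows: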

import numpy as np
import pandas as pd

def seasonal_period(merged_dataframe, period):
    """Returns the seasonal period specified for the time series"""
    start = period[0]
    end = period[1]
    merged_dataframe = merged_dataframe.loc[(merged_dataframe.index.month >= start) &
                                                (merged_dataframe.index.month <= end)]
    return merged_dataframe


# Testing the seasonal period with random data
df = pd.DataFrame(np.random.rand(10000, 3), index=pd.date_range('1/1/1980', periods=10000, freq='D'))

# Returns data between Jan and May
print(seasonal_period(merged_dataframe=df, period=[1, 5])) 

This prints:

                   0         1         2
1980-01-01  0.788608  0.113614  0.328662
1980-01-02  0.208422  0.974086  0.765795
1980-01-03  0.448420  0.004947  0.184313
1980-01-04  0.400208  0.194078  0.961875
1980-01-05  0.118263  0.406548  0.358848
1980-01-06  0.824994  0.969560  0.892299
1980-01-07  0.140431  0.642784  0.961061
1980-01-08  0.235443  0.236711  0.291453
1980-01-09  0.420899  0.083092  0.277860
1980-01-10  0.185541  0.640260  0.161851
1980-01-11  0.654466  0.742445  0.398733
1980-01-12  0.270931  0.500233  0.121283
1980-01-13  0.590752  0.057112  0.477629
1980-01-14  0.122973  0.997112  0.998513
1980-01-15  0.330342  0.175655  0.240798
1980-01-16  0.559489  0.426027  0.135564
1980-01-17  0.260714  0.493863  0.420336
1980-01-18  0.214587  0.890858  0.097045
1980-01-19  0.243018  0.285315  0.112326
1980-01-20  0.334157  0.630524  0.585468
1980-01-21  0.974340  0.023412  0.349269
1980-01-22  0.435924  0.709390  0.554518
1980-01-23  0.158202  0.288950  0.747733
1980-01-24  0.855350  0.066325  0.796400
1980-01-25  0.482685  0.962369  0.948844
1980-01-26  0.605162  0.185115  0.832465
1980-01-27  0.078977  0.886044  0.823400
1980-01-28  0.062488  0.841581  0.998819
1980-01-29  0.070578  0.836261  0.732075
1980-01-30  0.386692  0.413445  0.524926
...              ...       ...       ...
2007-04-19  0.030180  0.295753  0.696634
2007-04-20  0.246591  0.245117  0.096647
2007-04-21  0.915289  0.264874  0.754863
2007-04-22  0.222286  0.041275  0.922791
2007-04-23  0.389606  0.149993  0.200387
2007-04-24  0.113636  0.923970  0.031243
2007-04-25  0.154459  0.587656  0.508116
2007-04-26  0.525778  0.056525  0.380457
2007-04-27  0.335463  0.343321  0.191828
2007-04-28  0.249183  0.361834  0.327324
2007-04-29  0.994158  0.108749  0.375496
2007-04-30  0.674535  0.527557  0.744897
2007-05-01  0.029355  0.227039  0.418219
2007-05-02  0.946061  0.251699  0.002965
2007-05-03  0.127731  0.479151  0.634638
2007-05-04  0.045522  0.800802  0.170384
2007-05-05  0.514632  0.426107  0.557497
2007-05-06  0.974910  0.757357  0.119415
2007-05-07  0.624626  0.287442  0.211390
2007-05-08  0.408227  0.720328  0.400762
2007-05-09  0.981552  0.399663  0.953638
2007-05-10  0.256625  0.301236  0.832127
2007-05-11  0.513227  0.649790  0.174498
2007-05-12  0.229353  0.089870  0.024055
2007-05-13  0.819985  0.470549  0.388860
2007-05-14  0.640930  0.530929  0.694122
2007-05-15  0.065560  0.084560  0.677467
2007-05-16  0.297165  0.949761  0.483062
2007-05-17  0.405513  0.320957  0.678885
2007-05-18  0.315292  0.773871  0.043010

[4222 rows x 3 columns]

Any suggestions?

1 Answer:

Answer 0: (score: 1)

You could try rewriting your function slightly:

def seasonal_period(merged_dataframe, period):
    """Returns the seasonal period specified for the time series"""
    # period holds two zero-padded 'MM-DD' strings, e.g. ['05-30', '06-02']
    start = period[0]
    end = period[1]
    merged_dataframe = merged_dataframe.loc[(merged_dataframe.index >= start) &
                                            (merged_dataframe.index <= end)]
    return merged_dataframe

Then change the index to month-day strings:

df = pd.DataFrame(np.random.rand(10000, 3),
                  index=pd.date_range('1/1/1980', periods=10000, freq='D'))
df.index = df.index.strftime('%m-%d')
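
The comparison in the rewritten function relies on zero-padded '%m-%d' labels sorting lexicographically in calendar order. Here is a small sketch of that property, using sample dates made up purely for illustration. Note that after `strftime` the index holds plain strings, so the year and other datetime features are no longer available on it.

```python
import pandas as pd

# Hypothetical sample dates, chosen only to illustrate the ordering.
idx = pd.to_datetime(['1980-01-05', '1980-03-18', '1980-10-02', '1981-02-28'])
labels = idx.strftime('%m-%d')

# Zero-padded MM-DD strings compare in calendar order, ignoring the year.
in_range = (labels >= '01-05') & (labels <= '03-18')
print(list(labels))
print(list(in_range))
```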

Then print:

print(seasonal_period(merged_dataframe=df, period=['05-30', '06-02']))

It prints the following:

                         0         1         2
             05-30  0.506990  0.000789  0.879022
             05-31  0.521576  0.812470  0.882075
             06-01  0.911531  0.158134  0.943459
             06-02  0.072259  0.254357  0.066428
             05-30  0.060392  0.911165  0.692112
             05-31  0.318079  0.379530  0.924417
             06-01  0.095082  0.864511  0.967509
             06-02  0.899394  0.081380  0.422184
             ...         ...       ...       ...

             06-02  0.460351  0.937928  0.302218
             05-30  0.151066  0.908212  0.039089
             05-31  0.322693  0.056857  0.375615
             06-01  0.851227  0.023046  0.897951
             06-02  0.876524  0.006360  0.181202

            [108 rows x 3 columns]
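
An alternative sketch, not part of the answer above, keeps the original DatetimeIndex (and therefore the year) by comparing month and day numerically. The function name `seasonal_period_md` and the `(month, day)` tuple format for `period` are assumptions made for this illustration. Ranges that wrap around the new year (e.g. November to February) are handled by combining the two conditions with "or" instead of "and".

```python
import numpy as np
import pandas as pd


def seasonal_period_md(dataframe, period):
    """Return rows whose (month, day) lies in the inclusive range period.

    period is a pair of (month, day) tuples, e.g. [(1, 5), (3, 18)].
    Hypothetical helper, assuming dataframe has a DatetimeIndex.
    """
    (start_month, start_day), (end_month, end_day) = period
    # Encode month and day as one integer (MMDD) so a single comparison works.
    md = dataframe.index.month * 100 + dataframe.index.day
    start = start_month * 100 + start_day
    end = end_month * 100 + end_day
    if start <= end:
        mask = (md >= start) & (md <= end)
    else:
        # The range wraps around the new year, e.g. Nov 15 to Feb 10.
        mask = (md >= start) | (md <= end)
    return dataframe.loc[mask]


df = pd.DataFrame(np.random.rand(10000, 3),
                  index=pd.date_range('1/1/1980', periods=10000, freq='D'))

# Rows from Jan 5 through Mar 18 of every year, with full dates preserved
subset = seasonal_period_md(df, [(1, 5), (3, 18)])
print(subset.head())
```

Because the index stays a DatetimeIndex, the result can still be grouped or resampled by year afterwards, which is not possible once the index has been converted to 'MM-DD' strings.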