Getting seasons from a dataset using Pandas

Time: 2018-06-27 21:51:21

Tags: python pandas

Given the following dataset:

"";"M_001";"M_002";"M_003";"M_004"
"2011-01-01 00:00:00";4,45;3,5467;3,197;12,098
"2011-02-01 00:00:00";18,40;0,124;174,36;11,098
"2011-03-01 00:00:00";25,789;27,67;19,76;34,66
"2011-04-01 00:00:00";19,08;11,078;23,34;67,45
"2011-05-01 00:00:00";13,06;06,078;10,34;21,45
"2011-06-01 00:00:00";13,06;06,078;10,34;21,45
"2011-06-21 00:00:00";13,06;06,078;10,34;21,45
"2011-07-01 00:00:00";9,06;06,078;9,34;21,45
"2011-07-14 00:00:00";9,06;06,078;9,34;21,45
"2011-08-01 00:00:00";22,06;45,078;21,34;21,45
"2011-08-11 00:00:00";22,06;45,078;21,34;21,45
"2011-08-12 00:00:00";22,06;45,078;21,34;21,45
"2011-09-01 00:00:00";76,06;32,078;10,34;21,45
"2011-09-23 00:00:00";76,06;32,078;10,34;21,45
"2011-09-25 00:00:00";76,06;32,078;10,34;21,45
"2011-10-01 00:00:00";17,06;18,078;108,34;21,45
"2011-11-01 00:00:00";12,06;45,078;107,34;21,45
"2011-12-01 00:00:00";7,06;60,078;83,34;21,45
"2011-12-21 00:00:00";7,06;60,078;83,34;21,45
"2012-01-01 00:00:00";4,45;3,5467;3,197;12,098
"2012-02-01 00:00:00";18,40;0,124;174,36;11,098
"2012-03-01 00:00:00";25,789;27,67;19,76;34,66
"2012-03-11 00:00:00";25,789;27,67;19,76;34,66
"2012-03-20 00:00:00";25,789;27,67;19,76;34,66
"2012-03-30 00:00:00";25,789;27,67;19,76;34,66

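Can anyone tell me how to modify the function calc() to select rows from the dataset loaded by read_csv, so that I can separately get the rows for two seasons: winter (21 December to 20 March) and summer (21 June to 23 September)?

I have tried writing this code, but it does not work properly.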

import pandas as pd 

def calc():
    filename = 'mydataset/dataset.csv'
    mySeries = pd.read_csv(filename, header=0, index_col=0, parse_dates=[0], sep=";", decimal=",")

    return mySeries

if __name__ == '__main__':
    df = calc()
    print("Winter season measures: ")
    print(df.iloc[[x in range(12, 3) for x in df.index.month]])
    print("Summer season measures: ")
    print(df.iloc[[x in range(6, 10) for x in df.index.month]])

Thanks in advance!

1 answer:

Answer 0: (score: 0)

I recreated your DataFrame here:

from io import StringIO
import pandas as pd 
text = StringIO('''"";"M_001";"M_002";"M_003";"M_004"
"2011-01-01 00:00:00";4,45;3,5467;3,197;12,098
"2011-02-01 00:00:00";18,40;0,124;174,36;11,098
"2011-03-01 00:00:00";25,789;27,67;19,76;34,66
"2011-04-01 00:00:00";19,08;11,078;23,34;67,45
"2011-05-01 00:00:00";13,06;06,078;10,34;21,45
"2011-06-01 00:00:00";13,06;06,078;10,34;21,45
"2011-06-21 00:00:00";13,06;06,078;10,34;21,45
"2011-07-01 00:00:00";9,06;06,078;9,34;21,45
"2011-07-14 00:00:00";9,06;06,078;9,34;21,45
"2011-08-01 00:00:00";22,06;45,078;21,34;21,45
"2011-08-11 00:00:00";22,06;45,078;21,34;21,45
"2011-08-12 00:00:00";22,06;45,078;21,34;21,45
"2011-09-01 00:00:00";76,06;32,078;10,34;21,45
"2011-09-23 00:00:00";76,06;32,078;10,34;21,45
"2011-09-25 00:00:00";76,06;32,078;10,34;21,45
"2011-10-01 00:00:00";17,06;18,078;108,34;21,45
"2011-11-01 00:00:00";12,06;45,078;107,34;21,45
"2011-12-01 00:00:00";7,06;60,078;83,34;21,45
"2011-12-21 00:00:00";7,06;60,078;83,34;21,45
"2012-01-01 00:00:00";4,45;3,5467;3,197;12,098
"2012-02-01 00:00:00";18,40;0,124;174,36;11,098
"2012-03-01 00:00:00";25,789;27,67;19,76;34,66
"2012-03-11 00:00:00";25,789;27,67;19,76;34,66
"2012-03-20 00:00:00";25,789;27,67;19,76;34,66
"2012-03-30 00:00:00";25,789;27,67;19,76;34,66''')
df = pd.read_csv(filepath_or_buffer=text, sep=';', header=0, index_col=0, decimal=',', parse_dates=[0])
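Slicing by date strings, as done further down, relies on the index being a sorted DatetimeIndex, so it may be worth checking that before going on. The snippet below is only a minimal sketch on a two-row excerpt of the data; the names `sample`, `check_df`, `is_datetime` and `is_sorted` are made up for this illustration and are not part of the original code.

```python
from io import StringIO
import pandas as pd

# Two-row excerpt in the same format as the question's CSV (illustrative only).
sample = StringIO('''"";"M_001";"M_002"
"2011-01-01 00:00:00";4,45;3,5467
"2011-02-01 00:00:00";18,40;0,124''')
check_df = pd.read_csv(sample, sep=';', header=0, index_col=0, decimal=',', parse_dates=[0])

# The index should be a DatetimeIndex, and sorted, for slicing by date strings.
is_datetime = isinstance(check_df.index, pd.DatetimeIndex)
is_sorted = check_df.index.is_monotonic_increasing
print(is_datetime, is_sorted)

# With decimal=',' the measurement columns should come out as floats.
print(check_df.dtypes)
```

If the index turned out not to be sorted, calling `sort_index()` on the DataFrame before slicing should be enough.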

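Then I wrote some code that creates two new DataFrames and appends all the rows that fall within the winter and summer date ranges. Edit: I commented out the old version and kept it below.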

winterStart = '-12-21'
winterEnd   = '-03-20'
summerStart = '-06-21'
summerEnd   = '-09-23'

#df_winter = df.ix[str('2010'+winterStart):str('2011'+winterEnd)]
#df_winter = df_winter.append(df.ix['2011'+winterStart:'2012'+winterEnd])
#df_winter = df_winter.append(df.ix['2012'+winterStart:'2013'+winterEnd])

#df_summer = df.ix['2010'+summerStart:'2010'+summerEnd]
#df_summer = df_summer.append(df.ix['2011'+summerStart:'2011'+summerEnd])
#df_summer = df_summer.append(df.ix['2012'+summerStart:'2012'+summerEnd])

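If you have more years, you can create a loop that goes through each subsequent year and appends that year's seasonal data. Edit: the OP asked for this, so I added a loop that covers all years without having to spell out every season for every year. Another comment pointed out that df.ix[] is deprecated, so I changed the code to use df.loc[] instead of the df.ix[] used in the previous version.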

# DataFrame.append was removed in pandas 2.0, so collect the slices and use pd.concat.
winter_parts = []
for year in range(2010, 2015):
    # used year and year+1 because winter season spans from an initial year to the next year.
    winter_parts.append(df.loc[str(year) + winterStart : str(year+1) + winterEnd])
df_winter = pd.concat(winter_parts)
print(df_winter)

summer_parts = []
for year in range(2010, 2015):
    summer_parts.append(df.loc[str(year) + summerStart : str(year) + summerEnd])
df_summer = pd.concat(summer_parts)
print(df_summer)

See also Filtering Pandas DataFrames on dates for filtering between date ranges when the dates are the index.
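As a side note on the original attempt: `range(12, 3)` is empty, so the winter filter in the question can never match any month. One possible alternative that avoids looping over years is to build a boolean mask from the month and day of the index. This is only a sketch on a small made-up sample, not the method used above; the names `demo`, `month_day`, `summer_mask` and `winter_mask` are assumptions introduced for this illustration.

```python
from io import StringIO
import pandas as pd

# Small made-up sample covering the season boundaries (illustrative values).
csv_text = StringIO('''"";"M_001"
"2011-03-20 00:00:00";1,0
"2011-03-30 00:00:00";2,0
"2011-06-21 00:00:00";3,0
"2011-09-23 00:00:00";4,0
"2011-09-25 00:00:00";5,0
"2011-12-21 00:00:00";6,0
"2012-01-01 00:00:00";7,0''')
demo = pd.read_csv(csv_text, sep=';', header=0, index_col=0, decimal=',', parse_dates=[0])

# Encode each date as month*100 + day, e.g. 21 December -> 1221.
month_day = demo.index.month * 100 + demo.index.day

# Summer lies within one calendar year: 21 June to 23 September inclusive.
summer_mask = (month_day >= 621) & (month_day <= 923)
# Winter wraps around the new year: on/after 21 December OR on/before 20 March.
winter_mask = (month_day >= 1221) | (month_day <= 320)

demo_summer = demo[summer_mask]
demo_winter = demo[winter_mask]
print(demo_summer)
print(demo_winter)
```

Because the mask ignores the year, it picks up every winter and summer in the data at once. If each season needs to be kept apart per year, the year-by-year slicing shown earlier is probably the simpler route.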