为什么python不能识别数据集中的月份列?

时间:2016-08-14 00:12:40

标签: python datetime pandas group-by

这就是数据框的样子:

Date  Time (HHMM)         Site  Plot  Replicate  Temperature  \
0   2002-05-01          600  Barre Woods    16          5          4.5
1   2002-05-01          600  Barre Woods    21          7          4.5
2   2002-05-01          600  Barre Woods    31          9          6.5
3   2002-05-01          600  Barre Woods    10          2          5.3
4   2002-05-01          600  Barre Woods     2          1          4.0
5   2002-05-01          600  Barre Woods    13          4          5.5
6   2002-05-01          600  Barre Woods    11          3          5.0
7   2002-05-01          600  Barre Woods    28          8          5.0
8   2002-05-01          600  Barre Woods    18          6          4.5
9   2002-05-01         1400  Barre Woods     2          1         10.3
10  2002-05-01         1400  Barre Woods    31          9          9.0
11  2002-05-01         1400  Barre Woods    13          4         11.0
import pandas as pd
import datetime as dt
from datetime import datetime
df=pd.read_csv('F:/data32.csv',parse_dates=['Date'])
df['Date']=pd.to_datetime(df['Date'],format='%m/%d/%y')

这是我收到错误的地方

df2=df.groupby(pd.TimeGrouper(freq='M'))

错误如下:

  

仅对DatetimeIndex,TimedeltaIndex或PeriodIndex有效,但得到了   'RangeIndex'的实例

2 个答案:

答案 0 :(得分:1)

您可以先使用set_index

dfx = df.set_index('Date')

然后,你可以groupby

dfx.groupby(lambda x : x.month).mean() #just for an example I am using .mean()

答案 1 :(得分:1)

df['Date'].dt.month分组。例如,要计算平均温度,您可以执行以下操作。

import io
import pandas as pd

data = io.StringIO('''\
Date,Time (HHMM),Site,Plot,Replicate,Temperature
0,2002-05-01,600,Barre Woods,16,5,4.5
1,2002-05-01,600,Barre Woods,21,7,4.5
2,2002-05-01,600,Barre Woods,31,9,6.5
3,2002-05-01,600,Barre Woods,10,2,5.3
4,2002-05-01,600,Barre Woods,2,1,4.0
5,2002-05-01,600,Barre Woods,13,4,5.5
6,2002-05-01,600,Barre Woods,11,3,5.0
7,2002-05-01,600,Barre Woods,28,8,5.0
8,2002-05-01,600,Barre Woods,18,6,4.5
9,2002-05-01,1400,Barre Woods,2,1,10.3
10,2002-05-01,1400,Barre Woods,31,9,9.0
11,2002-05-01,1400,Barre Woods,13,4,11.0
''')

df = pd.read_csv(data)
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')

df.groupby(df['Date'].dt.month)['Temperature'].mean()

输出:

Date
5    6.258333
Name: Temperature, dtype: float64