Question

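I'm a bit confused about how to use the .resample() method. I'm working with a DataFrame whose index holds datetime values in YYYY-MM-DD format, with one column of property costs for each of several cities, as follows: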

State       California  Illinois    Pennsylvania    Arizona
RegionName  Los Angeles Chicago     Philadelphia    Phoenix
1/1/2000    204400      136800      52700           111000
2/1/2000    207000      138300      53100           111700
3/1/2000    209800      140100      53200           112800
4/1/2000    212300      141900      53400           113700
5/1/2000    214500      143700      53700           114300
6/1/2000    216600      145300      53800           115100
7/1/2000    219000      146700      53800           115600
8/1/2000    221100      147900      54100           115900
9/1/2000    222800      149000      54500           116500

When I apply the .resample() method to convert it to a quarterly view, the data comes out arranged like this:

hd = hd.resample('Q').mean()


State       New York    California  Illinois    Pennsylvania    Arizona
RegionName  New York    Los Angeles Chicago     Philadelphia    Phoenix
3/31/2000   NaN         207066.6667 138400      53000           111833.3333
6/30/2000   NaN         214466.6667 143633.3333 53633.33333     114366.6667
9/30/2000   NaN         220966.6667 147866.6667 54133.33333     116000

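However, I need the labels on the newly created index to appear in a format like '2000q1', rather than as the last (or first) day of the quarter. I've been all over the .resample() page in the pandas documentation, but for the life of me I can't figure out how to apply custom labels like this. Can anyone help me out?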

Kind regards, Greem

Answer 1

I think you need to_period followed by strftime:

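# hd.index = pd.to_datetime(hd.index)  # only needed if the index is not datetime yet
hd = hd.resample('Q').mean()
# convert the quarter-end timestamps to quarterly periods, then format them as labels
hd.index = hd.index.to_period('Q').strftime('%Yq%q')
print(hd)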
State       California Illinois Pennsylvania Arizona
RegionName Los Angeles  Chicago Philadelphia Phoenix
2000q1          207066   138400        53000  111833
2000q2          214466   143633        53633  114366
2000q3          220966   147866        54133  116000
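
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The frame is a hypothetical reconstruction of two of the question's columns with flat column names, not the asker's actual data. It also passes the offset object pd.offsets.QuarterEnd() instead of the string alias, on the assumption that this avoids the 'Q' versus 'QE' spelling difference between pandas versions.

```python
import pandas as pd

# Hypothetical reconstruction of the question's data (two cities, flat columns).
hd = pd.DataFrame(
    {
        "Los Angeles": [204400, 207000, 209800, 212300, 214500,
                        216600, 219000, 221100, 222800],
        "Chicago": [136800, 138300, 140100, 141900, 143700,
                    145300, 146700, 147900, 149000],
    },
    index=pd.date_range("2000-01-01", periods=9, freq="MS"),
)

# Quarterly means; the result is labelled with quarter-end timestamps.
quarterly = hd.resample(pd.offsets.QuarterEnd()).mean()

# Swap the timestamps for quarterly periods, then format them as strings.
quarterly.index = quarterly.index.to_period("Q").strftime("%Yq%q")
print(quarterly)
```

Note that after the last step the index holds plain strings, so any further date-based operations should happen before the relabelling.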

Answer 2

You can set the index to its to_period version and then use groupby:

df.index = pd.to_datetime(df.index)
df.set_index(df.index.to_period('Q')).groupby(level=0).mean()

State   California Illinois Pennsylvania Arizona
Region Los Angeles  Chicago Philadelphia Phoenix
2000Q1      207066   138400        53000  111833
2000Q2      214466   143633        53633  114366
2000Q3      220966   147866        54133  116000

Or, more succinctly, using the strftime idea from @jezrael's answer:

df.groupby(pd.to_datetime(df.index).to_period('Q').strftime('%Yq%q')).mean()

        California Illinois Pennsylvania Arizona
       Los Angeles  Chicago Philadelphia Phoenix
2000q1      207066   138400        53000  111833
2000q2      214466   143633        53633  114366
2000q3      220966   147866        54133  116000
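
The two variants in this answer differ in what the resulting index holds: periods in the first, plain strings in the second. Here is a small sketch of that difference, using a hypothetical one-column frame built from the question's Phoenix figures and assuming the index strings are month/day/year:

```python
import pandas as pd

# Hypothetical one-column slice of the question's data, index still as strings.
df = pd.DataFrame(
    {"Phoenix": [111000, 111700, 112800, 113700, 114300, 115100]},
    index=["1/1/2000", "2/1/2000", "3/1/2000",
           "4/1/2000", "5/1/2000", "6/1/2000"],
)

# Parse the strings (assumed month/day/year) and map each date to its quarter.
periods = pd.to_datetime(df.index, format="%m/%d/%Y").to_period("Q")

# Group on the periods: the result keeps period labels, which still sort
# chronologically and know their own start and end dates.
by_quarter = df.groupby(periods).mean()

# Produce the string labels only at the end, for display.
labels = by_quarter.index.strftime("%Yq%q")
print(by_quarter)
print(list(labels))
```

Keeping the periods until the last moment is a design choice rather than a requirement. It matters mainly if you still want to slice or sort by date after aggregating.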

Pandas .resample() method - custom labels?
