Adding a column of repeated sequential values

Time: 2018-09-20 22:20:05

Tags: python pandas

I have a dataframe containing stacked monthly values that looks like this:

      Value    Month
0    0.09187    Jan
1    0.72878    Feb
2    0.92052    Mar
3   -1.86845    Apr
4   -1.16489    May
5   -0.61433    Jun
6    0.68008    Jul
7   -1.50555    Aug
8   -0.18985    Sep
9   -1.11380    Oct
10  -0.63838    Nov
11   0.37527    Dec 
12   0.234216   Jan

I want to add a Year column using a known range of years, so that the df looks like this:

     Value     Month   Year
0    0.09187    Jan    1950
1    0.72878    Feb    1950
2    0.92052    Mar    1950
3   -1.86845    Apr    1950
4   -1.16489    May    1950
5   -0.61433    Jun    1950
6    0.68008    Jul    1950
7   -1.50555    Aug    1950
8   -0.18985    Sep    1950
9   -1.11380    Oct    1950
10  -0.63838    Nov    1950
11   0.37527    Dec    1950
12   0.234216   Jan    1951

I tried initializing a list of years to assign to the column:

years = list(range(1950, 2000))
df['Year'] = years * 12

But this produced

      Value    Month  Year
0    0.09187    Jan   1950
1    0.72878    Feb   1951
2    0.92052    Mar   1952

and so on. I haven't been able to come up with any other approach.

1 answer:

Answer 0: (score: 4)

As long as you know that you have Jan data for every year, you can do:

df['Year'] = df['Month'].eq('Jan').cumsum()+1949
>>> df
       Value Month  Year
0   0.091870   Jan  1950
1   0.728780   Feb  1950
2   0.920520   Mar  1950
3  -1.868450   Apr  1950
4  -1.164890   May  1950
5  -0.614330   Jun  1950
6   0.680080   Jul  1950
7  -1.505550   Aug  1950
8  -0.189850   Sep  1950
9  -1.113800   Oct  1950
10 -0.638380   Nov  1950
11  0.375270   Dec  1950
12  0.234216   Jan  1951
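
To see why this works, here is a minimal sketch on a made-up 14-row frame (the `sample` frame and the `start_year` variable are my own illustration, not part of the question). `eq('Jan')` flags every January row as True, and `cumsum()` turns those flags into a running count of Januaries seen so far, so each row gets the number of the year it belongs to:

```python
import pandas as pd

# Hypothetical sample: one full year plus the first two months of the next.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "Jan", "Feb"]
sample = pd.DataFrame({"Month": months})

start_year = 1950  # assumed known first year of the data

# True on every January row, False elsewhere.
is_jan = sample["Month"].eq("Jan")

# Running count of Januaries (1 for the first year, 2 for the second, ...),
# shifted so that the first year maps to start_year.
sample["Year"] = is_jan.cumsum() + start_year - 1

print(sample["Year"].tolist())
```

Note this relies on the data starting in January and on no January being missing; otherwise the count drifts.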

Alternatively, you can follow your original logic, but use np.repeat:

import numpy as np
years = list(range(1950, 2000))
# Repeat each year 12 times, then trim to the number of rows in df
df['Year'] = np.repeat(years, 12)[:len(df)]
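
The difference from the attempt in the question is that multiplying a list repeats the whole list end to end, while `np.repeat` repeats each element in place. A small sketch with two years and a repeat count of 3 (shortened values chosen only for illustration):

```python
import numpy as np

years = [1950, 1951]

# List multiplication concatenates copies of the whole list.
tiled = years * 3
print(tiled)      # [1950, 1951, 1950, 1951, 1950, 1951]

# np.repeat repeats each element consecutively.
repeated = np.repeat(years, 3)
print(repeated)   # [1950 1950 1950 1951 1951 1951]
```

The result has `len(years) * 12` entries in the real case, so it must match the number of rows in the frame (or be sliced down to it) before it can be assigned as a column.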

Or another option:

import pandas as pd
df['Year'] = pd.date_range('1950-01-01', periods=len(df), freq='MS').year
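
A rough standalone sketch of the date-range idea, assuming the rows are strictly consecutive months starting in January 1950 (`n_rows` is a stand-in for `len(df)`). It uses the month-start alias "MS", which generates one timestamp per month and is accepted by current pandas releases; the generated months can also be used to sanity-check the existing Month column:

```python
import pandas as pd

n_rows = 14  # hypothetical row count, standing in for len(df)

# One timestamp per month: 1950-01-01, 1950-02-01, ...
dates = pd.date_range("1950-01-01", periods=n_rows, freq="MS")

years = dates.year.tolist()          # calendar year of each row
month_numbers = dates.month.tolist() # 1..12, wrapping back to 1 each January

print(years)
print(month_numbers)
```

Unlike the cumsum approach, this ignores the Month column entirely, so it gives wrong years if any month is missing from the data.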