I have a dataframe containing stacked monthly values that looks like:
Value Month
0 0.09187 Jan
1 0.72878 Feb
2 0.92052 Mar
3 -1.86845 Apr
4 -1.16489 May
5 -0.61433 Jun
6 0.68008 Jul
7 -1.50555 Aug
8 -0.18985 Sep
9 -1.11380 Oct
10 -0.63838 Nov
11 0.37527 Dec
12 0.234216 Jan
I want to add a Year column from a known range of years, so that the df looks like:
Value Month Year
0 0.09187 Jan 1950
1 0.72878 Feb 1950
2 0.92052 Mar 1950
3 -1.86845 Apr 1950
4 -1.16489 May 1950
5 -0.61433 Jun 1950
6 0.68008 Jul 1950
7 -1.50555 Aug 1950
8 -0.18985 Sep 1950
9 -1.11380 Oct 1950
10 -0.63838 Nov 1950
11 0.37527 Dec 1950
12 0.234216 Jan 1951
I tried initializing a list of years to assign to the column:
years = list(range(1950, 2000))
df['Year'] = years * 12
But this produced
Value Month Year
0 0.09187 Jan 1950
1 0.72878 Feb 1951
2 0.92052 Mar 1952
and so on. I haven't been able to come up with any other approach.
Answer 0 (score: 4)
As long as you know that every year in your data has a Jan row, you can do:
df['Year'] = df['Month'].eq('Jan').cumsum()+1949
>>> df
Value Month Year
0 0.091870 Jan 1950
1 0.728780 Feb 1950
2 0.920520 Mar 1950
3 -1.868450 Apr 1950
4 -1.164890 May 1950
5 -0.614330 Jun 1950
6 0.680080 Jul 1950
7 -1.505550 Aug 1950
8 -0.189850 Sep 1950
9 -1.113800 Oct 1950
10 -0.638380 Nov 1950
11 0.375270 Dec 1950
12 0.234216 Jan 1951
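To see why this works, here is a minimal, self-contained sketch. The small frame and the intermediate names `is_jan` and `year_offset` are made up for illustration, and it assumes the rows are in chronological order and that the first row is a Jan in 1950.

```python
import pandas as pd

# Hypothetical sample: 14 consecutive months starting in Jan (values omitted).
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
df = pd.DataFrame({"Month": months + ["Jan", "Feb"]})

# True on every Jan row, False elsewhere.
is_jan = df["Month"].eq("Jan")

# The running count of Jan rows goes up by one each time a new year starts:
# 1 for the first twelve rows, 2 from the second Jan onward.
year_offset = is_jan.cumsum()

# The first Jan has count 1, so adding 1949 maps it to 1950.
df["Year"] = year_offset + 1949
```

If a year were missing its Jan row, the count would not advance for that year, which is why the answer depends on Jan being present every year.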
Alternatively, you can keep your original logic but use np.repeat:
import numpy as np
years = list(range(1950, 2000))
# repeat each year 12 times in a row, then trim to the number of rows in df
df['Year'] = np.repeat(years, 12)[:len(df)]
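The difference from the attempt in the question is that multiplying a list tiles the whole list, while np.repeat repeats each element in place. A small sketch with a shortened, made-up year list and a repeat count of 2 instead of 12:

```python
import numpy as np

years = [1950, 1951, 1952]

# List multiplication concatenates copies of the whole list,
# so the years cycle: 1950, 1951, 1952, 1950, 1951, 1952.
tiled = years * 2

# np.repeat repeats each element consecutively,
# so the years come in blocks: 1950, 1950, 1951, 1951, 1952, 1952.
repeated = np.repeat(years, 2)
```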
Or another option:
# 'MS' is the month-start frequency; assumes consecutive months starting in Jan 1950
df['Year'] = pd.date_range('1950-01-01', periods=len(df), freq='MS').year
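A short sketch of the date_range approach on a made-up 13-row frame. It assumes the rows are consecutive months with no gaps, starting in January 1950; the `labels_match` check is only an illustrative sanity test, not part of the original answer.

```python
import pandas as pd

month_names = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical sample: 13 consecutive months (values omitted).
df = pd.DataFrame({"Month": month_names + ["Jan"]})

# One timestamp per row, stepping one month at a time from Jan 1950.
dates = pd.date_range("1950-01-01", periods=len(df), freq="MS")
df["Year"] = dates.year

# Sanity check: the generated calendar months line up with the Month labels.
# If this is False, the data has gaps or does not start in January.
expected_months = [month_names[m - 1] for m in dates.month]
labels_match = expected_months == df["Month"].tolist()
```

Unlike the cumsum approach, this one relies purely on row position, so a missing month anywhere would shift every later year boundary.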