My data looks like this:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(140, 13), columns=['Year', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
df['Year'] = np.arange(1876, 2016)
df.head()
Out[54]:
Year Jan Feb Mar Apr May Jun Jul \
0 1876 -1.375433 0.115271 0.160305 0.962201 -1.170467 -0.312078 -1.046972
1 1877 -0.341183 -2.369659 -0.301529 1.268756 0.291787 -0.433796 1.846660
2 1878 0.015547 -1.248171 -0.961130 -2.473062 -1.227789 -0.291215 -0.552831
3 1879 -1.643790 0.238561 1.120954 0.273184 -2.255050 0.189526 -0.528215
4 1880 1.800950 0.900657 -1.785493 -0.505400 -0.909594 0.829114 0.310907
Aug Sep Oct Nov Dec
0 -0.540807 1.041048 -0.392727 0.526774 0.482579
1 0.087704 1.520229 0.008850 -0.052644 1.255057
2 0.475701 -0.402313 0.860482 -1.331818 1.248075
3 1.746745 -0.362812 -0.357801 -1.649273 -0.884970
4 1.064974 -2.636122 0.300357 0.523165 1.047123
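I want to convert it into a single column of values indexed by year and month. I tried stacking my original data, but the result is a Series in which the years are mixed in with my values.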
df=df.stack()
df
Out[60]:
0 Year 1876.000000
Jan -1.375433
Feb 0.115271
Mar 0.160305
Apr 0.962201
May -1.170467
Jun -0.312078
Jul -1.046972
Aug -0.540807
Sep 1.041048
Oct -0.392727
Nov 0.526774
Dec 0.482579
1 Year 1877.000000
Jan -0.341183
...
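A minimal sketch of why this happens, using a hypothetical two-row frame (the name small and its values are made up for illustration): stack() pivots every column into the inner index level, so Year is treated as just another column of values.

```python
import pandas as pd

# Hypothetical miniature of the question's frame: one Year column, two month columns.
small = pd.DataFrame({'Year': [1876, 1877],
                      'Jan': [0.1, 0.2],
                      'Feb': [0.3, 0.4]})

stacked = small.stack()

# The inner index level holds the former column names, 'Year' included,
# so the year numbers end up among the stacked values.
inner_labels = stacked.index.get_level_values(1).unique().tolist()
print(inner_labels)
```

Moving Year out of the columns before stacking should keep it out of the values.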
What I actually want is:
result=pd.DataFrame(data=np.random.randn(10,1),columns=['values'],index=pd.date_range('1876/1/1',periods=10,freq='BM'))
result.head()
Out[58]:
values
1876-01-31 0.593254
1876-02-29 0.777550
1876-03-31 -1.777443
1876-04-28 -0.880476
1876-05-31 -1.698800
Answer 0 (score: 1)
Use set_index on Year, then stack.
# data
# =====================
Year Jan Feb Mar Apr ... Aug Sep Oct Nov Dec
0 1876 1.8309 0.6724 0.6230 0.3548 ... 0.6316 0.7837 -0.0132 -0.3274 -0.0795
1 1877 1.1363 -2.5042 1.8929 -0.2806 ... 2.0662 0.5430 -0.2887 1.2593 0.6788
2 1878 -0.4730 -1.3182 1.2255 1.1420 ... -0.3064 -1.0505 0.8774 -0.7551 1.0743
3 1879 -0.6651 -0.1462 0.5634 1.7074 ... 0.1588 0.8856 -2.9899 -0.2085 0.3358
4 1880 -0.1305 1.2971 -0.6043 -1.1446 ... 0.7274 -0.8798 0.0978 -0.7801 -1.7695
5 1881 0.0165 -0.6090 -0.2994 -0.5597 ... -1.3628 0.6206 1.4357 1.1800 -1.8132
6 1882 -0.3365 -0.0699 -1.2027 -0.4825 ... -0.3016 1.7806 0.9992 -1.4172 0.4250
7 1883 0.7963 -1.1474 0.8532 -0.9619 ... -0.8057 -1.0750 -0.5305 0.3533 -0.0818
.. ... ... ... ... ... ... ... ... ... ... ...
132 2008 -0.0440 -2.2967 -1.0145 0.1504 ... -0.4940 0.2150 0.2712 0.5997 0.2958
133 2009 -0.2410 -0.6169 1.1429 0.1749 ... 0.8128 0.9391 1.1312 -0.0915 1.1761
134 2010 0.8155 0.3567 1.1648 0.7068 ... -0.8204 -0.3549 1.5648 -0.2102 1.6549
135 2011 0.4847 -0.4535 0.5300 -0.8678 ... -0.2837 0.8821 1.1700 0.0899 -0.5830
136 2012 0.1835 0.9730 -0.7666 -1.0301 ... 0.3203 -0.2747 -1.8450 0.0942 0.2149
137 2013 0.2517 0.8293 1.9907 -1.0461 ... -0.3113 0.7177 0.8896 0.2329 2.0546
138 2014 -1.6106 -1.3285 -0.1870 0.2511 ... -0.3264 1.3578 1.5639 -1.3799 -1.1196
139 2015 -2.0050 0.3680 -0.5553 -0.6471 ... 0.6217 -0.0965 1.3019 -1.0420 -1.3107
[140 rows x 13 columns]
# processing
# =================================
df.set_index('Year').stack()
Year
1876 Jan 1.8309
Feb 0.6724
Mar 0.6230
Apr 0.3548
May 1.4329
Jun -0.3263
Jul 1.7276
Aug 0.6316
...
2015 May -0.5075
Jun -1.4982
Jul -1.9434
Aug 0.6217
Sep -0.0965
Oct 1.3019
Nov -1.0420
Dec -1.3107
dtype: float64
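The stacked Series above is indexed by (Year, month name) rather than by dates. One possible way to get from there to the date-indexed single column the question asks for is sketched below. It is not part of the original answer. It assumes the month columns use English three-letter abbreviations and an English or C locale, so that the %b format code parses them. The three-year frame and the names dates and result are stand-ins for illustration.

```python
import numpy as np
import pandas as pd

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# Hypothetical stand-in for the question's frame (3 years instead of 140).
df = pd.DataFrame(np.random.randn(3, 12), columns=months)
df.insert(0, 'Year', np.arange(1876, 1879))

# Same step as in the answer: Year becomes the outer index level.
stacked = df.set_index('Year').stack()

# Build labels such as '1876-Jan' from each (year, month) pair and parse them.
labels = [f'{year}-{month}' for year, month in stacked.index]
dates = pd.to_datetime(labels, format='%Y-%b')

# Parsed dates fall on the first of each month; roll forward to calendar month end.
stacked.index = dates + pd.offsets.MonthEnd(0)

result = stacked.to_frame('values')
print(result.head())
```

The question builds its target index with freq='BM' (business month end). Swapping pd.offsets.MonthEnd(0) for pd.offsets.BMonthEnd(0) should give business month ends instead, if that distinction matters for the data.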