Time DataFrame to single-column data

Time: 2015-07-25 11:05:21

Tags: pandas dataframe

My data looks like this:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(140, 13), columns=['Year', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
df['Year'] = np.arange(1876, 2016)

df.head()
Out[54]: 
   Year       Jan       Feb       Mar       Apr       May       Jun       Jul  \
0  1876 -1.375433  0.115271  0.160305  0.962201 -1.170467 -0.312078 -1.046972   
1  1877 -0.341183 -2.369659 -0.301529  1.268756  0.291787 -0.433796  1.846660   
2  1878  0.015547 -1.248171 -0.961130 -2.473062 -1.227789 -0.291215 -0.552831   
3  1879 -1.643790  0.238561  1.120954  0.273184 -2.255050  0.189526 -0.528215   
4  1880  1.800950  0.900657 -1.785493 -0.505400 -0.909594  0.829114  0.310907   

        Aug       Sep       Oct       Nov       Dec  
0 -0.540807  1.041048 -0.392727  0.526774  0.482579  
1  0.087704  1.520229  0.008850 -0.052644  1.255057  
2  0.475701 -0.402313  0.860482 -1.331818  1.248075  
3  1.746745 -0.362812 -0.357801 -1.649273 -0.884970  
4  1.064974 -2.636122  0.300357  0.523165  1.047123  

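I want to convert it into single-column data indexed by year-month. I tried stacking my original data, but the result is a Series in which the Year values are mixed in with my data values.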

df=df.stack()
df
Out[60]: 
0  Year    1876.000000
   Jan       -1.375433
   Feb        0.115271
   Mar        0.160305
   Apr        0.962201
   May       -1.170467
   Jun       -0.312078
   Jul       -1.046972
   Aug       -0.540807
   Sep        1.041048
   Oct       -0.392727
   Nov        0.526774
   Dec        0.482579
1  Year    1877.000000
   Jan       -0.341183
...

What I actually want is:

result=pd.DataFrame(data=np.random.randn(10,1),columns=['values'],index=pd.date_range('1876/1/1',periods=10,freq='BM'))

result.head()
Out[58]: 
              values
1876-01-31  0.593254
1876-02-29  0.777550
1876-03-31 -1.777443
1876-04-28 -0.880476
1876-05-31 -1.698800

1 answer:

Answer 0: (score: 1)

First call set_index on Year, then stack:

# data
# =====================
     Year     Jan     Feb     Mar     Apr   ...       Aug     Sep     Oct     Nov     Dec
0    1876  1.8309  0.6724  0.6230  0.3548   ...    0.6316  0.7837 -0.0132 -0.3274 -0.0795
1    1877  1.1363 -2.5042  1.8929 -0.2806   ...    2.0662  0.5430 -0.2887  1.2593  0.6788
2    1878 -0.4730 -1.3182  1.2255  1.1420   ...   -0.3064 -1.0505  0.8774 -0.7551  1.0743
3    1879 -0.6651 -0.1462  0.5634  1.7074   ...    0.1588  0.8856 -2.9899 -0.2085  0.3358
4    1880 -0.1305  1.2971 -0.6043 -1.1446   ...    0.7274 -0.8798  0.0978 -0.7801 -1.7695
5    1881  0.0165 -0.6090 -0.2994 -0.5597   ...   -1.3628  0.6206  1.4357  1.1800 -1.8132
6    1882 -0.3365 -0.0699 -1.2027 -0.4825   ...   -0.3016  1.7806  0.9992 -1.4172  0.4250
7    1883  0.7963 -1.1474  0.8532 -0.9619   ...   -0.8057 -1.0750 -0.5305  0.3533 -0.0818
..    ...     ...     ...     ...     ...   ...       ...     ...     ...     ...     ...
132  2008 -0.0440 -2.2967 -1.0145  0.1504   ...   -0.4940  0.2150  0.2712  0.5997  0.2958
133  2009 -0.2410 -0.6169  1.1429  0.1749   ...    0.8128  0.9391  1.1312 -0.0915  1.1761
134  2010  0.8155  0.3567  1.1648  0.7068   ...   -0.8204 -0.3549  1.5648 -0.2102  1.6549
135  2011  0.4847 -0.4535  0.5300 -0.8678   ...   -0.2837  0.8821  1.1700  0.0899 -0.5830
136  2012  0.1835  0.9730 -0.7666 -1.0301   ...    0.3203 -0.2747 -1.8450  0.0942  0.2149
137  2013  0.2517  0.8293  1.9907 -1.0461   ...   -0.3113  0.7177  0.8896  0.2329  2.0546
138  2014 -1.6106 -1.3285 -0.1870  0.2511   ...   -0.3264  1.3578  1.5639 -1.3799 -1.1196
139  2015 -2.0050  0.3680 -0.5553 -0.6471   ...    0.6217 -0.0965  1.3019 -1.0420 -1.3107

[140 rows x 13 columns]

# processing
# =================================
df.set_index('Year').stack()

Year     
1876  Jan    1.8309
      Feb    0.6724
      Mar    0.6230
      Apr    0.3548
      May    1.4329
      Jun   -0.3263
      Jul    1.7276
      Aug    0.6316
              ...  
2015  May   -0.5075
      Jun   -1.4982
      Jul   -1.9434
      Aug    0.6217
      Sep   -0.0965
      Oct    1.3019
      Nov   -1.0420
      Dec   -1.3107
dtype: float64
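
The stacked result above is still indexed by (Year, month name) pairs rather than by dates. What follows is a minimal sketch of one way to reach the date-indexed frame the question asks for. It is not part of the original answer. It assumes the month columns use English three-letter abbreviations and that each value should be labelled with the last business day of its month, as in the question's desired output. The names months, labels and first_of_month are illustrative.

```python
import numpy as np
import pandas as pd

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# Same layout as the question's frame: one row per year, one column per month.
df = pd.DataFrame(np.random.randn(140, 12), columns=months)
df.insert(0, 'Year', np.arange(1876, 2016))

# Step from the answer: move Year into the index, then stack the month columns.
stacked = df.set_index('Year').stack()

# Turn each (Year, month name) index pair into a label such as '1876-Jan'
# and parse it; this yields the first day of each month.
labels = [f'{year}-{month}' for year, month in stacked.index]
first_of_month = pd.to_datetime(labels, format='%Y-%b')

# Roll each date forward to the last business day of its own month.
stacked.index = first_of_month + pd.offsets.BMonthEnd(0)

# Wrap the Series in a one-column DataFrame named 'values'.
result = stacked.to_frame('values')
print(result.head())
```

Building the dates from the index labels, rather than generating a fresh date_range of the same length, keeps every value attached to its own year and month even if some rows were dropped by stack because they contained NaN.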