熊猫函数将行数据转置为列

时间:2019-08-26 21:29:41

标签: python pandas

我在pandas数据框中有一个数据表,该数据表中的每一年都是行,每个月都是列。

0  Year   Jan    Feb    Mar   Apr   May    Jun    Jul   Aug    Sep    Oct    Nov    Dec
1  1876  11.3   11.0    0.2   9.4   6.8   17.2   -5.6  12.3   10.5   -8.0   -2.7   -3.0
2  1877  -9.7   -6.5   -4.7  -9.6   3.6  -16.8  -10.2  -8.2  -17.2  -16.0  -12.6  -12.6
3  1878  -8.7  -21.1  -15.5  -8.8   2.1   -3.1   15.9  13.0   17.7   10.9   15.1   17.9
4  1879  12.7   14.3   13.2  12.7   2.1   16.4   21.8  22.6   18.9   15.2    9.8   -5.5
5  1880  10.8    7.7   14.3   5.3  12.3    9.1    1.6  14.3    8.1    4.8    7.2   -1.9

我希望转置数据以将年份保留为列,将月份添加为列

我曾尝试过融化和枢转,但还没有完全完成。

import urllib.request as request
from contextlib import closing
import shutil
import pandas as pd
from datetime import datetime
import pickle

def prepare_enso_data():
    """ get the raw enso data and prepare for use in bokeh figures
    elsewhere.
     """
    # get latest data from bom website
    with closing(request.urlopen('ftp://ftp.bom.gov.au/anon/home/ncc/www/sco/soi/soiplaintext.html')) as r:
        with open('.\\enso\\data\\enso_bom_historical.txt', 'wb') as enso_file:
            shutil.copyfileobj(r, enso_file)

        # now strip unwanted html
        with open('.\\enso\\data\\enso_bom_historical.txt', 'r') as enso_file:
            for i in range(11):
                next(enso_file)
            # remove unwanted characters and html at end of file
            enso_list = [
                x.replace('b','').replace('\n','').replace('Fe', "Feb").split() for x in enso_file if '<' not in x]

        enso_df = pd.DataFrame(enso_list)

        # set the first row as column names
        header = enso_df.loc[0]
        enso_df = enso_df[1:]
        enso_df.columns = header

        print(enso_df.head())

        enso_df_m = enso_df.melt(
            id_vars=['Year'], 
            # value_vars=[], 
            var_name='Month')

我希望它看起来像这样:

0   Year    Month   Value
1   1876    Jan 11.3
2   1876    Feb 11
3   1876    Mar 0.2
4   1876    Apr 9.4
5   1876    May 6.8
6   1876    Jun 17.2
7   1876    Jul -5.6
8   1876    Aug 12.3
9   1876    Sep 10.5
10  1876    Oct -8
11  1876    Nov -2.7
12  1876    Dec -3

1 个答案:

答案 0 :(得分:2)

IIUC,您需要这个:

import calendar
df_out=df.melt('Year', var_name='Month')
df_out['Month'] = pd.Categorical(df_out['Month'], calendar.month_abbr[1:], ordered=True)
df_out.sort_values(['Year','Month'])

输出:

    Year Month  value
0   1876   Jan   11.3
5   1876   Feb   11.0
10  1876   Mar    0.2
15  1876   Apr    9.4
20  1876   May    6.8
25  1876   Jun   17.2
30  1876   Jul   -5.6
35  1876   Aug   12.3
40  1876   Sep   10.5
45  1876   Oct   -8.0
50  1876   Nov   -2.7
55  1876   Dec   -3.0
1   1877   Jan   -9.7
6   1877   Feb   -6.5
11  1877   Mar   -4.7
16  1877   Apr   -9.6
21  1877   May    3.6
26  1877   Jun  -16.8
31  1877   Jul  -10.2
36  1877   Aug   -8.2
41  1877   Sep  -17.2
46  1877   Oct  -16.0
51  1877   Nov  -12.6
56  1877   Dec  -12.6
2   1878   Jan   -8.7
7   1878   Feb  -21.1
12  1878   Mar  -15.5
17  1878   Apr   -8.8
22  1878   May    2.1
27  1878   Jun   -3.1
32  1878   Jul   15.9
37  1878   Aug   13.0
42  1878   Sep   17.7
47  1878   Oct   10.9
52  1878   Nov   15.1
57  1878   Dec   17.9
3   1879   Jan   12.7
8   1879   Feb   14.3
13  1879   Mar   13.2
18  1879   Apr   12.7
23  1879   May    2.1
28  1879   Jun   16.4
33  1879   Jul   21.8
38  1879   Aug   22.6
43  1879   Sep   18.9
48  1879   Oct   15.2
53  1879   Nov    9.8
58  1879   Dec   -5.5
4   1880   Jan   10.8
9   1880   Feb    7.7
14  1880   Mar   14.3
19  1880   Apr    5.3
24  1880   May   12.3
29  1880   Jun    9.1
34  1880   Jul    1.6
39  1880   Aug   14.3
44  1880   Sep    8.1
49  1880   Oct    4.8
54  1880   Nov    7.2
59  1880   Dec   -1.9