Using Pandas Melt() to pivot two sets of data

Time: 2016-06-10 19:58:47

Tags: python python-2.7 pandas

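My phone usage and billing data is stored in a Pandas DataFrame that holds statistics for two months. I want to pivot the data so that each month's columns become rows.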

Starting point:

       Name  Jan Minutes Used  Feb Minutes Used Jan Bill Paid Feb Bill Paid
0  Person A                10                11           Yes            No
1  Person B                12                13            No           Yes

Desired output:

       Name Month  Minutes Used Bill Paid
0  Person A   Jan           10        Yes
1  Person A   Feb           11         No
2  Person B   Jan           12         No
3  Person B   Feb           13        Yes

I am trying to use .melt() to reshape the data, but the Bill Paid and Minutes Used values end up in the same column, when they should be split into two columns.

My code:

import pandas as pd
df = pd.DataFrame(data=[['Person A', 10, 11, 'Yes', 'No'], ['Person B', 12, 13, 'No', 'Yes']], columns=['Name', 'Jan Minutes Used', 'Feb Minutes Used', 'Jan Bill Paid', 'Feb Bill Paid'])

melted_df = pd.melt(df.reset_index(),
                 id_vars=['Name'],
                 value_vars=['Jan Bill Paid','Feb Bill Paid', 'Jan Minutes Used', 'Feb Minutes Used'])

melted_df['variable'] = melted_df['variable'].str.replace(' Minutes Used', '').str.replace(' Bill Paid', '')
melted_df.columns = ['Name', 'Month', 'Bill Paid']

print melted_df

Output of my code:

       Name Month Bill Paid
0  Person A   Jan       Yes
1  Person B   Jan        No
2  Person A   Feb        No
3  Person B   Feb       Yes
4  Person A   Jan        10
5  Person B   Jan        12
6  Person A   Feb        11
7  Person B   Feb        13
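
Both measures land in one column because melt() always emits a single value column for everything listed in value_vars. As a minimal sketch of one way to keep a melt-based approach, assuming every column name has the form "<Month> <Measure>" with the month before the first space: split the variable column into two parts, then move the measure names back into columns with unstack. The names `Measure` and `tidy` are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['Person A', 10, 11, 'Yes', 'No'], ['Person B', 12, 13, 'No', 'Yes']],
    columns=['Name', 'Jan Minutes Used', 'Feb Minutes Used',
             'Jan Bill Paid', 'Feb Bill Paid'],
)

# melt() puts every non-id column into one 'value' column and keeps the
# original column name in 'variable'.
melted = pd.melt(df, id_vars=['Name'])

# Split e.g. 'Jan Minutes Used' on the first space only:
# part 0 is the month, part 1 is the measure name.
parts = melted['variable'].str.split(' ', n=1, expand=True)
melted['Month'] = parts[0]
melted['Measure'] = parts[1]  # hypothetical column name for this sketch

# Turn the measure names back into columns: one row per (Name, Month).
tidy = (melted.set_index(['Name', 'Month', 'Measure'])['value']
              .unstack('Measure')
              .reset_index())
tidy.columns.name = None  # drop the leftover 'Measure' label on the columns

print(tidy)
```

Because the numbers and the Yes/No strings pass through the same value column, the resulting Minutes Used column will likely have object dtype; converting it back with astype(int) may be needed. The row and column order may also differ from the desired output and can be adjusted afterwards.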

1 answer:

Answer 0 (score: 4)

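You can achieve this by building a MultiIndex on the columns and then using stack. Note that the session below uses a DataFrame with an extra Gender column that is not in the question's data; with the question's DataFrame, use df.set_index('Name') instead: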

In [31]: df = df.set_index(['Name', 'Gender'])

# split column names on first space and create multi-index (expand=True)
In [33]: df.columns = df.columns.str.split(' ', n=1, expand=True)

In [34]: df
Out[34]:
                         Jan          Feb       Jan       Feb
                Minutes Used Minutes Used Bill Paid Bill Paid
Name     Gender
Person A Male             10           11       Yes        No
Person B Female           12           13        No       Yes

# stack (move from columns to index) the first (0) level of the columns
In [35]: df = df.stack(0)

In [36]: df
Out[36]:
                    Bill Paid  Minutes Used
Name     Gender
Person A Male   Feb        No            11
                Jan       Yes            10
Person B Female Feb       Yes            13
                Jan        No            12

To display the same output with everything in columns:

In [37]: df.reset_index()
Out[37]:
       Name  Gender level_2 Bill Paid  Minutes Used
0  Person A    Male     Feb        No            11
1  Person A    Male     Jan       Yes            10
2  Person B  Female     Feb       Yes            13
3  Person B  Female     Jan        No            12
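
The session above leaves the month column named level_2 and lists Feb before Jan. As a rough, self-contained sketch of the same set_index / split / stack idea applied to the question's DataFrame (which has no Gender column), the version below names the month level before stacking and then restores the Jan, Feb order. The names `wide` and `long_df` and the explicit month list are assumptions made for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['Person A', 10, 11, 'Yes', 'No'], ['Person B', 12, 13, 'No', 'Yes']],
    columns=['Name', 'Jan Minutes Used', 'Feb Minutes Used',
             'Jan Bill Paid', 'Feb Bill Paid'],
)

# Keep Name out of the way so only the month columns get reshaped.
wide = df.set_index('Name')

# Split each column name on the first space and build a two-level
# column MultiIndex: ('Jan', 'Minutes Used'), ('Feb', 'Bill Paid'), ...
wide.columns = wide.columns.str.split(' ', n=1, expand=True)

# Name the month level so it becomes a 'Month' column instead of 'level_1'.
wide.columns.names = ['Month', None]

# Move the month level from the columns into the index, then flatten.
long_df = wide.stack(level='Month').reset_index()

# Put months in calendar order rather than whatever order stack() produced
# (the month list is hard-coded for this two-month example).
long_df['Month'] = pd.Categorical(long_df['Month'],
                                  categories=['Jan', 'Feb'], ordered=True)
long_df = long_df.sort_values(['Name', 'Month']).reset_index(drop=True)

# Match the column order of the desired output.
long_df = long_df[['Name', 'Month', 'Minutes Used', 'Bill Paid']]

print(long_df)
```

Recent pandas releases may emit a FutureWarning from stack() about a changed implementation; the explicit sort and column selection here are meant to keep the result independent of the ordering either implementation produces.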