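pandas: reshaping and grouping row data into column data

Time: 2016-05-20 01:03:23

Tags: python pandas

I have a dataframe that used to be in a database format (not my choice), as you can see from this example's focus on rows rather than columns.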

 df = pd.DataFrame([['John','Sept',1,'Dec',2],['Jane','Sept',1,'Dec',3],['James','Sept',2,'Dec',2]],columns=['Name','Test 1','Score 1','Test 2','Score 2'])

   Name Test 1  Score 1 Test 2  Score 2
0   John   Sept        1    Dec        2
1   Jane   Sept        1    Dec        3
2  James   Sept        2    Dec        2

I would like to convert it to this format.

    Name  Test  Date  Score
0   John     1  Sept      1
1   John     2   Dec      2
3   Jane     1  Sept      1
4   Jane     2   Dec      3
6  James     1  Sept      2
7  James     2   Dec      2

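So basically I want to combine the test columns so that they are grouped on the Name column. So far I have looked at melt() and unstack(), which gets me part of the way there: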

melt = pd.melt(df,id_vars=['Name','Test 1'])

    Name Test 1 variable value
0   John   Sept  Score 1     1
1   Jane   Sept  Score 1     1
2  James   Sept  Score 1     2
3   John   Sept   Test 2   Dec
4   Jane   Sept   Test 2   Dec
5  James   Sept   Test 2   Dec
6   John   Sept  Score 2     2
7   Jane   Sept  Score 2     3
8  James   Sept  Score 2     2

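I'm pretty sure groupby, melt, or unstack will get me there, but I can't get the syntax right. Suggestions would be appreciated.

Background: I think (I hope) this new format will let me plot how the score changes with test date.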

2 answers:

Answer 0: (score: 2)

You can use lreshape together with sort_values:

df['T1'] = 1
df['T2'] = 2

df = (pd.lreshape(df, {'Test': ['T1', 'T2'],
                       'Date': ['Test 1', 'Test 2'], 
                       'Score': ['Score 1', 'Score 2']}))

#reorder columns, sort dataframe by Name
df = df[['Name','Test','Date','Score']].sort_values('Name', ascending=False)
print (df)

    Name  Test  Date  Score
0   John     1  Sept      1
3   John     2   Dec      2
1   Jane     1  Sept      1
4   Jane     2   Dec      3
2  James     1  Sept      2
5  James     2   Dec      2

pd.lreshape is not well documented, but you can read its docstring:

In [95]: help (pd.lreshape)

In [96]: Help on function lreshape in module pandas.core.reshape:

lreshape(data, groups, dropna=True, label=None)
    Reshape long-format data to wide. Generalized inverse of DataFrame.pivot

    Parameters
    ----------
    data : DataFrame
    groups : dict
        {new_name : list_of_columns}
    dropna : boolean, default True

    Examples
    --------
    >>> import pandas as pd
    >>> data = pd.DataFrame({'hr1': [514, 573], 'hr2': [545, 526],
    ...                      'team': ['Red Sox', 'Yankees'],
    ...                      'year1': [2007, 2008], 'year2': [2008, 2008]})
    >>> data
       hr1  hr2     team  year1  year2
    0  514  545  Red Sox   2007   2008
    1  573  526  Yankees   2007   2008

    >>> pd.lreshape(data, {'year': ['year1', 'year2'], 'hr': ['hr1', 'hr2']})
          team   hr  year
    0  Red Sox  514  2007
    1  Yankees  573  2007
    2  Red Sox  545  2008
    3  Yankees  526  2008

    Returns
    -------
    reshaped : DataFrame
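
As an aside to the lreshape approach, the same wide-to-long reshape can probably also be written with pd.wide_to_long, which derives the test number from the column suffix instead of needing the helper columns T1 and T2. This is only a sketch and not part of the original answer; the intermediate column name TestNo and the variable long_df are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [['John', 'Sept', 1, 'Dec', 2],
     ['Jane', 'Sept', 1, 'Dec', 3],
     ['James', 'Sept', 2, 'Dec', 2]],
    columns=['Name', 'Test 1', 'Score 1', 'Test 2', 'Score 2'])

# 'Test 1'/'Test 2' and 'Score 1'/'Score 2' share the stubs 'Test' and 'Score';
# the numeric suffix after the space becomes the new column 'TestNo'
# (a hypothetical name, chosen so it does not clash with the 'Test' stub).
long_df = (pd.wide_to_long(df, stubnames=['Test', 'Score'],
                           i='Name', j='TestNo', sep=' ')
             .reset_index()
             # the 'Test' stub holds the dates, so rename it to 'Date'
             # and let the suffix column take over the name 'Test'
             .rename(columns={'Test': 'Date', 'TestNo': 'Test'}))

# make sure the test number is an integer, then order rows and columns
long_df['Test'] = long_df['Test'].astype(int)
long_df = (long_df[['Name', 'Test', 'Date', 'Score']]
           .sort_values(['Name', 'Test'])
           .reset_index(drop=True))
print(long_df)
```

Note that wide_to_long requires the i column (Name here) to identify each row uniquely, which holds for this example but may not for real data.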

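Answer 1: (score: 0)

There may be a way to do this with those functions, but you can also do it without them by splitting the data into two dataframes and then stacking them with append().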

df = pd.DataFrame([['John','Sept',1,'Dec',2],['Jane','Sept',1,'Dec',3],['James','Sept',2,'Dec',2]],columns=['Name','Test 1','Score 1','Test 2','Score 2'])

# split off frame 1
df1 = df.loc[:,['Name','Test 1','Score 1']]
df1.columns = ['Name','Date','Score']
df1['Test'] = 1
df1
Out[4]:
Name    Date    Score   Test
John    Sept    1       1
Jane    Sept    1       1
James   Sept    2       1

# split off frame 2
df2 = df.loc[:,['Name','Test 2','Score 2']]
df2.columns = ['Name','Date','Score']
df2['Test'] = 2
df2
Out[5]:
Name    Date    Score   Test
John    Dec     2       2
Jane    Dec     3       2 
James   Dec     2       2

# combine the two frames
df = df1.append(df2)
df.sort_values('Name')
Out[6]:
Name    Date    Score   Test
James   Sept    2       1
James   Dec     2       2
Jane    Sept    1       1
Jane    Dec     3       2
John    Sept    1       1
John    Dec     2       2
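
DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on a recent pandas the stacking step above needs pd.concat instead. The following is a minimal sketch of the same split-and-stack idea under that assumption; the helper split_test and the variable stacked are illustrative names, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    [['John', 'Sept', 1, 'Dec', 2],
     ['Jane', 'Sept', 1, 'Dec', 3],
     ['James', 'Sept', 2, 'Dec', 2]],
    columns=['Name', 'Test 1', 'Score 1', 'Test 2', 'Score 2'])


def split_test(frame, number):
    """Return the Name/Date/Score columns for one test, tagged with its number."""
    # .copy() so that adding the 'Test' column does not touch the original frame
    part = frame[['Name', f'Test {number}', f'Score {number}']].copy()
    part.columns = ['Name', 'Date', 'Score']
    part['Test'] = number
    return part


# stack the two per-test frames; pd.concat replaces the removed DataFrame.append
stacked = pd.concat([split_test(df, 1), split_test(df, 2)], ignore_index=True)

# sort by name and test number, then put the columns in the requested order
stacked = (stacked.sort_values(['Name', 'Test'])
                  .reset_index(drop=True)
                  [['Name', 'Test', 'Date', 'Score']])
print(stacked)
```

Sorting on both Name and Test keeps each person's tests in order, which the single-column sort in the original does not guarantee.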