Grouping data in Python with pandas

Time: 2016-07-22 14:22:01

Tags: python numpy pandas pivot-table data-analysis

I am using Python, pandas, and NumPy.

I have a DataFrame:

import pandas as pd

df = pd.read_csv('data.csv')
print(df.head(7))

Output:

name  day  sum
A      D1    6 
B      D1    7 
B      D3    8 
A      D10   3 
A      D2    4 
C      D2    6 
A      D1    9

I need to get the following table with cumulative totals:

name   D1    D2      D3     ... D10
A      =6+9  =6+9+4  =6+9+4    =6+9+4+...+3
B      =7    =7      =7+8      =7+8+...+ 
C      =0    =0+6    =0+6        =6+...

Could you tell me how to do this? Thanks!

P.S. I used the pivot_table function (but the result is not a cumulative total):

name   D1    D2     D3    ... D10
A      15    19     19       ....
B      7     7      15      
C      0     6      6        

2 answers:

Answer 0: (score: 1)

pivot_table with sum as the aggregation, followed by fillna, actually gives exactly what you specified in the question:

In [18]: df
Out[18]: 
  name  day  sum
0    A   D1    6
1    B   D1    7
2    B   D3    8
3    A  D10    3
4    A   D2    4
5    C   D2    6
6    A   D1    9

In [19]: pd.pivot_table(df, values='sum', index=['name'], columns=['day'], aggfunc=sum).fillna(0)
Out[19]: 
day     D1  D10   D2   D3
name                     
A     15.0  3.0  4.0  0.0
B      7.0  0.0  0.0  8.0
C      0.0  0.0  6.0  0.0

For example, 15.0 = 6 + 9, just as you specified.

Answer 1: (score: 1)

Use df.cumsum(axis=1) on the pivoted table:

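import numpy as np
import pandas as pd
# fillna(0) keeps missing (name, day) pairs from leaving NaN gaps in the running total
pivotedDf = pd.pivot_table(df, values='sum', index=['name'], columns=['day'], aggfunc=np.sum).fillna(0)
pivotedDf = pivotedDf[['D1', 'D2', 'D3', 'D10']]  # manually sort columns into day order
pivotedDf.cumsum(axis=1)  # running total across the day columns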
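
For anyone who wants to run this without data.csv, here is a minimal self-contained sketch of the same idea. It rebuilds the seven sample rows from the question inline and, instead of listing the columns by hand, sorts them by the number after the "D". That sort is my own assumption: it only works if every day label looks like "D" followed by an integer. The variable names (totals, ordered_days, cumulative) are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["A", "B", "B", "A", "A", "C", "A"],
    "day": ["D1", "D1", "D3", "D10", "D2", "D2", "D1"],
    "sum": [6, 7, 8, 3, 4, 6, 9],
})

# Per-day totals; (name, day) pairs with no rows are filled with 0.
totals = pd.pivot_table(df, values="sum", index="name", columns="day",
                        aggfunc="sum", fill_value=0)

# Assumption: every day label is "D" followed by an integer,
# so the labels can be ordered by that integer (D1, D2, D3, D10).
ordered_days = sorted(totals.columns, key=lambda label: int(label[1:]))

# Running total from left to right across the ordered day columns.
cumulative = totals[ordered_days].cumsum(axis=1)
print(cumulative)
```

On the sample data, row A comes out as 15, 19, 19, 22, row B as 7, 7, 15, 15, and row C as 0, 6, 6, 6 for D1, D2, D3, D10. If the real day labels follow a different pattern, the sort key needs to change accordingly.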