How to get a "detailed" describe() in pandas

Time: 2017-09-08 06:08:11

Tags: pandas pandas-groupby

I have a problem, and I will reproduce two of the columns here:

Range       Answer
>30          maybe
>30          yes
<30          no
<30          yes
>30          maybe
<30          yes

What I need to do is group by Range and count how many times each answer option occurs, in this case:

Range       Answer
<30          
             no: 1
             yes:2
             maybe:0
>30          
             no: 0
             yes:1
             maybe:2

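In reality there are not just 2 columns but many. I need to group by one of them and then get this kind of statistic for every other column in the DataFrame. This is my first time working with categorical data and I am quite lost. I used describe() and it works for the most frequent answer, but I need the count for every answer. Is there a direct way to do this, something like a "detailed describe()"?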

2 Answers:

Answer 0: (score: 0)

One approach is to use crosstab:

In [685]: pd.crosstab(df.Range, df.Answer).stack()
Out[685]:
Range  Answer
<30    maybe     0
       no        1
       yes       2
>30    maybe     2
       no        0
       yes       1
dtype: int64

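Alternatively, use groupby with size, then unstack(fill_value=0) and stack so that combinations that never occur show up as 0: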

In [690]: df.groupby(['Range', 'Answer']).size().unstack(fill_value=0).stack()
Out[690]:
Range  Answer
<30    maybe     0
       no        1
       yes       2
>30    maybe     2
       no        0
       yes       1
dtype: int64
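
The two snippets above can be tried end to end with the sketch below. It assumes the six rows from the question are loaded into a DataFrame named df; the construction of df is an assumption, since the original does not show how the data was read in.

```python
import pandas as pd

# Sample data copied from the question (how df is built is an assumption).
df = pd.DataFrame({
    "Range": [">30", ">30", "<30", "<30", ">30", "<30"],
    "Answer": ["maybe", "yes", "no", "yes", "maybe", "yes"],
})

# crosstab builds a Range x Answer table of counts; pairs that never
# occur are reported as 0. stack() turns the table back into a Series.
via_crosstab = pd.crosstab(df["Range"], df["Answer"]).stack()

# groupby/size only counts pairs that were observed, so
# unstack(fill_value=0) followed by stack() fills in the missing ones.
via_groupby = (
    df.groupby(["Range", "Answer"]).size().unstack(fill_value=0).stack()
)

print(via_crosstab)
print(via_groupby)
```

Both Series are indexed by (Range, Answer), so a single count can be looked up with, for example, via_crosstab.loc[("<30", "yes")].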

Answer 1: (score: 0)

You can reshape with melt and then aggregate with size:

print (df)
  Range Answer1 Answer2 Answer3
0   >30   maybe      no     yes
1   >30     yes     yes      no
2   <30      no     yes      no
3   <30     yes   maybe      no
4   >30   maybe      no     yes
5   <30     yes      no      no
print (df.melt('Range', var_name='Answers', value_name='Vals'))
   Range  Answers   Vals
0    >30  Answer1  maybe
1    >30  Answer1    yes
2    <30  Answer1     no
3    <30  Answer1    yes
4    >30  Answer1  maybe
5    <30  Answer1    yes
6    >30  Answer2     no
7    >30  Answer2    yes
8    <30  Answer2    yes
9    <30  Answer2  maybe
10   >30  Answer2     no
11   <30  Answer2     no
12   >30  Answer3    yes
13   >30  Answer3     no
14   <30  Answer3     no
15   <30  Answer3     no
16   >30  Answer3    yes
17   <30  Answer3     no
df1 = df.melt('Range', var_name='Answers', value_name='Vals') \
        .groupby(['Range', 'Answers', 'Vals']).size()
print (df1)
Range  Answers  Vals 
<30    Answer1  no       1
                yes      2
       Answer2  maybe    1
                no       1
                yes      1
       Answer3  no       3
>30    Answer1  maybe    2
                yes      1
       Answer2  no       2
                yes      1
       Answer3  no       1
                yes      2
dtype: int64
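
Note that size only lists the combinations that actually occur, whereas the question also asks for the zero counts (for example maybe: 0). A possible way to get them, sketched here under the assumption that the same unstack(fill_value=0) trick is acceptable, is to spread the answer values into columns. The names long_df and counts are illustrative, not from the original answer.

```python
import pandas as pd

# Sample data copied from the printed DataFrame above.
df = pd.DataFrame({
    "Range":   [">30", ">30", "<30", "<30", ">30", "<30"],
    "Answer1": ["maybe", "yes", "no", "yes", "maybe", "yes"],
    "Answer2": ["no", "yes", "yes", "maybe", "no", "no"],
    "Answer3": ["yes", "no", "no", "no", "yes", "no"],
})

# Long format: one row per (Range, question column, answer value).
long_df = df.melt(id_vars="Range", var_name="Answers", value_name="Vals")

# Count each combination, then pivot the answer values into columns;
# fill_value=0 inserts a 0 wherever a value never occurs in a group.
counts = (
    long_df.groupby(["Range", "Answers", "Vals"]).size()
    .unstack(fill_value=0)
)

print(counts)
```

The result has one row per (Range, Answers) pair and one column per answer value; calling counts.stack() would return it to the long Series layout with the zeros included.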

Another solution is to reshape with stack and then use value_counts:

df1 = df.set_index('Range').stack() \
        .groupby(level=[0,1]).value_counts()
print (df1)
Range                
<30    Answer1  yes      2
                no       1
       Answer2  maybe    1
                no       1
                yes      1
       Answer3  no       3
>30    Answer1  maybe    2
                yes      1
       Answer2  no       2
                yes      1
       Answer3  yes      2
                no       1
dtype: int64
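
The two outputs in this answer contain the same counts and differ only in row order: value_counts sorts each group by descending count, while size keeps the group keys in sorted order. The sketch below checks this on the sample data; the variable names via_size and via_value_counts are illustrative.

```python
import pandas as pd

# Sample data copied from the printed DataFrame above.
df = pd.DataFrame({
    "Range":   [">30", ">30", "<30", "<30", ">30", "<30"],
    "Answer1": ["maybe", "yes", "no", "yes", "maybe", "yes"],
    "Answer2": ["no", "yes", "yes", "maybe", "no", "no"],
    "Answer3": ["yes", "no", "no", "no", "yes", "no"],
})

# Approach 1: melt to long format, then count with size.
via_size = (
    df.melt(id_vars="Range", var_name="Answers", value_name="Vals")
    .groupby(["Range", "Answers", "Vals"]).size()
)

# Approach 2: stack the answer columns, then value_counts per group.
via_value_counts = (
    df.set_index("Range").stack()
    .groupby(level=[0, 1]).value_counts()
)

# Same (Range, column, value) -> count mapping, regardless of row order.
print(via_size.to_dict() == via_value_counts.to_dict())
```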