I have a problem, which I will illustrate with these two columns:
Range Answer
>30 maybe
>30 yes
<30 no
<30 yes
>30 maybe
<30 yes
What I need to do is group by Range and get the number of answers for each option. In this case:
Range  Answer
<30
       no:    1
       yes:   2
       maybe: 0
>30
       no:    0
       yes:   1
       maybe: 2
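In reality there are not just 2 columns but many. I need to group by one of them and then get this kind of statistic for every other column in the DataFrame. This is my first time working with categorical data and I'm quite lost. I used describe() and it works for the most frequent answer, but I need the count of every answer. Is there a direct way to do this, something like a "detailed describe()"?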
Answer 0 (score: 0)
Use crosstab:
In [685]: pd.crosstab(df.Range, df.Answer).stack()
Out[685]:
Range  Answer
<30    maybe     0
       no        1
       yes       2
>30    maybe     2
       no        0
       yes       1
dtype: int64
Or use groupby:
In [690]: df.groupby(['Range', 'Answer']).size().unstack(fill_value=0).stack()
Out[690]:
Range  Answer
<30    maybe     0
       no        1
       yes       2
>30    maybe     2
       no        0
       yes       1
dtype: int64
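
The two snippets above assume a DataFrame named df that is never constructed in the thread. Here is a minimal sketch that rebuilds it from the question's sample data and checks that both approaches give the same counts. The names wide, via_crosstab and via_groupby are mine, not from the answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Range": [">30", ">30", "<30", "<30", ">30", "<30"],
    "Answer": ["maybe", "yes", "no", "yes", "maybe", "yes"],
})

# Wide table: one row per Range, one column per Answer, zeros included.
wide = pd.crosstab(df["Range"], df["Answer"])

# Long form via crosstab, as in the first snippet.
via_crosstab = wide.stack()

# Long form via groupby. size() only reports combinations that occur;
# unstack(fill_value=0) restores the missing ones (e.g. "<30"/"maybe").
via_groupby = (
    df.groupby(["Range", "Answer"])
      .size()
      .unstack(fill_value=0)
      .stack()
)

print(wide)
print(via_crosstab.to_dict() == via_groupby.to_dict())
```

If the long format is not needed, the wide table from crosstab may be easier to read, since each answer gets its own column.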
Answer 1 (score: 0)
print (df)
Range Answer1 Answer2 Answer3
0 >30 maybe no yes
1 >30 yes yes no
2 <30 no yes no
3 <30 yes maybe no
4 >30 maybe no yes
5 <30 yes no no
print (df.melt('Range', var_name='Answers', value_name='Vals'))
Range Answers Vals
0 >30 Answer1 maybe
1 >30 Answer1 yes
2 <30 Answer1 no
3 <30 Answer1 yes
4 >30 Answer1 maybe
5 <30 Answer1 yes
6 >30 Answer2 no
7 >30 Answer2 yes
8 <30 Answer2 yes
9 <30 Answer2 maybe
10 >30 Answer2 no
11 <30 Answer2 no
12 >30 Answer3 yes
13 >30 Answer3 no
14 <30 Answer3 no
15 <30 Answer3 no
16 >30 Answer3 yes
17 <30 Answer3 no
df1 = df.melt('Range', var_name='Answers', value_name='Vals') \
.groupby(['Range', 'Answers', 'Vals']).size()
print (df1)
Range  Answers  Vals
<30    Answer1  no       1
                yes      2
       Answer2  maybe    1
                no       1
                yes      1
       Answer3  no       3
>30    Answer1  maybe    2
                yes      1
       Answer2  no       2
                yes      1
       Answer3  no       1
                yes      2
dtype: int64
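
Note that the result above lists only combinations that actually occur, whereas the question also asks for zero counts (for example maybe: 0 for <30). One possible way to get them, sketched here on the same sample data, is to append unstack(fill_value=0) to the melt/groupby chain. The variable name wide is my own.

```python
import pandas as pd

# Sample data copied from the printed DataFrame above.
df = pd.DataFrame({
    "Range":   [">30", ">30", "<30", "<30", ">30", "<30"],
    "Answer1": ["maybe", "yes", "no", "yes", "maybe", "yes"],
    "Answer2": ["no", "yes", "yes", "maybe", "no", "no"],
    "Answer3": ["yes", "no", "no", "no", "yes", "no"],
})

# Same melt + groupby as above, then pivot the answer values into
# columns so that combinations that never occur show up as 0.
wide = (
    df.melt(id_vars="Range", var_name="Answers", value_name="Vals")
      .groupby(["Range", "Answers", "Vals"])
      .size()
      .unstack(fill_value=0)
)

print(wide)
```

Each row of wide is one (Range, Answers) pair and each column is one answer value, so a single row can be read with something like wide.loc[("<30", "Answer1")].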
Another solution is to reshape with stack and then use value_counts:
df1 = df.set_index('Range').stack() \
.groupby(level=[0,1]).value_counts()
print (df1)
Range
<30    Answer1  yes      2
                no       1
       Answer2  maybe    1
                no       1
                yes      1
       Answer3  no       3
>30    Answer1  maybe    2
                yes      1
       Answer2  no       2
                yes      1
       Answer3  yes      2
                no       1
dtype: int64
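
For something closer to the "detailed describe()" the question asks about, a different technique from the two above is to build one crosstab per answer column and join them side by side. This is only a sketch on the same sample data. The names tables and summary are mine, and it assumes every column other than Range should be counted.

```python
import pandas as pd

# Sample data copied from the printed DataFrame above.
df = pd.DataFrame({
    "Range":   [">30", ">30", "<30", "<30", ">30", "<30"],
    "Answer1": ["maybe", "yes", "no", "yes", "maybe", "yes"],
    "Answer2": ["no", "yes", "yes", "maybe", "no", "no"],
    "Answer3": ["yes", "no", "no", "no", "yes", "no"],
})

# One frequency table per answer column, keyed by the column name.
tables = {
    col: pd.crosstab(df["Range"], df[col])
    for col in df.columns.drop("Range")
}

# Join the tables side by side; the dict keys become the outer column level.
# fillna(0) guards against a Range group missing from one of the tables.
summary = pd.concat(tables, axis=1).fillna(0).astype(int)

print(summary)
```

Unlike the melt version, each column here only gets the answer values that appear in it, so Answer3 has no maybe column at all.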