How do I group by a column and count rows under different conditions?

Time: 2016-04-28 03:16:36

Tags: pandas

is_correct, question_id
t           1
t           1
f           1
f           1
t           2
t           2

Expected result:

correct_count, incorrect_count, question_id
2              2                1
2              0                2

This is what I have so far, but it only gives me the count of correct answers:

df[df["is_correct"]].groupby("question_id")["question_id"].count()

2 answers:

Answer 0: (score: 1)

You can use the pivot_table function:

In [28]: data = """\
   ....: is_correct  question_id
   ....: t           1
   ....: t           1
   ....: f           1
   ....: f           1
   ....: t           2
   ....: t           2
   ....: """

In [29]: df = pd.read_csv(io.StringIO(data), delim_whitespace=True)

In [30]: df['count'] = 0

In [31]: df
Out[31]:
  is_correct  question_id  count
0          t            1      0
1          t            1      0
2          f            1      0
3          f            1      0
4          t            2      0
5          t            2      0

In [32]: df.pivot_table(index='question_id', columns='is_correct',
   ....:                values='count', aggfunc='count', fill_value=0)\
   ....:   .reset_index()
Out[32]:
is_correct  question_id  f  t
0                     1  2  2
1                     2  0  2
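If you want the column names from the question (correct_count, incorrect_count) rather than f and t, one possible sketch is below. Note that it swaps pivot_table for pd.crosstab, which avoids the helper count column; the variable name counts and the use of sep=r"\s+" are my own choices, not part of the answer above.

```python
import io
import pandas as pd

data = """\
is_correct  question_id
t           1
t           1
f           1
f           1
t           2
t           2
"""
df = pd.read_csv(io.StringIO(data), sep=r"\s+")

counts = (
    # One row per question_id, one column per is_correct value, cells are row counts
    pd.crosstab(df["question_id"], df["is_correct"])
    # Guarantee both columns exist even if one value never occurs in the data
    .reindex(columns=["t", "f"], fill_value=0)
    .rename(columns={"t": "correct_count", "f": "incorrect_count"})
    # Drop the leftover "is_correct" label on the column axis
    .rename_axis(columns=None)
    .reset_index()
)
print(counts)
```

The reindex step is there so the result keeps the same shape when every answer in the data happens to be correct (or incorrect).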

Answer 1: (score: 0)

You can use groupby after creating an extra column to count with:

df = pd.DataFrame({'is_correct':['t','t','f','f','t','t'],'question_id':[1,1,1,1,2,2]})
df['to_sum_up']=1

is_correct question_id   to_sum_up
t           1            1
t           1            1
f           1            1
f           1            1
t           2            1
t           2            1

df2 = df.groupby(['question_id','is_correct'],as_index = False).sum()

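After grouping, you just need to rearrange the data into the columns you want:

# Rows of the other class get NaN here because assignment aligns on the index
df2['correct_count'] = df2.loc[df2['is_correct']=='t','to_sum_up']
df2['incorrect_count'] = df2.loc[df2['is_correct']=='f','to_sum_up']

Then, to get a clean DataFrame as output:

# Replace the NaN placeholders with 0, then collapse to one row per question
df2.loc[df2['correct_count'].isnull(),'correct_count'] = 0
df2.loc[df2['incorrect_count'].isnull(),'incorrect_count'] = 0
df2 = df2.groupby('question_id',as_index = False).max()
df2 = df2.drop(['to_sum_up','is_correct'],axis=1)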

      question_id   correct_count   incorrect_count
0     1             2               2
1     2             2               0
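The same table can probably be reached in fewer steps. The sketch below is an alternative, not this answer's method: it assumes is_correct holds the strings 't' and 'f', builds a temporary boolean column (the name correct is made up for illustration), and uses named aggregation, which needs pandas 0.25 or later.

```python
import pandas as pd

df = pd.DataFrame({
    "is_correct": ["t", "t", "f", "f", "t", "t"],
    "question_id": [1, 1, 1, 1, 2, 2],
})

result = (
    # Hypothetical helper column: True where the answer was correct
    df.assign(correct=df["is_correct"].eq("t"))
    .groupby("question_id", as_index=False)
    .agg(
        # Summing booleans counts the True values
        correct_count=("correct", "sum"),
        # Negate first to count the False values
        incorrect_count=("correct", lambda s: (~s).sum()),
    )
)
print(result)
```

Because both counts come from the same boolean column, a question with no incorrect answers gets 0 directly, with no NaN cleanup needed.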