is_correct, question_id
t 1
t 1
f 1
f 1
t 2
t 2
Desired result:
correct_count, incorrect_count, question_id
2 2 1
2 0 2
This is what I have, but it only gives me the correct count:
df[df["is_correct"]].groupby("question_id")["question_id"].count()
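One way to extend the attempt above is to also filter with the inverted mask `~df["is_correct"]` and then put the two counts side by side. This is a sketch that assumes `is_correct` holds booleans, as the mask `df[df["is_correct"]]` implies. With the `t`/`f` strings shown in the sample, you would build the mask with `df["is_correct"] == "t"` first.

```python
import pandas as pd

# Assumption: is_correct is boolean, as the mask in the attempt implies
df = pd.DataFrame({
    "is_correct": [True, True, False, False, True, True],
    "question_id": [1, 1, 1, 1, 2, 2],
})

# Count rows per question for each side of the mask
correct = df[df["is_correct"]].groupby("question_id").size()
incorrect = df[~df["is_correct"]].groupby("question_id").size()

# Align both counts on question_id; question 2 has no incorrect rows,
# so its missing value becomes NaN and is filled with 0
result = (
    pd.concat({"correct_count": correct, "incorrect_count": incorrect}, axis=1)
    .fillna(0)
    .astype(int)
    .rename_axis("question_id")
    .reset_index()
)
print(result)
```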
Answer 0 (score: 1)
You can use the pivot_table function:
In [28]: data = """\
....: is_correct question_id
....: t 1
....: t 1
....: f 1
....: f 1
....: t 2
....: t 2
....: """
In [29]: df = pd.read_csv(io.StringIO(data), sep=r'\s+')
In [30]: df['count'] = 0
In [31]: df
Out[31]:
is_correct question_id count
0 t 1 0
1 t 1 0
2 f 1 0
3 f 1 0
4 t 2 0
5 t 2 0
In [32]: df.pivot_table(index='question_id', columns='is_correct',
....: values='count', aggfunc='count', fill_value=0)\
....: .reset_index()
Out[32]:
is_correct question_id f t
0 1 2 2
1 2 0 2
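A possibly shorter alternative, not part of the answer above, is `pd.crosstab`, which counts each `(question_id, is_correct)` pair directly, so the dummy `count` column is not needed. The sketch below assumes the `t`/`f` string encoding from the question and renames the columns to the names the question asks for.

```python
import pandas as pd

df = pd.DataFrame({
    "is_correct": ["t", "t", "f", "f", "t", "t"],
    "question_id": [1, 1, 1, 1, 2, 2],
})

# crosstab counts rows for every (question_id, is_correct) pair and
# reports 0 for combinations that never occur (question 2 has no "f" rows)
counts = pd.crosstab(df["question_id"], df["is_correct"])

# Rename t/f to the requested names, fix the column order,
# and turn question_id back into a regular column
result = (
    counts.rename(columns={"t": "correct_count", "f": "incorrect_count"})
    [["correct_count", "incorrect_count"]]
    .reset_index()
)
result.columns.name = None  # drop the leftover "is_correct" label on the column axis
print(result)
```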
Answer 1 (score: 0)
You can use groupby after creating an extra column to sum up:
import pandas as pd

df = pd.DataFrame({'is_correct': ['t', 't', 'f', 'f', 't', 't'], 'question_id': [1, 1, 1, 1, 2, 2]})
df['to_sum_up'] = 1  # one per row, so summing per group counts rows
is_correct question_id to_sum_up
t 1 1
t 1 1
f 1 1
f 1 1
t 2 1
t 2 1
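df2 = df.groupby(['question_id', 'is_correct'], as_index=False).sum()
After the groupby, you just need to rearrange the data so it matches the columns you want:
df2['correct_count'] = df2.loc[df2['is_correct'] == 't', 'to_sum_up']
df2['incorrect_count'] = df2.loc[df2['is_correct'] == 'f', 'to_sum_up']
Then, to get a clean DataFrame as output:
# Rows that belong to the other outcome are NaN; treat them as 0
df2.loc[df2['correct_count'].isnull(), 'correct_count'] = 0
df2.loc[df2['incorrect_count'].isnull(), 'incorrect_count'] = 0
# Collapse to one row per question, then drop the helper columns
df2 = df2.groupby('question_id', as_index=False).max()
df2 = df2.drop(columns=['to_sum_up', 'is_correct']).astype(int)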
question_id correct_count incorrect_count
0 1 2 2
1 2 2 0
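The same groupby idea can be written more compactly with named aggregation: build one boolean flag per outcome and sum each flag per question, since summing booleans counts the `True` values. This is a sketch assuming the `t`/`f` encoding from the question; the helper column names `correct` and `incorrect` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "is_correct": ["t", "t", "f", "f", "t", "t"],
    "question_id": [1, 1, 1, 1, 2, 2],
})

# Hypothetical helper columns: one boolean flag per outcome for each row
flags = df.assign(
    correct=df["is_correct"].eq("t"),
    incorrect=df["is_correct"].eq("f"),
)

# Named aggregation: output column name = (input column, aggregation)
result = flags.groupby("question_id", as_index=False).agg(
    correct_count=("correct", "sum"),
    incorrect_count=("incorrect", "sum"),
)
print(result)
```

This avoids the NaN filling and the second groupby, because every question gets a value for both flags.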