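I have the following DataFrame (built and printed below).
Does anyone know how to find, for each student, the number of students they interacted with (through exams)?
The data looks like this: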
import pandas as pd

df = pd.DataFrame({
    'student': ['a'] * 3 + ['b'] * 3 + ['c'] * 4,
    'semester': [1, 1, 2, 2, 1, 1, 2, 2, 2, 2],
    'passed_exam': [True, False] * 5,
    'exam': [
        'French', 'English', 'Italian', 'Chinese', 'Russian',
        'German', 'Chinese', 'Spanish', 'English', 'French'
    ]
})
print(df)
   passed_exam     exam  semester student
0         True   French         1       a
1        False  English         1       a
2         True  Italian         2       a
3        False  Chinese         2       b
4         True  Russian         1       b
5        False   German         1       b
6         True  Chinese         2       c
7        False  Spanish         2       c
8         True  English         2       c
9        False   French         2       c
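Thanks in advance!
Answer 0 (score: 1)
I interpret "the number of students each student interacted with (through exams)" as the number of students who took the same exam in the same semester.
If so, something like this seems to work: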
df1 = (df
       .groupby(["exam", "semester"], as_index=False)["student"].agg("count")
       .rename(columns={"student": "total_st"}))
df.merge(df1).sort_values(["semester", "student"])
   passed_exam     exam  semester student  total_st
0         True   French         1       a         1
1        False  English         1       a         1
5         True  Russian         1       b         1
6        False   German         1       b         1
2         True  Italian         2       a         1
3        False  Chinese         2       b         2
4         True  Chinese         2       c         2
7        False  Spanish         2       c         1
8         True  English         2       c         1
9        False   French         2       c         1
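If what is wanted is instead a single number per student (how many distinct other students shared at least one exam with them in the same semester), a self-merge is one possible sketch. That reading is an assumption about the question, not something the asker confirmed, and the names `pairs` and `peers` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'student': ['a'] * 3 + ['b'] * 3 + ['c'] * 4,
    'semester': [1, 1, 2, 2, 1, 1, 2, 2, 2, 2],
    'passed_exam': [True, False] * 5,
    'exam': [
        'French', 'English', 'Italian', 'Chinese', 'Russian',
        'German', 'Chinese', 'Spanish', 'English', 'French'
    ]
})

# Pair every row with every row that has the same exam and semester.
pairs = df.merge(df, on=['exam', 'semester'], suffixes=('', '_other'))
# Drop the pairs in which a student is matched with themselves.
pairs = pairs[pairs['student'] != pairs['student_other']]
# Count distinct co-examinees per student; students with none get 0.
peers = (pairs.groupby('student')['student_other'].nunique()
         .reindex(df['student'].unique(), fill_value=0))
print(peers)
```

With the question's data only 'b' and 'c' share an exam (Chinese, semester 2), so each of them gets 1 and 'a' gets 0.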
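Answer 1 (score: 1)
The way I understand your question, you want the 'total_st' column to hold the number of students a student interacted with in a given exam. For example, if the exam 'French' has 4 students ('a', 'b', 'c', 'd'), then student 'a' interacted with 3 students. Am I right?
If so, here is a solution. First, let's ignore the semester to simplify the problem and consider the following example: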
df = pd.DataFrame({
    'student': ['a'] * 3 + ['b'] * 3 + ['c'] * 4,
    'exam': [
        'Chinese', 'English', 'Spanish', 'Chinese', 'Spanish',
        'Spanish', 'Chinese', 'Spanish', 'English', 'Chinese'
    ],
    'passed_exam': [True, False] * 5
})
print(df)
      exam  passed_exam student
0  Chinese         True       a
1  English        False       a
2  Spanish         True       a
3  Chinese        False       b
4  Spanish         True       b
5  Spanish        False       b
6  Chinese         True       c
7  Spanish        False       c
8  English         True       c
9  Chinese        False       c
Now we can use groupby to compute a Series that maps each exam to its number of student rows:
d = df.groupby(['exam'])['student'].count()
print(d)
exam
Chinese    4
English    2
Spanish    4
Name: student, dtype: int64
We get the number of students each student interacted with by subtracting 1 from each value:
d = d - 1
Finally, we build the 'total_st' column with apply and assign it to the original DataFrame:
total_st = df.apply(lambda x: d.loc[x['exam']], axis=1)
df = df.assign(total_st=total_st)
print(df)
      exam  passed_exam student  total_st
0  Chinese         True       a         3
1  English        False       a         1
2  Spanish         True       a         3
3  Chinese        False       b         3
4  Spanish         True       b         3
5  Spanish        False       b         3
6  Chinese         True       c         3
7  Spanish        False       c         3
8  English         True       c         1
9  Chinese        False       c         3
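One caveat with the steps above: `count()` counts rows, so a student who appears twice under the same exam (like 'b' for Spanish or 'c' for Chinese in this example) is counted twice. If distinct students are what matters, a possible variant is to use `nunique()` and to replace the row-wise `apply` with `map`. This is only a sketch; the name `others` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'student': ['a'] * 3 + ['b'] * 3 + ['c'] * 4,
    'exam': [
        'Chinese', 'English', 'Spanish', 'Chinese', 'Spanish',
        'Spanish', 'Chinese', 'Spanish', 'English', 'Chinese'
    ],
    'passed_exam': [True, False] * 5
})

# Distinct students per exam, minus the student themselves.
others = df.groupby('exam')['student'].nunique() - 1
# map looks each exam up in the Series, so no row-wise apply is needed.
df['total_st'] = df['exam'].map(others)
print(df)
```

With the example data this gives 2 for Chinese and Spanish (three distinct students each) and 1 for English.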
Answer 2 (score: 1)
IIUC, you can do it this way:
In [116]: df['total_st'] = df.groupby(['exam','semester'])['student'].transform('size')
In [117]: df
Out[117]:
   passed_exam     exam  semester student  total_st
0         True   French         1       a         1
1        False  English         1       a         1
2         True  Italian         2       a         1
3        False  Chinese         2       b         2
4         True  Russian         1       b         1
5        False   German         1       b         1
6         True  Chinese         2       c         2
7        False  Spanish         2       c         1
8         True  English         2       c         1
9        False   French         2       c         1
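Because `transform` returns a result aligned with the original index, the same pattern can be adapted to exclude the student themselves, as in the "interacted with" reading from Answer 1. The sketch below assumes that reading is the intended one; the column name `other_st` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'student': ['a'] * 3 + ['b'] * 3 + ['c'] * 4,
    'semester': [1, 1, 2, 2, 1, 1, 2, 2, 2, 2],
    'passed_exam': [True, False] * 5,
    'exam': [
        'French', 'English', 'Italian', 'Chinese', 'Russian',
        'German', 'Chinese', 'Spanish', 'English', 'French'
    ]
})

# Distinct students per (exam, semester) group, broadcast back to every row,
# minus one so that a student is not counted as their own peer.
df['other_st'] = (
    df.groupby(['exam', 'semester'])['student'].transform('nunique') - 1
)
print(df)
```

Row order and index stay as in the original frame, so no merge or re-sort is needed afterwards.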