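I am trying to find the difference between two rows in multi-level data by iterating over the values within each class. I have been trying different techniques from tutorials, since I am still new to Python/pandas.
What I want to do is find the difference between the teacher's score and each student's score within a class.
DataFrame: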
Class,Name,Reference,stats
X,SHE,student,30
X,GHE,student,20
X,GMK,student,10
X,JKO,teacher,50
Y,HHH,student,20
Y,KLP,teacher,30
Expected output:
Class,teacher,student,difference
X,JKO,SHE,20
X,JKO,GHE,30
X,JKO,GMK,40
Y,KLP,HHH,10
Can someone point me in the right direction? Each class has at most one teacher.
Thanks
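Answer 0 (score: 0)
Below is code that uses several for loops, so there should be a better solution than this. (I will try to update this answer with a better approach later.)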
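import pandas as pd

df = pd.read_csv("student.csv")
# Index labels of the teacher rows. This assumes the default 0..n-1 index and
# that each class's teacher row comes after that class's student rows.
ref = df[df['Reference'] == 'teacher'].index.values.astype(int)
df['TeacherName'] = 'NA'
df['Difference'] = 0
for i in range(len(ref)):
    # The students of teacher i sit between the previous teacher row and this one.
    start = 0 if i == 0 else ref[i-1] + 1
    for j in range(start, ref[i]):
        # Use .loc instead of chained indexing (df[col][j] = ...), which
        # triggers SettingWithCopyWarning and may not modify df.
        df.loc[j, 'TeacherName'] = df.loc[ref[i], 'Name']
        df.loc[j, 'Difference'] = df.loc[ref[i], 'stats'] - df.loc[j, 'stats']
# Drop the teacher rows and keep only the students.
df = df[~df.index.isin(ref)]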
I collect the index of every row where df['Reference'] == 'teacher' into an array named ref; those rows are removed from df after the loop.
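The loops above depend on row order: each teacher row has to come after that class's students. As a rough sketch of a loop-free alternative, assuming at most one teacher per class as the question states, the teacher's name and score can be looked up by Class with Series.map. The sample frame is rebuilt inline from the question's data, and the names teachers, students and result are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Class": ["X", "X", "X", "X", "Y", "Y"],
    "Name": ["SHE", "GHE", "GMK", "JKO", "HHH", "KLP"],
    "Reference": ["student", "student", "student", "teacher", "student", "teacher"],
    "stats": [30, 20, 10, 50, 20, 30],
})

# One row per class, indexed by Class (assumes at most one teacher per class).
teachers = df[df["Reference"] == "teacher"].set_index("Class")
students = df[df["Reference"] == "student"]

# Look up each student's teacher by Class; row order no longer matters.
result = pd.DataFrame({
    "Class": students["Class"],
    "teacher": students["Class"].map(teachers["Name"]),
    "student": students["Name"],
    "difference": students["Class"].map(teachers["stats"]) - students["stats"],
}).reset_index(drop=True)

print(result)
```

A student whose class has no teacher would get NaN in the teacher and difference columns rather than being dropped.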
Answer 1 (score: 0)
Use:
print(df)
  Class Name Reference  stats
0     X  SHE   student     30
1     X  GHE   student     20
2     X  GMK   student     10
3     X  JKO   teacher     50
4     X  ABC   teacher    100  <- added one new row for general data
5     Y  HHH   student     20
6     Y  KLP   teacher     30

df = (df.query("Reference == 'teacher'")
        .merge(df.query("Reference == 'student'"), on='Class', suffixes=('_t','_s'))
        .assign(difference=lambda x: x['stats_t'] - x['stats_s'])
        .drop(['Reference_s','Reference_t','stats_s','stats_t'], axis=1)
        .rename(columns={'Name_s':'student','Name_t':'teacher'})
     )

print(df)
  Class teacher student  difference
0     X     JKO     SHE          20
1     X     JKO     GHE          30
2     X     JKO     GMK          40
3     X     ABC     SHE          70
4     X     ABC     GHE          80
5     X     ABC     GMK          90
6     Y     KLP     HHH          10
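Explanation:
The teacher rows are merged with the student rows on Class, so each teacher is paired with every student in the same class; the suffixes _t and _s mark which side each column came from. assign then computes the teacher's score minus the student's score, drop removes the helper columns, and rename gives the two Name columns their final names.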
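The question says each class has at most one teacher, while the sample data in this answer adds a second teacher to class X. If a duplicate teacher should be treated as an error instead of producing extra rows, the validate argument of merge can check for it. A minimal sketch, rebuilding this answer's sample frame inline; the variable names are only illustrative.

```python
import pandas as pd

# Sample data from this answer, including the extra teacher "ABC" in class X.
df = pd.DataFrame({
    "Class": ["X", "X", "X", "X", "X", "Y", "Y"],
    "Name": ["SHE", "GHE", "GMK", "JKO", "ABC", "HHH", "KLP"],
    "Reference": ["student", "student", "student", "teacher", "teacher",
                  "student", "teacher"],
    "stats": [30, 20, 10, 50, 100, 20, 30],
})

teachers = df.query("Reference == 'teacher'")
students = df.query("Reference == 'student'")

try:
    # "one_to_many" requires the merge key (Class) to be unique on the
    # left (teacher) side.
    teachers.merge(students, on="Class", suffixes=("_t", "_s"),
                   validate="one_to_many")
    duplicate_teachers = False
except pd.errors.MergeError:
    # Raised here because class X has two teacher rows.
    duplicate_teachers = True

print(duplicate_teachers)
```

With the original question's data, where every class has a single teacher, the same call would pass the check and return the merged frame.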
Answer 2 (score: 0)
Simply split the dataset into two DataFrames, one for students and one for teachers, then merge them on Class.
students = df[df.Reference == 'student'][['Class','Name','stats']]
teachers = df[df.Reference == 'teacher'][['Class','Name','stats']]
new_df = students.merge(teachers, on='Class', suffixes=('_student','_teacher'))
new_df['difference'] = new_df.stats_teacher - new_df.stats_student
print(new_df)
  Class Name_student  stats_student Name_teacher  stats_teacher  difference
0     X          SHE             30          JKO             50          20
1     X          GHE             20          JKO             50          30
2     X          GMK             10          JKO             50          40
3     Y          HHH             20          KLP             30          10
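Since a class has at most one teacher, it may also have none, and the default inner merge silently drops the students of such a class. As a sketch of one way to keep them, how="left" can be passed to merge. The class Z row with student "QRS" below is made up for illustration and is not part of the question's data.

```python
import pandas as pd

df = pd.DataFrame({
    "Class": ["X", "X", "X", "X", "Y", "Y", "Z"],
    # "QRS" in class Z is a hypothetical student in a class without a teacher.
    "Name": ["SHE", "GHE", "GMK", "JKO", "HHH", "KLP", "QRS"],
    "Reference": ["student", "student", "student", "teacher",
                  "student", "teacher", "student"],
    "stats": [30, 20, 10, 50, 20, 30, 15],
})

students = df[df.Reference == "student"][["Class", "Name", "stats"]]
teachers = df[df.Reference == "teacher"][["Class", "Name", "stats"]]

# how="left" keeps every student row; the teacher columns are NaN when the
# student's class has no teacher.
new_df = students.merge(teachers, on="Class", how="left",
                        suffixes=("_student", "_teacher"))
new_df["difference"] = new_df.stats_teacher - new_df.stats_student
print(new_df)
```

Because of the missing value, stats_teacher and difference become floating-point columns in this variant.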