Difference between rows in a multi-level DataFrame

Time: 2018-10-09 03:31:39

Tags: pandas

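I am trying to find the difference between two rows in multi-level data by iterating over the values in certain classes. I have been trying different techniques from tutorials, since I am still new to Python/pandas.

What I want to do is compute, for each class, the difference between the teacher's score and each student's score.

DataFrame: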

Class,Name,Reference,stats
X,SHE,student,30
X,GHE,student,20
X,GMK,student,10
X,JKO,teacher,50
Y,HHH,student,20
Y,KLP,teacher,30

Expected output:

Class,teacher,student,difference
X,JKO,SHE,20
X,JKO,GHE,30
X,JKO,GMK,40
Y,KLP,HHH,10

Could someone point me in the right direction? Each class can have at most one teacher.

Thanks

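3 answers:

Answer 0 (score: 0)

Below is code that uses nested for loops, so there should be a better solution than this. (I will try to update this answer with a better approach later.)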

import pandas as pd

df = pd.read_csv("student.csv")
# Index labels of the teacher rows (relies on the default 0..n-1 index).
ref = df[df['Reference'] == 'teacher'].index.values.astype(int)
df['TeacherName'] = 'NA'
df['Difference'] = 0

# Relies on each class's students being listed directly before their teacher.
start = 0
for t in ref:
    for j in range(start, t):
        # .loc writes to df itself; df['col'][j] = ... is chained assignment
        df.loc[j, 'TeacherName'] = df.loc[t, 'Name']
        df.loc[j, 'Difference'] = df.loc[t, 'stats'] - df.loc[j, 'stats']
    start = t + 1

# Drop the teacher rows so that only one row per student remains.
df = df[~df.index.isin(ref)]
print(df)

I collect the row index of every row where df['Reference'] == 'teacher' into an array named ref; those rows are removed from df after the loop.
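For comparison, here is a loop-free sketch of the same idea. It is not part of the original answer: it assumes the sample data from the question and at most one teacher per class (as the question states), and it looks up each student's teacher with Series.map instead of relying on row order. The names is_teacher, teachers and result are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Class": ["X", "X", "X", "X", "Y", "Y"],
    "Name": ["SHE", "GHE", "GMK", "JKO", "HHH", "KLP"],
    "Reference": ["student", "student", "student", "teacher", "student", "teacher"],
    "stats": [30, 20, 10, 50, 20, 30],
})

is_teacher = df["Reference"] == "teacher"

# One row per class, indexed by Class.
# Assumption: at most one teacher per class, so this index is unique.
teachers = df[is_teacher].set_index("Class")

# Keep only the students and look up their class's teacher by Class.
result = df[~is_teacher].copy()
result["TeacherName"] = result["Class"].map(teachers["Name"])
result["Difference"] = result["Class"].map(teachers["stats"]) - result["stats"]

print(result[["Class", "TeacherName", "Name", "Difference"]])
```

Unlike the loop, this does not depend on the students being listed directly before their teacher, but it would fail if a class had two teachers, because the lookup index would no longer be unique.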

Answer 1 (score: 0)

Use:

print (df)
  Class Name Reference  stats
0     X  SHE   student     30
1     X  GHE   student     20
2     X  GMK   student     10
3     X  JKO   teacher     50
4     X  ABC   teacher    100 <-added one new row for general data
5     Y  HHH   student     20
6     Y  KLP   teacher     30

df = (df.query("Reference == 'teacher'")
          .merge(df.query("Reference == 'student'"), on='Class', suffixes=('_t','_s'))
          .assign(difference=lambda x: x['stats_t'] - x['stats_s'])
          .drop(['Reference_s','Reference_t','stats_s','stats_t'], axis=1)
          .rename(columns={'Name_s':'student','Name_t':'teacher'})
          )
print (df)
  Class teacher student  difference
0     X     JKO     SHE          20
1     X     JKO     GHE          30
2     X     JKO     GMK          40
3     X     ABC     SHE          70
4     X     ABC     GHE          80
5     X     ABC     GMK          90
6     Y     KLP     HHH          10

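Explanation

  1. Filter the DataFrame with query to get the teacher rows and the student rows
  2. Then merge on Class to get all teacher/student combinations within each class
  3. Then create the new column by subtraction with assign
  4. Remove the unnecessary columns with drop
  5. Finally, rename the columns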
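To make the intermediate state visible, the same steps can be run one at a time. This is only a sketch on the original sample data from the question (without the extra ABC row); the variable names teachers, students, merged and out are chosen here for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Class": ["X", "X", "X", "X", "Y", "Y"],
    "Name": ["SHE", "GHE", "GMK", "JKO", "HHH", "KLP"],
    "Reference": ["student", "student", "student", "teacher", "student", "teacher"],
    "stats": [30, 20, 10, 50, 20, 30],
})

# Step 1: filter teacher rows and student rows.
teachers = df.query("Reference == 'teacher'")
students = df.query("Reference == 'student'")

# Step 2: merge on Class; overlapping column names get the _t / _s suffixes.
merged = teachers.merge(students, on="Class", suffixes=("_t", "_s"))
print(merged.columns.tolist())

# Steps 3-5: subtract, drop the helper columns, rename the name columns.
out = (
    merged.assign(difference=lambda x: x["stats_t"] - x["stats_s"])
    .drop(columns=["Reference_s", "Reference_t", "stats_s", "stats_t"])
    .rename(columns={"Name_s": "student", "Name_t": "teacher"})
)
print(out)
```

Printing merged.columns shows where the suffixed names such as stats_t and Name_s used in the later steps come from.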

Answer 2 (score: 0)

Simply split the dataset into two DataFrames, one for students and one for teachers, then merge them.

students = df[df.Reference == 'student'][['Class','Name','stats']]
teachers = df[df.Reference == 'teacher'][['Class','Name','stats']]

new_df = students.merge(teachers, on='Class', suffixes=('_student','_teacher'))
new_df['difference'] = new_df.stats_teacher - new_df.stats_student

print(new_df)
  Class Name_student  stats_student Name_teacher  stats_teacher  difference
0     X          SHE             30          JKO             50          20
1     X          GHE             20          JKO             50          30
2     X          GMK             10          JKO             50          40
3     Y          HHH             20          KLP             30          10
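Since the question says each class has at most one teacher, it may be worth letting merge check that assumption. The sketch below is not from the answer: it uses the question's sample data and passes validate="many_to_one", which makes pandas raise a MergeError if a Class value appears more than once on the teacher side. The extra ABC teacher row and the names teachers_dup and duplicate_detected are hypothetical and only there to trigger the check.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Class": ["X", "X", "X", "X", "Y", "Y"],
    "Name": ["SHE", "GHE", "GMK", "JKO", "HHH", "KLP"],
    "Reference": ["student", "student", "student", "teacher", "student", "teacher"],
    "stats": [30, 20, 10, 50, 20, 30],
})

students = df.loc[df["Reference"] == "student", ["Class", "Name", "stats"]]
teachers = df.loc[df["Reference"] == "teacher", ["Class", "Name", "stats"]]

# many_to_one: many students per class, but Class must be unique among teachers.
new_df = students.merge(
    teachers, on="Class", suffixes=("_student", "_teacher"), validate="many_to_one"
)
new_df["difference"] = new_df["stats_teacher"] - new_df["stats_student"]
print(new_df)

# Hypothetical second teacher in class X to show the check firing.
extra = pd.DataFrame({"Class": ["X"], "Name": ["ABC"], "stats": [100]})
teachers_dup = pd.concat([teachers, extra], ignore_index=True)
try:
    students.merge(teachers_dup, on="Class", validate="many_to_one")
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True
print(duplicate_detected)
```

Without validate, a second teacher in a class would silently duplicate every student row of that class, as in the ABC example of the previous answer.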