I have some data that looks like this:
Class                Instructor
Intro to Philosophy  Jake
Algorithms           Ashley/Jake
Spanish I            Ashley
Vector Calculus      Jake
Intro to Philosophy  Jake
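How can I get a count or pivot like the one below, where a class taught by both Ashley and Jake is correctly counted for each of them? Rows with a single instructor are trivial, but two or more instructors for one class in the same cell trip me up.
I want to get something like this: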
                     Jake  Ashley
Intro to Philosophy     2       0
Algorithms              1       1
Spanish I               0       1
Vector Calculus         1       0
Total                   4       2
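To run the answers below, the sample data first has to exist as a pandas DataFrame named df, which is the name all three answers use. The question does not say how the data is loaded, so building it inline like this is an assumption; reading it from a CSV or spreadsheet would give the same frame.

```python
import pandas as pd

# Sample data from the question, built inline (assumed; the real source is not given).
# "Ashley/Jake" marks a class taught by both instructors.
df = pd.DataFrame({
    'Class': ['Intro to Philosophy', 'Algorithms', 'Spanish I',
              'Vector Calculus', 'Intro to Philosophy'],
    'Instructor': ['Jake', 'Ashley/Jake', 'Ashley', 'Jake', 'Jake'],
})
print(df)
```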
Answer 0 (score: 3)
You can use .str.get_dummies to split the Instructor field on "/" and turn it into 0/1 indicator columns. Then group by Class and sum:
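# Split "Ashley/Jake" on "/" into one 0/1 indicator column per instructor,
# then add up the indicators within each class
ret = (df['Instructor'].str.get_dummies('/')
       .groupby(df['Class']).sum()
      )
# Append a row with each instructor's total
ret.loc['Total'] = ret.sum()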
Output:
                     Ashley  Jake
Class
Algorithms                1     1
Intro to Philosophy       0     2
Spanish I                 1     0
Vector Calculus           0     1
Total                     2     4
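The key step is the intermediate indicator table that str.get_dummies('/') produces: the "Ashley/Jake" row gets a 1 in both the Ashley and the Jake column, so summing per class counts that class once for each instructor. The self-contained sketch below reruns the answer's code step by step with the sample data built inline (an assumption, since the question does not show how df is created) and prints that intermediate table.

```python
import pandas as pd

# Sample data from the question (built inline as an assumption)
df = pd.DataFrame({
    'Class': ['Intro to Philosophy', 'Algorithms', 'Spanish I',
              'Vector Calculus', 'Intro to Philosophy'],
    'Instructor': ['Jake', 'Ashley/Jake', 'Ashley', 'Jake', 'Jake'],
})

# Step 1: one 0/1 column per instructor name found after splitting on '/'.
# The 'Ashley/Jake' row gets a 1 in both columns.
dummies = df['Instructor'].str.get_dummies('/')
print(dummies)

# Step 2: sum the indicator columns within each class.
ret = dummies.groupby(df['Class']).sum()

# Step 3: append the per-instructor totals as a final row.
ret.loc['Total'] = ret.sum()
print(ret)
```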
Answer 1 (score: 2)
You can split the instructors into lists, explode them into one row each, then count and pivot:
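# Split "Ashley/Jake" into a list, then give each instructor its own row
df.Instructor = df.Instructor.str.split('/')
df = df.explode('Instructor')
# Count classes per instructor, then pivot instructors into columns; the counts
# are renamed because pandas < 2.0 names them 'Class' and pandas >= 2.0 'count'
x = (df.groupby('Instructor').Class.value_counts().rename('count')
       .reset_index(level=0)
       .pivot(columns='Instructor', values='count')
       .fillna(0))
x.loc['Total'] = x.sum()
print(x)
Output: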
Instructor           Ashley  Jake
Class
Algorithms              1.0   1.0
Intro to Philosophy     0.0   2.0
Spanish I               1.0   0.0
Vector Calculus         0.0   1.0
Total                   2.0   4.0
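The counts above come out as floats because pivot leaves NaN for missing (class, instructor) pairs and fillna(0) fills them afterwards. A possibly simpler variant of the same explode-then-count idea (a swapped-in technique, not the answerer's code) counts each (Class, Instructor) pair with groupby(...).size() and pivots with unstack(fill_value=0), which fills the missing pairs directly and builds the table from integer counts. The sample data is again built inline as an assumption.

```python
import pandas as pd

# Sample data from the question (built inline as an assumption)
df = pd.DataFrame({
    'Class': ['Intro to Philosophy', 'Algorithms', 'Spanish I',
              'Vector Calculus', 'Intro to Philosophy'],
    'Instructor': ['Jake', 'Ashley/Jake', 'Ashley', 'Jake', 'Jake'],
})

# Turn 'Ashley/Jake' into a list, then one row per (class, instructor) pair
df['Instructor'] = df['Instructor'].str.split('/')
df = df.explode('Instructor')

# Count each (Class, Instructor) pair and move instructors into columns;
# fill_value=0 fills missing pairs without going through NaN
x = df.groupby(['Class', 'Instructor']).size().unstack(fill_value=0)
x.loc['Total'] = x.sum()
print(x)
```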
Answer 2 (score: 1)
Let's explode first, then use crosstab:
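import pandas as pd
# Give each instructor of a shared class its own row, then cross-tabulate
df.Instructor = df.Instructor.str.split('/')
df = df.explode('Instructor')
out = pd.crosstab(df['Class'], df['Instructor'])
out.loc['Total'] = out.sum()  # totals row, as in the question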
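As another way to get the Total row, pd.crosstab can compute margins itself through its margins and margins_name parameters. Note that this adds a Total column (classes per row) as well as a Total row, which goes slightly beyond the table asked for in the question. The sketch below builds the sample data inline as an assumption.

```python
import pandas as pd

# Sample data from the question (built inline as an assumption)
df = pd.DataFrame({
    'Class': ['Intro to Philosophy', 'Algorithms', 'Spanish I',
              'Vector Calculus', 'Intro to Philosophy'],
    'Instructor': ['Jake', 'Ashley/Jake', 'Ashley', 'Jake', 'Jake'],
})

# One row per (class, instructor) pair
df['Instructor'] = df['Instructor'].str.split('/')
df = df.explode('Instructor')

# margins=True adds subtotals for every row and column;
# margins_name='Total' labels them (the default label is 'All')
out = pd.crosstab(df['Class'], df['Instructor'],
                  margins=True, margins_name='Total')
print(out)
```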