Pandas: create a new column from an aggregation of another column

Time: 2019-05-09 00:18:23

Tags: python pandas

I have a DataFrame with 2 columns:

CLASS   STUDENT
'Sci'   'Francy'
'math'  'Alex'
'math'  'Arthur'
'math'  'Katy'
'eng'   'Jack'
'eng'   'Paul'
'eng'   'Francy'

I want to add a new column that lists all the students in the 'math' class, filled in only on the 'math' rows:

CLASS   STUDENT   NEW_COL
'Sci'   'Francy'  NaN
'math'  'Alex'    'Alex, Arthur, Katy'
'math'  'Arthur'  'Alex, Arthur, Katy'
'math'  'Katy'    'Alex, Arthur, Katy'
'eng'   'Jack'    NaN
'eng'   'Paul'    NaN
'eng'   'Francy'  NaN

I have been trying something like this, but I am not getting very far:

def get_all_students(class_series, df):
    return df.groupby(['CLASS','STUDENT']).size().rest_index()['CLASS'== measurement].tolist()
    ...

df['NEW_COL'] = np.where(df['CLASS']=='math', get_all_students(df['CLASS'],df),np.NaN)

4 answers:

Answer 0 (score: 2)

IIUC, use a conditional groupby + transform:

df.loc[df.CLASS=='math','New']=df.groupby('CLASS').STUDENT.transform(','.join)
df
Out[290]: 
  CLASS STUDENT               New
0   Sci  Francy               NaN
1  math    Alex  Alex,Arthur,Katy
2  math  Arthur  Alex,Arthur,Katy
3  math    Katy  Alex,Arthur,Katy
4   eng    Jack               NaN
5   eng    Paul               NaN
6   eng  Francy               NaN

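More info: because groupby computes the joined string for every group, you can either assign all of them or assign only the rows that match your condition.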

df.groupby('CLASS').STUDENT.transform(','.join)
Out[291]: 
0              Francy
1    Alex,Arthur,Katy
2    Alex,Arthur,Katy
3    Alex,Arthur,Katy
4    Jack,Paul,Francy
5    Jack,Paul,Francy
6    Jack,Paul,Francy
Name: STUDENT, dtype: object
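
The idea above can be sketched as a self-contained script. This is only an illustration under assumptions: the sample data is retyped from the question, and `Series.where` is swapped in for the `df.loc[...]` assignment as one possible way to keep the joined string on the 'math' rows only.

```python
import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame({
    "CLASS": ["Sci", "math", "math", "math", "eng", "eng", "eng"],
    "STUDENT": ["Francy", "Alex", "Arthur", "Katy", "Jack", "Paul", "Francy"],
})

# transform returns one value per original row, aligned on df's index,
# so every row carries the joined student list of its own class.
joined = df.groupby("CLASS")["STUDENT"].transform(",".join)

# Keep the joined string only where CLASS is 'math'; other rows become NaN.
df["NEW_COL"] = joined.where(df["CLASS"] == "math")
print(df)
```

Because `joined` already holds a value for every class, changing the condition (or dropping it) is enough to fill in other classes as well.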

Answer 1 (score: 1)

You can simply use str.join:

df.loc[df['CLASS'] == 'math', 'new_col'] = ', '.join(df.loc[df['CLASS'] == 'math', 'STUDENT'])
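
A runnable version of the same one-liner might look like the sketch below. The sample data is retyped from the question, and the `is_math` and `math_students` names are introduced here for readability only; the mask is computed once and reused.

```python
import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame({
    "CLASS": ["Sci", "math", "math", "math", "eng", "eng", "eng"],
    "STUDENT": ["Francy", "Alex", "Arthur", "Katy", "Jack", "Paul", "Francy"],
})

# Boolean mask selecting the 'math' rows (hypothetical helper name).
is_math = df["CLASS"] == "math"

# Join the selected students into a single string.
math_students = ", ".join(df.loc[is_math, "STUDENT"])

# Assigning through .loc creates the column; unselected rows are left as NaN.
df.loc[is_math, "new_col"] = math_students
print(df)
```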

Answer 2 (score: 1)

You can do it like this:

df = pd.DataFrame({"CLASS": ['Sci', 'math', 'math', 'math', 'eng', 'eng', 'eng'], "STUDENT": ['Francy', 'Alex', 'Arthur', 'Katy', 'Jack', 'Paul', 'Francy']})

Step 1: define your function

def get_student_list(class_name): 
    students = list(df[df['CLASS']==class_name]['STUDENT'])
    return ", ".join(students)

Step 2: use the function with numpy:

requested_class = 'math'
df['NEW_COL'] = np.where(df['CLASS'] == requested_class, get_student_list(requested_class), np.nan)

Desired result: [image not included]
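
One thing to watch with the np.where step: when a string and a float NaN are mixed, NumPy can coerce the result to a string array, in which case the missing entries are the text 'nan' rather than a real NaN. A minimal sketch of a pandas-only variant follows, assuming the same sample data as the question; `Series.where` is a substitute for np.where here, not part of the answer above.

```python
import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame({
    "CLASS": ["Sci", "math", "math", "math", "eng", "eng", "eng"],
    "STUDENT": ["Francy", "Alex", "Arthur", "Katy", "Jack", "Paul", "Francy"],
})

requested_class = "math"
is_requested = df["CLASS"] == requested_class

# Same joined string that get_student_list would return for this class.
roster = ", ".join(df.loc[is_requested, "STUDENT"])

# Start from a column filled with the roster, then blank out the
# non-matching rows; Series.where leaves a real missing value there.
df["NEW_COL"] = pd.Series(roster, index=df.index).where(is_requested)
print(df)
```

With this variant, `df["NEW_COL"].isna()` is True for every row outside the requested class.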

Answer 3 (score: 1)

Another approach, using pivot_table and map:

df['NEW_COL'] = df.CLASS.map(pd.pivot_table(df, 'STUDENT', 'CLASS', 'CLASS', aggfunc=','.join)['math']).fillna(np.nan)

Out[331]:
  CLASS STUDENT           NEW_COL
0   Sci  Francy               NaN
1  math    Alex  Alex,Arthur,Katy
2  math  Arthur  Alex,Arthur,Katy
3  math    Katy  Alex,Arthur,Katy
4   eng    Jack               NaN
5   eng    Paul               NaN
6   eng  Francy               NaN
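
The same lookup-and-map idea can also be sketched with a plain groupby aggregation instead of pivot_table. This is an alternative written for illustration, not the answer's own method; the sample data is retyped from the question and the `rosters` name is hypothetical.

```python
import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame({
    "CLASS": ["Sci", "math", "math", "math", "eng", "eng", "eng"],
    "STUDENT": ["Francy", "Alex", "Arthur", "Katy", "Jack", "Paul", "Francy"],
})

# One joined string per class, indexed by class name.
rosters = df.groupby("CLASS")["STUDENT"].agg(",".join)

# Map only the 'math' entry back onto the rows; classes that are
# missing from the lookup Series come out as NaN.
df["NEW_COL"] = df["CLASS"].map(rosters[["math"]])
print(df)
```

Passing more class names in the list (for example `rosters[["math", "eng"]]`) would fill in those classes too.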