I have two data frames, as shown below:
df_name:
   Student_ID  Name         DOB
0           1  Raju  1993-02-02
1           2  Indu  1987-01-04
2           3  Laya  2000-06-24
df_marks:
    Student_ID Subject  Int1/40  Int2/40
0            1     Eng       10       35
1            1     Tam       30       38
2            1     Mat       20       30
3            1     Sci       15       20
4            2     Eng       35       25
5            2     Tam       25       15
6            2     Mat       22       30
7            2     Sci       29       23
8            3     Eng       18       17
9            3     Tam       19       16
10           3     Mat       27       26
The task is to create the data frame shown next, where I need to add df_marks['Int1/40'] and df_marks['Int2/40'] wherever df_name['Student_ID'] == df_marks['Student_ID']:
   Student_id  Name         DOB  Tam/50
0           1  Raju  1993-02-02     NaN
1           2  Indu  1987-01-04     NaN
2           3  Laya  2000-06-24     NaN
I tried:
df_out['Tam/50'] = df_marks[['Int1/40','Int2/40']].sum(axis=1).where(df_marks['Subject']==df_out['Student_id'])
but it gives the error:
ValueError: Can only compare identically-labeled Series objects
Is there a simple way to do this?
Regards, Deepak Dash
Answer 0 (score: 2)
Use DataFrame.join with an aggregated sum to add the new column to df_name:
df_marks['Tam/50'] = df_marks[['Int1/40','Int2/40']].sum(axis=1)
df_name = df_name.join(df_marks.groupby('Student_ID')['Tam/50'].sum(), on='Student_ID')
print(df_name)
   Student_ID  Name         DOB  Tam/50
0           1  Raju  1993-02-02     198
1           2  Indu  1987-01-04     204
2           3  Laya  2000-06-24     123
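For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the two frames from the question and applies the same join. The frame construction and the helper name add_total are assumptions made for illustration and are not part of the original answer; DOB is kept as plain strings, and the helper works on a copy so df_marks itself is left unchanged.

```python
import pandas as pd

# Sample data copied from the question (DOB kept as strings for simplicity).
df_name = pd.DataFrame({
    "Student_ID": [1, 2, 3],
    "Name": ["Raju", "Indu", "Laya"],
    "DOB": ["1993-02-02", "1987-01-04", "2000-06-24"],
})
df_marks = pd.DataFrame({
    "Student_ID": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
    "Subject": ["Eng", "Tam", "Mat", "Sci", "Eng", "Tam", "Mat", "Sci",
                "Eng", "Tam", "Mat"],
    "Int1/40": [10, 30, 20, 15, 35, 25, 22, 29, 18, 19, 27],
    "Int2/40": [35, 38, 30, 20, 25, 15, 30, 23, 17, 16, 26],
})


def add_total(names, marks, col="Tam/50"):
    """Hypothetical helper: join the per-student total of both internals onto names."""
    # assign() returns a new frame, so the caller's marks frame is not modified.
    marks = marks.assign(**{col: marks[["Int1/40", "Int2/40"]].sum(axis=1)})
    # One total per Student_ID; the resulting Series is named `col`.
    per_student = marks.groupby("Student_ID")[col].sum()
    # Match the Series index (Student_ID) against the Student_ID column of names.
    return names.join(per_student, on="Student_ID")


result = add_total(df_name, df_marks)
print(result)
```

The join works because the grouped Series is indexed by Student_ID, and on='Student_ID' tells join to look those index values up in the column of the same name.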
Or a solution without a helper column:
s = (df_marks[['Int1/40','Int2/40']].sum(axis=1)
     .groupby(df_marks['Student_ID'])
     .sum()
     .rename('Tam/50'))
df_name = df_name.join(s, on='Student_ID')
print(df_name)
   Student_ID  Name         DOB  Tam/50
0           1  Raju  1993-02-02     198
1           2  Indu  1987-01-04     204
2           3  Laya  2000-06-24     123
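One thing worth knowing about this approach: DataFrame.join is a left join by default, so a student with no rows in df_marks is kept and gets NaN in the new column. The sketch below illustrates that with a cut-down version of the data; student 4 ("Ravi") is made up for the example and does not appear in the question, and filling with 0 is just one possible choice.

```python
import pandas as pd

df_name = pd.DataFrame({
    "Student_ID": [1, 2, 4],           # student 4 is hypothetical and has no marks
    "Name": ["Raju", "Indu", "Ravi"],  # "Ravi" is a made-up extra row
})
df_marks = pd.DataFrame({
    "Student_ID": [1, 1, 2, 2],
    "Int1/40": [10, 30, 35, 25],
    "Int2/40": [35, 38, 25, 15],
})

# Row-wise total, then one sum per student, named after the target column.
s = (df_marks[["Int1/40", "Int2/40"]].sum(axis=1)
     .groupby(df_marks["Student_ID"])
     .sum()
     .rename("Tam/50"))

# Left join: every row of df_name survives; unmatched IDs get NaN.
joined = df_name.join(s, on="Student_ID")
print(joined)

# Optional: treat "no marks" as a total of 0 instead of NaN.
filled = joined.fillna({"Tam/50": 0})
print(filled)
```

Because of the NaN, the joined column becomes float; whether to fill it or leave it missing depends on what a missing total should mean in your data.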
Answer 1 (score: 1)
You can use pd.merge to match the two data frames on Student_ID, then use groupby and sum:
In [574]: res = pd.merge(df_name, df_marks, on='Student_ID')
In [592]: r = res.groupby(['Student_ID', 'Name', 'DOB'])[['Int1/40', 'Int2/40']].sum().reset_index()
In [594]: r['Tam/50'] = r['Int1/40'] + r['Int2/40']
In [604]: r.drop(columns=['Int1/40', 'Int2/40'], inplace=True)
In [605]: r
Out[605]:
   Student_ID  Name         DOB  Tam/50
0           1  Raju  1993-02-02     198
1           2  Indu  1987-01-04     204
2           3  Laya  2000-06-24     123
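The same merge-based idea can be written as a standalone script. This is only a sketch under a few assumptions: the frames are rebuilt by hand from the question, DOB is kept as strings, and as_index=False is used in place of reset_index(), which is a stylistic swap and not what the answer above does.

```python
import pandas as pd

# Sample data copied from the question (DOB kept as strings for simplicity).
df_name = pd.DataFrame({
    "Student_ID": [1, 2, 3],
    "Name": ["Raju", "Indu", "Laya"],
    "DOB": ["1993-02-02", "1987-01-04", "2000-06-24"],
})
df_marks = pd.DataFrame({
    "Student_ID": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
    "Subject": ["Eng", "Tam", "Mat", "Sci", "Eng", "Tam", "Mat", "Sci",
                "Eng", "Tam", "Mat"],
    "Int1/40": [10, 30, 20, 15, 35, 25, 22, 29, 18, 19, 27],
    "Int2/40": [35, 38, 30, 20, 25, 15, 30, 23, 17, 16, 26],
})

# One row per (student, subject), carrying Name and DOB along.
merged = pd.merge(df_name, df_marks, on="Student_ID")

# Sum both internals per student; as_index=False keeps the keys as columns.
totals = merged.groupby(["Student_ID", "Name", "DOB"], as_index=False)[
    ["Int1/40", "Int2/40"]
].sum()

# Combine the two per-student sums and drop the intermediate columns.
totals["Tam/50"] = totals["Int1/40"] + totals["Int2/40"]
totals = totals.drop(columns=["Int1/40", "Int2/40"])
print(totals)
```

Note that pd.merge defaults to an inner join, so a student with no rows in df_marks would be dropped here; pass how='left' if those students should be kept.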