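I'm doing some analysis practice with pandas. I want to create a new column whose value is the sum of two rows. The original dataset looks like this: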
Admit Gender Dept Freq
0 Admitted Male A 512
1 Rejected Male A 313
2 Admitted Female A 89
3 Rejected Female A 19
4 Admitted Male B 353
5 Rejected Male B 207
6 Admitted Female B 17
7 Rejected Female B 8
8 Admitted Male C 120
9 Rejected Male C 205
10 Admitted Female C 202
11 Rejected Female C 391
12 Admitted Male D 138
13 Rejected Male D 279
14 Admitted Female D 131
15 Rejected Female D 244
16 Admitted Male E 53
17 Rejected Male E 138
18 Admitted Female E 94
19 Rejected Female E 299
20 Admitted Male F 22
21 Rejected Male F 351
22 Admitted Female F 24
23 Rejected Female F 317
I want to create the new column using the following data frame:
Dept Gender Freq
0 A Female 108
1 A Male 825
2 B Female 25
3 B Male 560
4 C Female 593
5 C Male 325
6 D Female 375
7 D Male 417
8 E Female 393
9 E Male 191
10 F Female 341
11 F Male 373
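For reference, the second frame appears to be the per-(Dept, Gender) sum of Freq from the first (for example, 89 + 19 = 108 for Dept A, Female). Below is a minimal sketch that rebuilds it with groupby. It assumes the first frame is named `data` and the second `total_freq`, as in the attempted code; the first frame is typed in by hand from the table above.

```python
import pandas as pd

# First frame from the question, rebuilt row by row in the same order.
freq = [512, 313, 89, 19, 353, 207, 17, 8, 120, 205, 202, 391,
        138, 279, 131, 244, 53, 138, 94, 299, 22, 351, 24, 317]
data = pd.DataFrame({
    "Admit": ["Admitted", "Rejected"] * 12,
    "Gender": (["Male"] * 2 + ["Female"] * 2) * 6,
    "Dept": [d for d in "ABCDEF" for _ in range(4)],
    "Freq": freq,
})

# Sum Freq over Admitted + Rejected for each (Dept, Gender) pair.
# as_index=False keeps Dept and Gender as ordinary columns.
total_freq = data.groupby(["Dept", "Gender"], as_index=False)["Freq"].sum()
print(total_freq)
```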
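I want to use the Freq column of the second data frame to create a new column in the first data frame. Wherever Dept and Gender match between the two data frames, the new column should hold the corresponding value from the second one (for example, 108 for Dept A, Female). The new data frame should look like this: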
Admit Gender Dept Freq Total
0 Admitted Male A 512 825
1 Rejected Male A 313 825
2 Admitted Female A 89 108
3 Rejected Female A 19 108
4 Admitted Male B 353 560
5 Rejected Male B 207 560
6 Admitted Female B 17 25
7 Rejected Female B 8 25
I tried the following code:
for i in data.iterrows():
    for j in total_freq.iterrows():
        if i[1].Gender == total_freq.Gender & i[1].Dept == total_freq.Dept:
            data['Total'] = total_freq.Freq
I get the following error: TypeError: cannot compare a dtyped [object] array with a scalar of type [bool]
Can anyone help me create the column with the correct values?
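One likely cause of the error is operator precedence: in Python, `&` binds more tightly than `==`, so the condition is not parsed as two comparisons joined by `&`. The loop also assigns the whole `total_freq.Freq` column on every match instead of a single value. Below is a minimal corrected loop. It assumes the frames are named `data` and `total_freq` as in the question (both rebuilt by hand from the tables above), wraps each comparison in parentheses, and writes one value per matching group with `.loc`.

```python
import pandas as pd

# First frame from the question, rebuilt row by row in the same order.
freq = [512, 313, 89, 19, 353, 207, 17, 8, 120, 205, 202, 391,
        138, 279, 131, 244, 53, 138, 94, 299, 22, 351, 24, 317]
data = pd.DataFrame({
    "Admit": ["Admitted", "Rejected"] * 12,
    "Gender": (["Male"] * 2 + ["Female"] * 2) * 6,
    "Dept": [d for d in "ABCDEF" for _ in range(4)],
    "Freq": freq,
})

# Second frame from the question: one total per (Dept, Gender) pair.
total_freq = pd.DataFrame({
    "Dept": [d for d in "ABCDEF" for _ in range(2)],
    "Gender": ["Female", "Male"] * 6,
    "Freq": [108, 825, 25, 560, 593, 325, 375, 417, 393, 191, 341, 373],
})

data["Total"] = 0
for _, row in total_freq.iterrows():
    # Parentheses are required: without them, & is applied before ==.
    mask = (data["Gender"] == row["Gender"]) & (data["Dept"] == row["Dept"])
    # Write this group's total only into the matching rows.
    data.loc[mask, "Total"] = row["Freq"]

print(data.head(8))
```

This works, but it scans `data` once per row of `total_freq`; the vectorized approaches in the answers scale better.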
Answer 0 (score: 2)
You can use transform:
df['Total'] = df.groupby(['Dept', 'Gender']).Freq.transform('sum')
This gives:
Admit Gender Dept Freq Total
0 Admitted Male A 512 825
1 Rejected Male A 313 825
2 Admitted Female A 89 108
3 Rejected Female A 19 108
4 Admitted Male B 353 560
5 Rejected Male B 207 560
6 Admitted Female B 17 25
7 Rejected Female B 8 25
8 Admitted Male C 120 325
9 Rejected Male C 205 325
10 Admitted Female C 202 593
11 Rejected Female C 391 593
12 Admitted Male D 138 417
13 Rejected Male D 279 417
14 Admitted Female D 131 375
15 Rejected Female D 244 375
16 Admitted Male E 53 191
17 Rejected Male E 138 191
18 Admitted Female E 94 393
19 Rejected Female E 299 393
20 Admitted Male F 22 373
21 Rejected Male F 351 373
22 Admitted Female F 24 341
23 Rejected Female F 317 341
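A self-contained version of this answer, with the question's data rebuilt by hand. `transform('sum')` returns a result aligned to the original index of `data`, so every row receives its own group's total and the second data frame is not needed at all.

```python
import pandas as pd

# First frame from the question, rebuilt row by row in the same order.
freq = [512, 313, 89, 19, 353, 207, 17, 8, 120, 205, 202, 391,
        138, 279, 131, 244, 53, 138, 94, 299, 22, 351, 24, 317]
data = pd.DataFrame({
    "Admit": ["Admitted", "Rejected"] * 12,
    "Gender": (["Male"] * 2 + ["Female"] * 2) * 6,
    "Dept": [d for d in "ABCDEF" for _ in range(4)],
    "Freq": freq,
})

# transform broadcasts each group's sum back onto every row of that group,
# so the result has the same length and index as data.
data["Total"] = data.groupby(["Dept", "Gender"])["Freq"].transform("sum")
print(data)
```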
Answer 1 (score: 0)
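You can use pandas.DataFrame.merge() to join the totals from the second data frame onto the first. First, rename Freq to Total in the totals df (df1), then merge on the key columns:
df1 = df1.rename(columns={'Freq': 'Total'})
df_totals = pd.merge(df, df1[['Dept', 'Gender', 'Total']], how='left', on=['Gender', 'Dept'])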
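A self-contained sketch of the merge approach, with both frames rebuilt by hand from the question's tables (named `data` and `total_freq` here, as in the question). The key columns must be included in the right-hand frame, not just `Total`, for `on=['Gender', 'Dept']` to work. `validate='many_to_one'` is an optional extra check that each (Dept, Gender) pair appears only once in the totals.

```python
import pandas as pd

# First frame from the question, rebuilt row by row in the same order.
freq = [512, 313, 89, 19, 353, 207, 17, 8, 120, 205, 202, 391,
        138, 279, 131, 244, 53, 138, 94, 299, 22, 351, 24, 317]
data = pd.DataFrame({
    "Admit": ["Admitted", "Rejected"] * 12,
    "Gender": (["Male"] * 2 + ["Female"] * 2) * 6,
    "Dept": [d for d in "ABCDEF" for _ in range(4)],
    "Freq": freq,
})

# Second frame from the question: one total per (Dept, Gender) pair.
total_freq = pd.DataFrame({
    "Dept": [d for d in "ABCDEF" for _ in range(2)],
    "Gender": ["Female", "Male"] * 6,
    "Freq": [108, 825, 25, 560, 593, 325, 375, 417, 393, 191, 341, 373],
})

# Rename first so the merged frame does not end up with Freq_x / Freq_y.
totals = total_freq.rename(columns={"Freq": "Total"})

# A left merge keeps every row of data, in its original order.
merged = pd.merge(data, totals, how="left", on=["Gender", "Dept"],
                  validate="many_to_one")
print(merged)
```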