How to use apply instead of a loop for a function that takes multiple columns as input

Asked: 2017-08-25 07:51:21

Tags: python pandas apply

I have a DataFrame, clg_df, that looks like this:

prov      wave    clg_id      bar
  11       2005     9         500

I have a function that takes four columns as input and operates on another DataFrame, test_df:

prov      wave   clg_id   st_id   score
  11       2005     10      111     560

For each prov-wave, I want to count the students whose score is higher than the bar defined by the prov-wave-clg_id combination.

The final result should look like this:

 prov      wave    clg_id      bar   number
  11       2005     9         500      40
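
For the count described above, one possible loop-free route is a merge followed by a groupby. The following is only a minimal sketch under stated assumptions: the toy rows are made up for illustration, the column names follow the tables above (prov, not provid), and each prov-wave-clg_id combination is assumed to appear once in clg_df.

```python
import pandas as pd

# Hypothetical toy data; column names follow the tables above.
clg_df = pd.DataFrame({
    "prov": [11, 11],
    "wave": [2005, 2005],
    "clg_id": [9, 10],
    "bar": [500, 550],
})
test_df = pd.DataFrame({
    "prov": [11, 11, 11, 12],
    "wave": [2005, 2005, 2005, 2005],
    "clg_id": [10, 10, 9, 9],
    "st_id": [111, 112, 113, 114],
    "score": [560, 520, 480, 600],
})

# Pair every college bar with every student score from the same prov-wave.
pairs = clg_df.merge(
    test_df[["prov", "wave", "score"]], on=["prov", "wave"], how="left"
)

# Flag the students whose score exceeds that college's bar, then count per college.
pairs["above"] = pairs["score"] > pairs["bar"]
counts = (
    pairs.groupby(["prov", "wave", "clg_id"], as_index=False)["above"]
    .sum()
    .rename(columns={"above": "number"})
)

# Attach the counts back onto the original college table.
result = clg_df.merge(counts, on=["prov", "wave", "clg_id"], how="left")
print(result)
```

The intermediate pairs frame has one row per college-student pair within a prov-wave, so this trades memory for speed; on very large inputs a per-group approach may be preferable.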

I am currently using a loop to get the desired output. Is it possible to use the apply function instead?

def gen_envy(clg, prov, year, test_df, clg_bar_df):
    # select the rows of clg_bar_df for the given clg/prov/year combination
    condition_1 = clg_bar_df['provid'] == prov
    condition_2 = clg_bar_df['wave'] == year
    condition_3 = clg_bar_df['clg_id'] == clg

    # select the bar associated with that clg/prov/year
    temp = clg_bar_df.loc[condition_1 & condition_2 & condition_3]
    bar = temp['bar'].values[0]

    # select the rows of test_df for the given prov/year combination
    condition_4 = test_df['provid'] == prov
    condition_5 = test_df['wave'] == year
    temp2 = test_df.loc[condition_4 & condition_5]

    # flag the students in temp2 who satisfy both conditions:
    # 1. own score higher than the bar
    # 2. enrolled in a school whose cutoff is lower than the bar
    condition_6 = temp2['score'] > bar
    condition_7 = temp2['bar'] < bar
    x = condition_6 & condition_7

    # return the fraction of envy
    return x.mean()

I call the function in a loop:

for i in range(len(clg_bar_df)):
    clg = clg_bar_df['clg_id'].iloc[i]
    prov = clg_bar_df['provid'].iloc[i]
    year = clg_bar_df['wave'].iloc[i]
    # assign through a single .loc call to avoid chained assignment
    clg_bar_df.loc[clg_bar_df.index[i], 'envy'] = gen_envy(clg, prov, year, gaokao_bar_df, clg_bar_df)
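
A row-wise apply could replace the loop above. This is a minimal sketch, not a tested drop-in: gen_envy_row is a hypothetical rewrite of gen_envy that reads the bar straight from the row instead of looking it up again, the toy rows are made up, and the column names (provid, wave, clg_id, bar, score) are taken from the code in this question.

```python
import pandas as pd

# Hypothetical toy data; column names follow the code in the question.
clg_bar_df = pd.DataFrame({
    "provid": [11, 11],
    "wave": [2005, 2005],
    "clg_id": [9, 10],
    "bar": [500, 550],
})
gaokao_bar_df = pd.DataFrame({
    "provid": [11, 11, 11, 11],
    "wave": [2005, 2005, 2005, 2005],
    "score": [560, 520, 480, 600],
    "bar": [450, 510, 450, 520],  # cutoff of the school each student enrolled in
})


def gen_envy_row(row, test_df):
    """Fraction of same prov/year students who beat row's bar but enrolled below it."""
    same_group = test_df[
        (test_df["provid"] == row["provid"]) & (test_df["wave"] == row["wave"])
    ]
    envy = (same_group["score"] > row["bar"]) & (same_group["bar"] < row["bar"])
    return envy.mean()


# axis=1 passes each row as a Series; args forwards the second DataFrame.
clg_bar_df["envy"] = clg_bar_df.apply(gen_envy_row, axis=1, args=(gaokao_bar_df,))
print(clg_bar_df)
```

Note that apply with axis=1 still calls the function once per row, so it mainly tidies the code; it is not vectorized and may not run much faster than the explicit loop.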

0 Answers:

No answers yet.