I have two DataFrames, df_1 and df_2:
df_1:
id number array_col
001 0 [0.084, 0.089, 0.047 ...]
002 0 [0.052, 0.036, 0.062 ...]
003 0 [0.087, 0.087, 0.051 ...]
. .
. .
100 0 [0.098, 0.089, 0.067 ...]
(100 rows x 3 columns)
df_2:
id number array_col
001 1 [0.012, 0.023, 0.034 ...]
001 2 [0.045, 0.056, 0.067 ...]
002 1 [0.078, 0.089, 0.091 ...]
002 2 [0.021, 0.032, 0.043 ...]
. .
. .
100 2 [0.054, 0.065, 0.076 ...]
(200 rows x 3 columns)
My goal is to update array_col in df_2 for each unique id by dividing each array element-wise by the array_col of the row in df_1 that has the same id.
My attempt so far does not seem to work: the column in df_2 is not updated.
Any help would be appreciated. Please let me know if you need any other information.
Answer 0 (score: 1)
You can use apply() to call numpy.divide() on each row of df_2, passing the array_col list of the df_1 row with the matching id:
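import numpy as np
# Index df_1 by id so each df_2 row can look up its denominator with .loc
df_1 = df_1.set_index('id')
# Divide each df_2 array element-wise by the df_1 array that has the same id
df_2['array_col'] = df_2.apply(
    lambda row: np.divide(row['array_col'], df_1.loc[row['id'], 'array_col']),
    axis=1)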
# id number array_col
# 0 001 1 [0.14285714285714285, 0.25842696629213485, 0.7...
# 1 001 2 [0.5357142857142857, 0.6292134831460675, 1.425...
# 2 002 1 [1.5, 2.4722222222222223, 1.467741935483871]
# 3 002 2 [0.4038461538461539, 0.888888888888889, 0.6935...
# 4 100 2 [0.5510204081632653, 0.7303370786516854, 1.134...
Sample data for reference:
df_1 = pd.DataFrame({'id':['001','002','003','100'],'number':[0,0,0,0],'array_col':[[0.084,0.089,0.047]*200,[0.052,0.036,0.062]*200,[0.087,0.087,0.051]*200,[0.098,0.089,0.067]*200]})
df_2 = pd.DataFrame({'id':['001','001','002','002','100'],'number':[1,2,1,2,2],'array_col':[[0.012,0.023,0.034]*200,[0.045,0.056,0.067]*200,[0.078,0.089,0.091]*200,[0.021,0.032,0.043]*200,[0.054,0.065,0.076]*200]})
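For larger frames, the row-wise apply() can be replaced by a single array division. The following is only a sketch, not part of the original answer: it assumes every array has the same length and that every id in df_2 also appears exactly once in df_1. The names denominators, numer, denom and result are illustrative, and the arrays are shortened to three elements.

```python
import numpy as np
import pandas as pd

# Shortened sample data (assumption: all arrays share one length).
df_1 = pd.DataFrame({
    'id': ['001', '002', '100'],
    'number': [0, 0, 0],
    'array_col': [[0.084, 0.089, 0.047], [0.052, 0.036, 0.062], [0.098, 0.089, 0.067]],
})
df_2 = pd.DataFrame({
    'id': ['001', '001', '002', '100'],
    'number': [1, 2, 1, 2],
    'array_col': [[0.012, 0.023, 0.034], [0.045, 0.056, 0.067],
                  [0.078, 0.089, 0.091], [0.054, 0.065, 0.076]],
})

# Look up the df_1 array for every df_2 row (one denominator per row, in df_2 order).
denominators = df_2['id'].map(df_1.set_index('id')['array_col'])

# Stack both sides into 2-D arrays of shape (n_rows, array_length) and divide once.
numer = np.array(df_2['array_col'].tolist(), dtype=float)
denom = np.array(denominators.tolist(), dtype=float)
result = numer / denom

# Store one list per row back in the column.
df_2['array_col'] = result.tolist()
```

The division itself is the same element-wise operation as np.divide() above; only the per-row Python call is avoided.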
Answer 1 (score: 0)
Check whether this works:
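# df_1 has one row per id, so no groupby is needed; rename the column to avoid a clash in the merge
gp_df1 = df_1[['id', 'array_col']].rename(columns={'array_col': 'array_col_1'})
df_2 = df_2.merge(gp_df1, on='id', how='left')
# Series.div cannot divide Python lists, so divide each pair with NumPy (needs import numpy as np)
df_2['new_array_col'] = [np.divide(a, b) for a, b in zip(df_2['array_col'], df_2['array_col_1'])]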
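Because the merge uses how='left', any df_2 row whose id is missing from df_1 ends up with NaN in array_col_1. A possible way to catch this before dividing is sketched below, using the validate and indicator parameters of merge(). The id '999' and the names right, merged and unmatched_ids are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Tiny illustrative frames; '999' is a made-up id with no match in df_1.
df_1 = pd.DataFrame({'id': ['001', '002'],
                     'array_col': [[0.084, 0.089], [0.052, 0.036]]})
df_2 = pd.DataFrame({'id': ['001', '001', '002', '999'],
                     'number': [1, 2, 1, 1],
                     'array_col': [[0.012, 0.023], [0.045, 0.056],
                                   [0.078, 0.089], [0.5, 0.5]]})

right = df_1.rename(columns={'array_col': 'array_col_1'})

# validate='many_to_one' raises if an id occurs more than once in df_1;
# indicator=True adds a '_merge' column that marks rows without a match.
merged = df_2.merge(right, on='id', how='left',
                    validate='many_to_one', indicator=True)

unmatched_ids = merged.loc[merged['_merge'] == 'left_only', 'id'].tolist()
print(unmatched_ids)
```

If unmatched_ids is not empty, those rows can be dropped or handled separately before the division.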