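Using Python, find the names that are common to the index (first) column of two DataFrames and combine them with the later columns of the same row.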
df1
Name sub1 sub2 sub3
X 1 2 5
Y 4 5 6
df2
Name sub1 sub2 sub3
A 3 5 3
Y 3 1 4
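The output should contain only Y, the one name that appears in the first column of both frames. Its column values should come from df2, except sub3, which should be the average of the df1 and df2 values.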
output
Name sub1 sub2 sub3
Y 3(df2) 1(df2) 5=(df1+df2)/2
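The expected result above can be restated as a minimal pandas sketch: align both frames on Name, keep only the names they share, take sub1/sub2 from df2, and replace sub3 with the average of the two frames. It assumes Name values are unique within each frame; the variable names a, b, common and result are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame([['X', 1, 2, 5], ['Y', 4, 5, 6]], columns=["Name", "sub1", "sub2", "sub3"])
df2 = pd.DataFrame([['A', 3, 5, 3], ['Y', 3, 1, 4]], columns=["Name", "sub1", "sub2", "sub3"])

# Index both frames by Name so rows align by label rather than by position
a = df1.set_index("Name")
b = df2.set_index("Name")

# Names present in both frames (here only "Y")
common = b.index.intersection(a.index)

# Start from df2's values for the shared names...
result = b.loc[common].copy()
# ...then overwrite sub3 with the average of the df1 and df2 values
result["sub3"] = (a.loc[common, "sub3"] + b.loc[common, "sub3"]) / 2
print(result)
```

Because sub3 is computed by division, pandas stores it as a float, so Y's sub3 is 5.0 rather than 5.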
Answer 0 (score: 1)
I think this will help you:
import pandas as pd
df1 = pd.DataFrame([['X', 1, 2, 5], ['Y', 4, 5, 6]], columns=["Name", "sub1", "sub2", "sub3"])
df2 = pd.DataFrame([['A', 3, 5, 3], ['Y', 3, 1, 4]], columns=["Name", "sub1", "sub2", "sub3"])
# Stack both frames (DataFrame.append was removed in pandas 2.0, so use pd.concat),
# then per Name take the sub3 mean and count how many rows share that name
joinedDf = (pd.concat([df1, df2])
            .groupby("Name")
            .agg(sub3=("sub3", "mean"), n=("sub3", "size"))
            .query("n > 1")  # keep names that occur in both frames
            .drop(columns="n"))
# Take sub1/sub2 from df2, indexed by Name so rows align by label, not position
df2_sub = df2.drop(columns="sub3").set_index("Name")
opDF = df2_sub.merge(joinedDf, left_index=True, right_index=True, how='inner')
print(opDF)
Output:
sub1 sub2 sub3
Name
Y 3 1 5.0
Answer 1 (score: 1)
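A pandas merge with on='Name' gives you only the rows whose Name appears in both frames. You can then drop the unnecessary columns and compute the mean of sub3 like this: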
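import pandas as pd
# Inner join on Name; overlapping columns get suffixes _x (from df2) and _y (from df1)
df_result = pd.merge(df2, df1, on='Name')
# Average sub3 across the two frames
df_result['sub3'] = df_result[['sub3_x', 'sub3_y']].mean(axis=1)
df_result = df_result.drop(['sub3_x', 'sub1_y', 'sub2_y', 'sub3_y'], axis=1)
df_result.columns = ['Name', 'sub1', 'sub2', 'sub3']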
Resulting DataFrame:
Name sub1 sub2 sub3
0 Y 3 1 5.0
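A variant of the same merge, offered as a sketch rather than the answer author's code: passing suffixes=("", "_df1") to pd.merge leaves df2's column names unchanged, so the final column list no longer depends on renaming columns by position. The suffix string "_df1" is an arbitrary choice made for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame([['X', 1, 2, 5], ['Y', 4, 5, 6]], columns=["Name", "sub1", "sub2", "sub3"])
df2 = pd.DataFrame([['A', 3, 5, 3], ['Y', 3, 1, 4]], columns=["Name", "sub1", "sub2", "sub3"])

# Inner join (the default) on Name keeps only names found in both frames.
# df2's overlapping columns keep their names; df1's get the "_df1" suffix.
merged = pd.merge(df2, df1, on="Name", suffixes=("", "_df1"))

# Overwrite sub3 with the average of df2's and df1's sub3
merged["sub3"] = merged[["sub3", "sub3_df1"]].mean(axis=1)

# Select the wanted columns by name instead of renaming positionally
result = merged[["Name", "sub1", "sub2", "sub3"]]
print(result)
```

Selecting columns by name also keeps working if df1 or df2 gains extra columns later, whereas a positional rename would silently mislabel them.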