Concatenating two dataframes on a large number of columns

Time: 2016-06-08 05:10:01

Tags: python pandas dataframe concatenation

I have to use the concat function on a large number of columns. Say this is the call:

pd.concat([mdf1[['user','tag1','tag2','tag3','tag4']].groupby(['user']).agg(sum)

Here I have a large number of tags, so I want my function to take all the columns from, say, 'tag1' onward. How can I do that?

mdf1

        user        page_name            category  tag1  tag2  tag3
0  random guy        BlackBuck   Transport/Freight     1     1     0
1   mank nion        DJ CHETAS  Arts/Entertainment     0     1     1
2  random guy      GiveMeSport               Sport     1     0     1
3   mank nion  Gurkeerat Singh      Actor/Director     1     0     1

mdf2

          user         page_name            category  tag1  tag2  tag3
0   pop rajuel      WOW Editions        Concert Tour   NaN   NaN   NaN
1  Roshan ghai            MensXP  News/Media Website   NaN   NaN   NaN
2    mank nion     Celina Jaitly             Actress   NaN   NaN   NaN
3   pop rajuel      500 Startups            App Page   1.0   0.0   1.0
4  Roshan ghai          No Abuse           Community   NaN   NaN   NaN
5   random guy  Analytics Ninja    Insurance Company   NaN   NaN   NaN
6   pop rajuel  Biswapati Sarkar      Actor/Director   1.0   0.0   0.0
7  Roshan ghai     the smartian        Public Figure   0.0   1.0   1.0

Output

          user  tag1  tag2  tag3
0    mank nion   1.0   1.0   2.0
1   random guy   2.0   1.0   1.0
2  Roshan ghai   0.0   1.0   1.0
3    mank nion   NaN   NaN   NaN
4   pop rajuel   2.0   0.0   1.0
5   random guy   NaN   NaN   NaN

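The only difference in my real case is that I have a large number of columns, i.e. 'tag4', 'tag5' and so on. So I want my code to take all the columns from 'tag1' onward. Basically, I am grouping the two mdf frames by user, summing them, and then concatenating the results.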

1 answer:

Answer 0: (score: 0)

I think you need concat, then groupby and aggregate with sum:

df = pd.concat([mdf1,mdf2])
print (df)
          user         page_name            category  tag1  tag2  tag3
0   random guy         BlackBuck   Transport/Freight   1.0   1.0   0.0
1    mank nion         DJ CHETAS  Arts/Entertainment   0.0   1.0   1.0
2   random guy       GiveMeSport               Sport   1.0   0.0   1.0
3    mank nion   Gurkeerat Singh      Actor/Director   1.0   0.0   1.0
0   pop rajuel      WOW Editions        Concert Tour   NaN   NaN   NaN
1  Roshan ghai            MensXP  News/Media Website   NaN   NaN   NaN
2    mank nion     Celina Jaitly             Actress   NaN   NaN   NaN
3   pop rajuel      500 Startups            App Page   1.0   0.0   1.0
4  Roshan ghai          No Abuse           Community   NaN   NaN   NaN
5   random guy   Analytics Ninja   Insurance Company   NaN   NaN   NaN
6   pop rajuel  Biswapati Sarkar      Actor/Director   1.0   0.0   0.0
7  Roshan ghai      the smartian       Public Figure   0.0   1.0   1.0

print (df.groupby('user', as_index=False).sum())
          user  tag1  tag2  tag3
0  Roshan ghai   0.0   1.0   1.0
1    mank nion   1.0   1.0   2.0
2   pop rajuel   2.0   0.0   1.0
3   random guy   2.0   1.0   1.0

The page_name and category columns are omitted because of the automatic exclusion of nuisance columns.
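To take every column from 'tag1' onward without listing them, one option is a label-based column slice. The following is a minimal sketch, assuming the tag columns are the trailing columns of the frame and that column labels are unique; the sample frames are rebuilt from the data in the question, and `tag_cols` and `result` are names introduced here for illustration. Selecting the tag columns explicitly also avoids relying on the automatic dropping of non-numeric columns, which, as far as I know, pandas 2.0 and later no longer do silently.

```python
import numpy as np
import pandas as pd

# Stand-ins for the question's frames (values copied from the post).
mdf1 = pd.DataFrame({
    "user": ["random guy", "mank nion", "random guy", "mank nion"],
    "page_name": ["BlackBuck", "DJ CHETAS", "GiveMeSport", "Gurkeerat Singh"],
    "category": ["Transport/Freight", "Arts/Entertainment", "Sport",
                 "Actor/Director"],
    "tag1": [1, 0, 1, 1],
    "tag2": [1, 1, 0, 0],
    "tag3": [0, 1, 1, 1],
})
mdf2 = pd.DataFrame({
    "user": ["pop rajuel", "Roshan ghai", "mank nion", "pop rajuel",
             "Roshan ghai", "random guy", "pop rajuel", "Roshan ghai"],
    "page_name": ["WOW Editions", "MensXP", "Celina Jaitly", "500 Startups",
                  "No Abuse", "Analytics Ninja", "Biswapati Sarkar",
                  "the smartian"],
    "category": ["Concert Tour", "News/Media Website", "Actress", "App Page",
                 "Community", "Insurance Company", "Actor/Director",
                 "Public Figure"],
    "tag1": [np.nan, np.nan, np.nan, 1, np.nan, np.nan, 1, 0],
    "tag2": [np.nan, np.nan, np.nan, 0, np.nan, np.nan, 0, 1],
    "tag3": [np.nan, np.nan, np.nan, 1, np.nan, np.nan, 0, 1],
})

# Stack the two frames, then aggregate once.
df = pd.concat([mdf1, mdf2], ignore_index=True)

# Label slice: every column from 'tag1' through the last column,
# however many tag columns there are (assumes tags are the trailing columns).
tag_cols = df.loc[:, "tag1":].columns.tolist()

# Sum only the selected tag columns per user; NaN values are skipped by sum.
result = df.groupby("user", as_index=False)[tag_cols].sum()
print(result)
```

If the tag columns are not contiguous, a filter on the names, for example `[c for c in df.columns if c.startswith("tag")]`, may be a safer way to build `tag_cols`.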