Join, merge, concat, or append these two Pandas DataFrames?

Asked: 2014-07-12 15:38:58

Tags: python pandas

How do I use join / merge / concat / append / add to put these two tables together so that the Population Age 0 - 14 and Population Age 15 - 64 columns sit side by side?

I don't want the Cartesian product of these two DataFrames.

I tried:

population_ages = t3.merge(t4, on='Country Name', how='inner')

t3

  Country Name  Year  Population Age 0 - 14
0        Aruba  1960              43.847771
1      Andorra  1960                    NaN
2  Afghanistan  1960              43.712284
3       Angola  1960              43.759289
4      Albania  1960              41.757282

t4

  Country Name  Population Age 15 - 64
0        Aruba               53.667355
1      Andorra                     NaN
2  Afghanistan               53.834637
3       Angola               53.587101
4      Albania               52.941044

Ideally:

  Country Name  Population Age 15 - 64  Population Age 0 - 14
0        Aruba               53.667355              43.847771
1      Andorra                     NaN                    NaN
2  Afghanistan               53.834637              43.712284
3       Angola               53.587101              43.759289
4      Albania               52.941044              41.757282

Testing it:

population_ages = t3.merge(t4, on='Country Name', how='inner')

The DataFrame I get back is a Cartesian product of t3 and t4, with shape (734832, 4) instead of (13608, 4):

       Country Name  Year  Population Age 0 - 14  Population Age 15 - 64
0             Aruba  1960              43.847771               53.667355
1             Aruba  1960              43.847771               53.890141
2             Aruba  1960              43.847771               54.216911
3             Aruba  1960              43.847771               54.637810
4             Aruba  1960              43.847771               55.119324
5             Aruba  1960              43.847771               55.631104
6             Aruba  1960              43.847771               56.168560
7             Aruba  1960              43.847771               56.736549
8             Aruba  1960              43.847771               57.341782
9             Aruba  1960              43.847771               57.983109
10            Aruba  1960              43.847771               58.674343
11            Aruba  1960              43.847771               59.404758
12            Aruba  1960              43.847771               60.164749
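A likely cause, assuming both tables are in long format with one row per country per year (the excerpts only show 1960): merging on 'Country Name' alone is a many-to-many join, so every year of a country in t3 gets paired with every year of that country in t4. The numbers are consistent with this. 734832 / 13608 = 54, which matches 252 countries × 54 years × 54 years = 734832. The toy tables below are hypothetical (2 countries × 3 years). The sketch also assumes t4 has a Year column, which its excerpt does not show. If t4 really has no Year column, row order is the only link between its rows and t3's, and that is what the positional approaches in the answer rely on.

```python
import pandas as pd

# Hypothetical long-format tables: 2 countries x 3 years each.
# Assumption: t4 also carries a Year column (the question's t4 excerpt does not show one).
years = [1960, 1961, 1962]
countries = ['Aruba', 'Andorra']
t3 = pd.DataFrame({
    'Country Name': [c for c in countries for _ in years],
    'Year': years * len(countries),
    'Population Age 0 - 14': [43.8, 43.9, 44.0, 41.0, 41.1, 41.2],
})
t4 = pd.DataFrame({
    'Country Name': [c for c in countries for _ in years],
    'Year': years * len(countries),
    'Population Age 15 - 64': [53.7, 53.9, 54.2, 52.9, 53.0, 53.1],
})

# Joining on the country alone pairs every t3 year with every t4 year for that country
bad = t3.merge(t4[['Country Name', 'Population Age 15 - 64']], on='Country Name')
print(bad.shape)   # 2 countries * 3 * 3 rows

# Joining on both keys gives one row per (country, year);
# validate='one_to_one' raises MergeError if either side has duplicate key pairs
good = t3.merge(t4, on=['Country Name', 'Year'], how='inner', validate='one_to_one')
print(good.shape)
```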

1 answer:

Answer (score: 1)

How about

t4['Population Age 0 - 14'] = t3['Population Age 0 - 14']

or

pd.concat([t4, t3['Population Age 0 - 14']], axis=1)
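Both lines pair rows by index label (here the default 0..n-1 row positions), not by country name. They only give the right result when t3 and t4 list the countries in the same order. The minimal sketch below uses a hypothetical t4 whose rows are reordered. It compares the positional concat with a left merge on 'Country Name', which keeps t4's row order and matches rows by key.

```python
import numpy as np
import pandas as pd

t3 = pd.DataFrame({
    'Country Name': ['Aruba', 'Andorra', 'Afghanistan'],
    'Year': [1960, 1960, 1960],
    'Population Age 0 - 14': [43.847771, np.nan, 43.712284],
})
# Hypothetical: same countries as t3, but in a different row order
t4 = pd.DataFrame({
    'Country Name': ['Afghanistan', 'Aruba', 'Andorra'],
    'Population Age 15 - 64': [53.834637, 53.667355, np.nan],
})

# Index-aligned concat: row 0 of t4 gets row 0 of t3, whatever the country
naive = pd.concat([t4, t3['Population Age 0 - 14']], axis=1)

# Key-based alternative: match rows on the country name, keeping t4's order
keyed = t4.merge(t3[['Country Name', 'Population Age 0 - 14']],
                 on='Country Name', how='left')

print(naive)   # Afghanistan wrongly receives Aruba's 0-14 value
print(keyed)   # each country receives its own 0-14 value
```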

Full working example:

import pandas as pd
from io import StringIO

d1 = '''Country Name  Year  Population Age 0 - 14
Aruba         1960  43.847771
Andorra       1960  NaN
Afghanistan   1960  43.712284
Angola        1960  43.759289
Albania       1960  41.757282'''

d2 = '''Country Name  Population Age 15 - 64
Aruba         53.667355
Andorra       NaN
Afghanistan   53.834637
Angola        53.587101
Albania       52.941044'''

# Split columns on runs of 2+ spaces (a regex separator needs the python engine)
t3 = pd.read_csv(StringIO(d1), sep=r'\s{2,}', engine='python')
print('\nt3:\n', t3)

t4 = pd.read_csv(StringIO(d2), sep=r'\s{2,}', engine='python')
print('\nt4:\n', t4)

print('\n--- merge ---\n')

# Merge on the shared key column (keeps every t3 column, including Year)
print(pd.merge(t4, t3, on='Country Name'))

# Merge only the column needed from t3
print(pd.merge(t4, t3[['Country Name', 'Population Age 0 - 14']], on='Country Name'))

print('\n--- concat ---\n')

# Side-by-side concat: rows are aligned by index, not by country name
print(pd.concat((t4, t3['Population Age 0 - 14']), axis=1))

print('\n--- [xxx] = [xxx] ---\n')

# Direct column assignment: also aligned by index
t4['Population Age 0 - 14'] = t3['Population Age 0 - 14']

print(t4)

Result:

  Country Name  Population Age 15 - 64  Population Age 0 - 14
0        Aruba               53.667355              43.847771
1      Andorra                     NaN                    NaN
2  Afghanistan               53.834637              43.712284
3       Angola               53.587101              43.759289
4      Albania               52.941044              41.757282
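The question also asks about join and append. DataFrame.append stacks rows rather than columns, and it was removed in pandas 2.0 in favour of pd.concat. DataFrame.join aligns on the index, so the key column has to be moved into the index first. Below is a sketch using the five rows shown in the question. It assumes each country appears only once per frame, as it does in this excerpt.

```python
import numpy as np
import pandas as pd

# The five rows shown in the question
t3 = pd.DataFrame({
    'Country Name': ['Aruba', 'Andorra', 'Afghanistan', 'Angola', 'Albania'],
    'Year': [1960] * 5,
    'Population Age 0 - 14': [43.847771, np.nan, 43.712284, 43.759289, 41.757282],
})
t4 = pd.DataFrame({
    'Country Name': ['Aruba', 'Andorra', 'Afghanistan', 'Angola', 'Albania'],
    'Population Age 15 - 64': [53.667355, np.nan, 53.834637, 53.587101, 52.941044],
})

# join() matches on the index, so index both frames by country first,
# then turn the country index back into a regular column
joined = (
    t4.set_index('Country Name')
      .join(t3.set_index('Country Name')['Population Age 0 - 14'], how='left')
      .reset_index()
)
print(joined)
```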