Merging two datasets in Python pandas

Asked: 2017-10-31 07:59:35

Tags: python pandas merge

I have two datasets in the format below and want to combine them into a single dataset based on City + Age + Gender. Thanks in advance.

Dataset 1:

        City    Age  Gender            Source         Count
0  California  15-24  Female  Amazon Prime Video       14629
1  California  15-24  Female             Fubo TV        3840
2  California  15-24  Female                Hulu       54067
3  California  15-24  Female             Netflix       11713
4  California  15-24  Female            Sling TV       10642

Dataset 2:

         City    Age  Gender           Source     Feeds
0  California  15-24  Female             Blogs    150
1  California  15-24  Female        Customsite     57
2  California  15-24  Female       Discussions     28
3  California  15-24  Female  Facebook Comment    555
4  California  15-24  Female           Google+     19

Expected result dataset:

      City    Age  Gender              Source  Count
California  15-24  Female  Amazon Prime Video  14629
California  15-24  Female             Fubo TV   3840
California  15-24  Female                Hulu  54067
California  15-24  Female             Netflix  11713
California  15-24  Female            Sling TV  10642
California  15-24  Female               Blogs    150
California  15-24  Female          Customsite     57
California  15-24  Female         Discussions     28
California  15-24  Female    Facebook Comment    555
California  15-24  Female             Google+     19

Note: Feeds and Count mean the same thing, so either one can be used as the column name in the combined dataset.

1 answer:

Answer 0 (score: 1)

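Use pandas.concat, with rename to align the columns first. Both DataFrames need the same column names, otherwise the rows will not stack into a single Count column: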

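import pandas as pd
# Rename Feeds to Count so both frames share the same columns, then stack the rows
df = pd.concat([df1, df2.rename(columns={'Feeds': 'Count'})], ignore_index=True)
print(df)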
         City    Age  Gender              Source  Count
0  California  15-24  Female  Amazon Prime Video  14629
1  California  15-24  Female             Fubo TV   3840
2  California  15-24  Female                Hulu  54067
3  California  15-24  Female             Netflix  11713
4  California  15-24  Female            Sling TV  10642
5  California  15-24  Female               Blogs    150
6  California  15-24  Female          Customsite     57
7  California  15-24  Female         Discussions     28
8  California  15-24  Female    Facebook Comment    555
9  California  15-24  Female             Google+     19
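
For anyone who wants to reproduce the output above, here is a minimal self-contained sketch. It assumes the two frames are built by hand from the sample rows in the question; the names df1 and df2 follow the answer, and how the real data is loaded is not shown in the question.

```python
import pandas as pd

# Sample rows copied from "Dataset 1" in the question
df1 = pd.DataFrame({
    "City": ["California"] * 5,
    "Age": ["15-24"] * 5,
    "Gender": ["Female"] * 5,
    "Source": ["Amazon Prime Video", "Fubo TV", "Hulu", "Netflix", "Sling TV"],
    "Count": [14629, 3840, 54067, 11713, 10642],
})

# Sample rows copied from "Dataset 2" in the question
df2 = pd.DataFrame({
    "City": ["California"] * 5,
    "Age": ["15-24"] * 5,
    "Gender": ["Female"] * 5,
    "Source": ["Blogs", "Customsite", "Discussions", "Facebook Comment", "Google+"],
    "Feeds": [150, 57, 28, 555, 19],
})

# Align the column names, then stack the rows and renumber the index 0..9
df = pd.concat([df1, df2.rename(columns={"Feeds": "Count"})], ignore_index=True)
print(df)
```

Despite the wording of the question, this is a row-wise concatenation rather than a key-based merge: the expected result keeps every row from both datasets, so nothing is joined on City, Age, or Gender.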

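Alternatively, use DataFrame.append (the pandas method, not the plain Python list append). Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so this form only works on older versions:

df = df1.append(df2.rename(columns={'Feeds': 'Count'}), ignore_index=True)
print(df)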
         City    Age  Gender              Source  Count
0  California  15-24  Female  Amazon Prime Video  14629
1  California  15-24  Female             Fubo TV   3840
2  California  15-24  Female                Hulu  54067
3  California  15-24  Female             Netflix  11713
4  California  15-24  Female            Sling TV  10642
5  California  15-24  Female               Blogs    150
6  California  15-24  Female          Customsite     57
7  California  15-24  Female         Discussions     28
8  California  15-24  Female    Facebook Comment    555
9  California  15-24  Female             Google+     19
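
To see why the rename step matters, here is a small sketch using a trimmed-down, hypothetical pair of frames (two rows each, fewer columns than in the question). It uses pd.concat throughout, so it should run the same whether or not the installed pandas still provides DataFrame.append.

```python
import pandas as pd

# Hypothetical, shortened versions of the two datasets
df1 = pd.DataFrame({
    "City": ["California", "California"],
    "Source": ["Hulu", "Netflix"],
    "Count": [54067, 11713],
})
df2 = pd.DataFrame({
    "City": ["California", "California"],
    "Source": ["Blogs", "Google+"],
    "Feeds": [150, 19],
})

# Without rename: the result has both a Count and a Feeds column,
# and each is NaN for the rows that came from the other frame
unaligned = pd.concat([df1, df2], ignore_index=True)
print(unaligned)

# With rename: the values land in one shared Count column
aligned = pd.concat([df1, df2.rename(columns={"Feeds": "Count"})], ignore_index=True)
print(aligned)
```

The unaligned result also converts the numeric columns to float, because NaN cannot be stored in a plain integer column; renaming first keeps Count as integers.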