What is the correct way to combine time series data with metadata in pandas?

Asked: 2017-11-17 12:21:17

Tags: python pandas csv data-analysis

I have two CSV files:

customers.csv

id  name     birthday
1   Martin   28.04.1990
2   Twain    30.11.1835
....

purchases.csv

purchase_id    customer_id    item                            price
1              1              About the ugly German language  3.14
2              1              Food                            15.92
3              1              Book                            65.35
4              2              Stone                           89.79

I can load both files as dataframes with

import pandas as pd

df_customers = pd.read_csv('customers.csv')
df_purchases = pd.read_csv('purchases.csv')

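But how do I combine the two so that I can easily answer questions like these:

  • How many items did each customer buy?
  • What is the average item price per customer?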

1 answer:

Answer 0: (score: 2)

Use merge with a right join:

df = pd.merge(df_customers, df_purchases, left_on='id', right_on='customer_id', how='right')
print(df)
   purchase_id  customer_id                            item  price
0            1            1  About the ugly German language   3.14
1            2            1                            Food  15.92
2            3            1                            Book  65.35
3            4            2                           Stone  89.79
   id    name    birthday  purchase_id  customer_id  \
0   1  Martin  28.04.1990            1            1   
1   1  Martin  28.04.1990            2            1   
2   1  Martin  28.04.1990            3            1   
3   2   Twain  30.11.1835            4            2   

                             item  price  
0  About the ugly German language   3.14  
1                            Food  15.92  
2                            Book  65.35  
3                           Stone  89.79
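
Once the frames are merged, the two questions can be answered with a groupby on the customer key. The following is a minimal sketch, not part of the original answer: it rebuilds the sample rows inline instead of reading the CSV files, and the names `summary`, `n_items` and `mean_price` are chosen here only for illustration.

```python
import pandas as pd

# Inline copies of the sample rows from the question (assumption: the real
# CSV files parse into frames with these columns).
df_customers = pd.DataFrame({
    "id": [1, 2],
    "name": ["Martin", "Twain"],
    "birthday": ["28.04.1990", "30.11.1835"],
})
df_purchases = pd.DataFrame({
    "purchase_id": [1, 2, 3, 4],
    "customer_id": [1, 1, 1, 2],
    "item": ["About the ugly German language", "Food", "Book", "Stone"],
    "price": [3.14, 15.92, 65.35, 89.79],
})

# Same right join as above: every purchase row is kept.
df = pd.merge(df_customers, df_purchases,
              left_on="id", right_on="customer_id", how="right")

# One row per customer: purchase count and mean price.
# Grouping on customer_id alone keeps purchases whose customer is missing
# from customers.csv (their name would simply be empty).
summary = (
    df.groupby("customer_id")
      .agg(name=("name", "first"),
           n_items=("purchase_id", "count"),
           mean_price=("price", "mean"))
      .reset_index()
)
print(summary)
```

For the sample data this gives 3 items at an average price of about 28.14 for Martin and 1 item at 89.79 for Twain. If only the counts and averages are needed, without the customer names, the same groupby works directly on df_purchases and the merge can be skipped.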