Get the latest value from another DataFrame

Time: 2020-07-11 16:50:22

Tags: python merge

My first DataFrame:

import pandas as pd

product = pd.DataFrame({
    'Product_ID':[101,102,103,104,105,106,107,101],
    'Product_name':['Watch','Bag','Shoes','Smartphone','Books','Oil','Laptop','New Watch'],
    'Category':['Fashion','Fashion','Fashion','Electronics','Study','Grocery','Electronics','Electronics'],
    'Price':[299.0,1350.50,2999.0,14999.0,145.0,110.0,79999.0,9898.0],
    'Seller_City':['Delhi','Mumbai','Chennai','Kolkata','Delhi','Chennai','Bengalore','New York']
})

My second DataFrame holds the transactions:

customer=pd.DataFrame({
    'id':[1,2,3,4,5,6,7,8,9],
    'name':['Olivia','Aditya','Cory','Isabell','Dominic','Tyler','Samuel','Daniel','Jeremy'],
    'age':[20,25,15,10,30,65,35,18,23],
    'Product_ID':[101,0,106,0,103,104,0,0,107],
    'Purchased_Product':['Watch','NA','Oil','NA','Shoes','Smartphone','NA','NA','Laptop'],
    'City':['Mumbai','Delhi','Bangalore','Chennai','Chennai','Delhi','Kolkata','Delhi','Mumbai']
})

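I want the price from the first DataFrame to appear in the merged DataFrame. The common key is "Product_ID". Note that Product_ID 101 has two prices, 299.00 and 9898.00. I want the later one, 9898.0, to end up in the merged dataset (since it is the latest price).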

Currently my code does not give the right answer. It returns both prices:

customerpur = pd.merge(customer,product[['Price','Product_ID']], on="Product_ID", how = "left")
customerpur
   id    name  age  Product_ID Purchased_Product    City   Price
0   1  Olivia   20         101             Watch  Mumbai   299.0
1   1  Olivia   20         101             Watch  Mumbai  9898.0
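
The duplicated row appears because Product_ID 101 occurs twice in product, so the left merge matches Olivia's row against both entries. As a minimal sketch, using trimmed-down copies of the frames above (fewer rows and columns, purely for illustration), the validate argument of pd.merge can make pandas raise an error instead of silently duplicating rows:

```python
import pandas as pd

# Trimmed-down copies of the frames above; only the columns needed here.
product = pd.DataFrame({
    'Product_ID': [101, 102, 103, 101],
    'Price': [299.0, 1350.50, 2999.0, 9898.0],
})
customer = pd.DataFrame({
    'id': [1, 2, 5],
    'Product_ID': [101, 0, 103],
})

# Without validation, the left merge returns one row per matching product row,
# so customer 1 shows up once for each price of Product_ID 101.
merged = pd.merge(customer, product, on='Product_ID', how='left')
rows_for_customer_1 = int((merged['id'] == 1).sum())

# validate='many_to_one' asserts that Product_ID is unique on the right side
# and raises MergeError when it is not.
try:
    pd.merge(customer, product, on='Product_ID', how='left',
             validate='many_to_one')
    duplicate_keys_detected = False
except pd.errors.MergeError:
    duplicate_keys_detected = True

print(rows_for_customer_1, duplicate_keys_detected)
```

This only detects the problem. Deciding which of the duplicate prices to keep is still up to you.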

2 answers:

Answer 0 (score: 1)

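There is no explicit timestamp, so I assume the row order of the DataFrame reflects recency. You can drop the duplicates after the merge: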

customerpur.drop_duplicates(subset = ['id'], keep = 'last')

Result:

   id     name  age  Product_ID Purchased_Product       City    Price
1   1   Olivia   20         101             Watch     Mumbai   9898.0
2   2   Aditya   25           0                NA      Delhi      NaN
3   3     Cory   15         106               Oil  Bangalore    110.0
4   4  Isabell   10           0                NA    Chennai      NaN
5   5  Dominic   30         103             Shoes    Chennai   2999.0
6   6    Tyler   65         104        Smartphone      Delhi  14999.0
7   7   Samuel   35           0                NA    Kolkata      NaN
8   8   Daniel   18           0                NA      Delhi      NaN
9   9   Jeremy   23         107            Laptop     Mumbai  79999.0

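Note the keep = 'last' argument, since we keep only the latest price. If you are concerned about performance or the dataset is large, you should deduplicate before merging: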

product = product.drop_duplicates(subset = ['Product_ID'], keep = 'last')
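
Put together, deduplicating before the merge might look like the sketch below. It reuses the question's Product_ID, Price and id values and drops the other columns for brevity. The name latest_price is introduced here for illustration, and the code assumes, as above, that a later row means a more recent price.

```python
import pandas as pd

product = pd.DataFrame({
    'Product_ID': [101, 102, 103, 104, 105, 106, 107, 101],
    'Price': [299.0, 1350.50, 2999.0, 14999.0, 145.0, 110.0, 79999.0, 9898.0],
})
customer = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'Product_ID': [101, 0, 106, 0, 103, 104, 0, 0, 107],
})

# Keep only the last row per Product_ID (row order is assumed to mean recency).
latest_price = product.drop_duplicates(subset=['Product_ID'], keep='last')

# Left merge so customers without a purchase (Product_ID 0) are kept with NaN.
customerpur = pd.merge(customer, latest_price[['Product_ID', 'Price']],
                       on='Product_ID', how='left')
print(customerpur)
```

Because each Product_ID is now unique on the right side, the merge returns exactly one row per customer.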

Answer 1 (score: 1)

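Your DataFrame has no indication of which entry is the latest, so you may need to first remove the first entry for ID 101 from the product DataFrame, as follows: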

result_product = product.drop_duplicates(subset=['Product_ID'], keep='last')

This keeps the last entry for each Product_ID, and you can then merge as follows:

pd.merge(result_product, customer, on='Product_ID')
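
One thing to keep in mind: pd.merge defaults to how='inner', so the call above returns only customers whose Product_ID exists in the product table. The sketch below (trimmed to the key columns, with inner and left as illustrative names) compares that with a left merge starting from customer, which mirrors the call in the question and keeps customers without a purchase:

```python
import pandas as pd

product = pd.DataFrame({
    'Product_ID': [101, 102, 103, 104, 105, 106, 107, 101],
    'Price': [299.0, 1350.50, 2999.0, 14999.0, 145.0, 110.0, 79999.0, 9898.0],
})
customer = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'Product_ID': [101, 0, 106, 0, 103, 104, 0, 0, 107],
})

result_product = product.drop_duplicates(subset=['Product_ID'], keep='last')

# Default inner merge: only customers with a matching Product_ID survive.
inner = pd.merge(result_product, customer, on='Product_ID')

# Left merge from customer: every customer is kept, Price is NaN if unmatched.
left = pd.merge(customer, result_product, on='Product_ID', how='left')

print(len(inner), len(left))
```

Which variant is appropriate depends on whether customers with Product_ID 0 should appear in the result.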