My first DataFrame:
import pandas as pd

product=pd.DataFrame({
'Product_ID':[101,102,103,104,105,106,107,101],
'Product_name':['Watch','Bag','Shoes','Smartphone','Books','Oil','Laptop','New Watch'],
'Category':['Fashion','Fashion','Fashion','Electronics','Study','Grocery','Electronics','Electronics'],
'Price':[299.0,1350.50,2999.0,14999.0,145.0,110.0,79999.0,9898.0],
'Seller_City':['Delhi','Mumbai','Chennai','Kolkata','Delhi','Chennai','Bengalore','New York']
})
My second DataFrame holds the transactions:
customer=pd.DataFrame({
'id':[1,2,3,4,5,6,7,8,9],
'name':['Olivia','Aditya','Cory','Isabell','Dominic','Tyler','Samuel','Daniel','Jeremy'],
'age':[20,25,15,10,30,65,35,18,23],
'Product_ID':[101,0,106,0,103,104,0,0,107],
'Purchased_Product':['Watch','NA','Oil','NA','Shoes','Smartphone','NA','NA','Laptop'],
'City':['Mumbai','Delhi','Bangalore','Chennai','Chennai','Delhi','Kolkata','Delhi','Mumbai']
})
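I want the price from the first DataFrame to appear in the merged DataFrame. The common column is "Product_ID". Note that Product_ID 101 has two prices, 299.00 and 9898.00. I want the latter, 9898.0, to end up in the merged dataset, because it is the latest price.
Currently my code does not give the right answer. It returns both prices: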
customerpur = pd.merge(customer,product[['Price','Product_ID']], on="Product_ID", how = "left")
customerpur
id name age Product_ID Purchased_Product City Price
0 1 Olivia 20 101 Watch Mumbai 299.0
1 1 Olivia 20 101 Watch Mumbai 9898.0
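Answer 0 (score: 1)
There is no explicit timestamp, so I assume the row order of the DataFrame reflects recency. You can drop the duplicates after the merge: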
customerpur.drop_duplicates(subset = ['id'], keep = 'last')
Result:
id name age Product_ID Purchased_Product City Price
1 1 Olivia 20 101 Watch Mumbai 9898.0
2 2 Aditya 25 0 NA Delhi NaN
3 3 Cory 15 106 Oil Bangalore 110.0
4 4 Isabell 10 0 NA Chennai NaN
5 5 Dominic 30 103 Shoes Chennai 2999.0
6 6 Tyler 65 104 Smartphone Delhi 14999.0
7 7 Samuel 35 0 NA Kolkata NaN
8 8 Daniel 18 0 NA Delhi NaN
9 9 Jeremy 23 107 Laptop Mumbai 79999.0
Note the keep = 'last' argument: for each customer id it keeps the last matching row, so only the latest price survives.
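To make the effect of keep concrete, here is a minimal sketch on a cut-down product table (three rows taken from the question's data, not the full frame). It assumes, as the answer does, that a later row means a newer price.

```python
import pandas as pd

# Cut-down version of the question's product table: Product_ID 101 appears twice.
product = pd.DataFrame({
    'Product_ID': [101, 102, 101],
    'Price': [299.0, 1350.50, 9898.0],
})

# keep='first' retains the earliest row per Product_ID, keep='last' the latest one.
first = product.drop_duplicates(subset=['Product_ID'], keep='first')
last = product.drop_duplicates(subset=['Product_ID'], keep='last')

price_first = first.loc[first['Product_ID'] == 101, 'Price'].iloc[0]
price_last = last.loc[last['Product_ID'] == 101, 'Price'].iloc[0]
print(price_first, price_last)
```

If the rows were not already ordered by recency, the frame would need to be sorted on whatever column records that order before dropping duplicates.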
If performance matters or the dataset is large, deduplicate before merging instead:
product = product.drop_duplicates(subset = ['Product_ID'], keep = 'last')
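Put together, deduplicating first and then merging might look like the sketch below. The frames are trimmed to the columns involved, and the name latest_price is only illustrative. The validate='m:1' argument is an optional safety check that is not part of the original answer: it makes pandas raise an error if Product_ID is still duplicated on the right-hand side.

```python
import pandas as pd

# Trimmed copies of the question's frames (only the columns needed here).
product = pd.DataFrame({
    'Product_ID': [101, 102, 103, 104, 105, 106, 107, 101],
    'Price': [299.0, 1350.50, 2999.0, 14999.0, 145.0, 110.0, 79999.0, 9898.0],
})
customer = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'Product_ID': [101, 0, 106, 0, 103, 104, 0, 0, 107],
})

# Keep only the last-listed row per Product_ID (assumed to be the latest price).
latest_price = product.drop_duplicates(subset=['Product_ID'], keep='last')

# Left merge keeps every customer; validate='m:1' fails loudly if the
# right-hand side still contains duplicate Product_ID values.
customerpur = pd.merge(
    customer,
    latest_price[['Product_ID', 'Price']],
    on='Product_ID',
    how='left',
    validate='m:1',
)
print(customerpur)
```

Because the right-hand side has one row per Product_ID, the merge cannot multiply customer rows, so no clean-up is needed afterwards.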
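Answer 1 (score: 1)
Nothing in your DataFrame indicates which entry is the latest, so you may first need to drop the first entry with ID 101 from the product DataFrame, like this: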
result_product = product.drop_duplicates(subset=['Product_ID'], keep='last')
This keeps the last entry for each Product_ID, and you can then merge as follows:
pd.merge(result_product, customer, on='Product_ID')
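One thing to watch with the merge above: pd.merge defaults to how='inner', so customers whose Product_ID has no match in the product table (the rows with Product_ID 0) are dropped. The sketch below compares it with a left merge from the customer side, which is what the question used. The frames are trimmed to the relevant columns.

```python
import pandas as pd

# Trimmed copies of the question's frames.
product = pd.DataFrame({
    'Product_ID': [101, 102, 103, 104, 105, 106, 107, 101],
    'Price': [299.0, 1350.50, 2999.0, 14999.0, 145.0, 110.0, 79999.0, 9898.0],
})
customer = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'Product_ID': [101, 0, 106, 0, 103, 104, 0, 0, 107],
})

result_product = product.drop_duplicates(subset=['Product_ID'], keep='last')

# Default inner join: only customers with a matching product remain.
inner = pd.merge(result_product, customer, on='Product_ID')

# Left join from the customer side: every customer remains, Price is NaN
# where there was no purchase.
left = pd.merge(customer, result_product, on='Product_ID', how='left')

print(len(inner), len(left))
```

Which variant is right depends on whether customers without a purchase should appear in the result.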