Pandas merge on DataFrames while keeping the same number of rows

Time: 2019-12-10 05:38:06

Tags: python pandas dataframe

I want to join two Python pandas DataFrames on a common column (e.g. id).

The first (source) DataFrame looks like this:

id  | col 
---------
1   | h1
2   | h2
3   | h3 
3   | h33
3   | h333
4   | h4 
6   | h6 

The target DataFrame is:

id  | col 
---------
1   | h11
2   | h2
3   | h%
3   | h3
4   | h4 
6   | h6 

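Here, the rows with id=3 are duplicated: the source DataFrame has three rows with id=3, while the target DataFrame has two. I want to keep only as many rows as the two frames have in common for that id (here, two), paired in order of appearance, like this: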

id  | source col | target col
------------------------------
1   | h1         | h11
2   | h2         | h2
3   | h3         | h%
3   | h33        | h3
4   | h4         | h4
6   | h6         | h6
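Put differently, for every id the desired output keeps min(source count, target count) rows. The sketch below computes that number from the sample tables; the frames are rebuilt by hand from the tables above, and the column names source, target and common are only illustrative.

```python
import pandas as pd

# Sample frames rebuilt from the tables above
source_df = pd.DataFrame({'id': [1, 2, 3, 3, 3, 4, 6],
                          'col': ['h1', 'h2', 'h3', 'h33', 'h333', 'h4', 'h6']})
target_df = pd.DataFrame({'id': [1, 2, 3, 3, 4, 6],
                          'col': ['h11', 'h2', 'h%', 'h3', 'h4', 'h6']})

# Number of rows per id in each frame; an id missing from one side counts as 0
counts = pd.concat(
    [source_df.groupby('id').size().rename('source'),
     target_df.groupby('id').size().rename('target')],
    axis=1,
).fillna(0).astype(int)

# Rows to keep per id in the combined output
counts['common'] = counts[['source', 'target']].min(axis=1)
print(counts)
```

For id=3 this gives 3 source rows, 2 target rows and 2 common rows, so the expected output has 6 rows in total.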

I tried a simple merge in pandas like this:

pd.concat(source_df , target_df, on="id")

Is there anything else I can do to implement this logic?

2 answers:

Answer 0 (score: 3)

You can use merge with how='left' or how='inner', depending on your needs, but before that, group by id and give each row a row number within its id group using rank.

For example:

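import pandas as pd

source_df = pd.DataFrame({'id' : [1,2,3,3,3,4,6] , 'col' : ['h1','h2','h3','h33','h333','h4','h6']})
target_df = pd.DataFrame({'id' : [1,2,3,3,4,6] , 'col' : ['h11', 'h2','h%','h3','h4','h6']})

# Row number within each id group (1, 2, 3, ...) in order of appearance;
# rank returns floats, but both frames use the same type, so the merge keys still match
source_df["rn"] = source_df.groupby('id')['id'].rank(method='first')

target_df["rn"] = target_df.groupby('id')['id'].rank(method='first')

# Pair each target row with the source row that has the same id and row number;
# how='left' keeps every target row, so h333 (source rn=3 for id=3) has no partner and is dropped
new_df = target_df.merge(source_df, on=['id','rn'] , how='left')
print(new_df)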
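A variant of the same idea, not part of the original answer: groupby(...).cumcount() numbers the rows of each id group 0, 1, 2, ... directly as integers, which plays the role of rank(method='first'). The sketch also swaps in how='inner', which keeps min(source count, target count) rows per id no matter which frame has more duplicates; with how='left' and target on the left, extra target duplicates would instead come through with NaN in the source column. The suffixes '_source' and '_target' are just illustrative names.

```python
import pandas as pd

source_df = pd.DataFrame({'id': [1, 2, 3, 3, 3, 4, 6],
                          'col': ['h1', 'h2', 'h3', 'h33', 'h333', 'h4', 'h6']})
target_df = pd.DataFrame({'id': [1, 2, 3, 3, 4, 6],
                          'col': ['h11', 'h2', 'h%', 'h3', 'h4', 'h6']})

# Integer row number within each id group, in order of appearance (0, 1, 2, ...)
source_df['rn'] = source_df.groupby('id').cumcount()
target_df['rn'] = target_df.groupby('id').cumcount()

# Inner merge on (id, rn): only row numbers present in both frames survive,
# then the helper column is dropped
result = (source_df.merge(target_df, on=['id', 'rn'], how='inner',
                          suffixes=('_source', '_target'))
                   .drop(columns='rn'))
print(result)
```

The result has the columns id, col_source and col_target, with the same six rows as the desired output table in the question.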

Answer 1 (score: 2)

I think you should use the merge() function:

pd.merge(source_df, target_df, on="id", how='inner')
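For reference, a merge on id alone matches duplicate keys many-to-many, so each of the 3 source rows with id=3 is paired with each of the 2 target rows with id=3. The sketch below runs that call on the question's sample frames (rebuilt by hand) and counts the resulting rows.

```python
import pandas as pd

source_df = pd.DataFrame({'id': [1, 2, 3, 3, 3, 4, 6],
                          'col': ['h1', 'h2', 'h3', 'h33', 'h333', 'h4', 'h6']})
target_df = pd.DataFrame({'id': [1, 2, 3, 3, 4, 6],
                          'col': ['h11', 'h2', 'h%', 'h3', 'h4', 'h6']})

result = pd.merge(source_df, target_df, on="id", how='inner')

# Rows per id in the merged frame: id=3 yields 3 x 2 = 6 combinations
print(result.groupby('id').size())
print(len(result))
```

That gives 10 rows in total (1 + 1 + 6 + 1 + 1), compared with the 6 rows in the desired output.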