I want to join two Python pandas DataFrames on a common column (e.g. id).
The source DataFrame looks like this:
id | col
---------
1 | h1
2 | h2
3 | h3
3 | h33
3 | h333
4 | h4
6 | h6
The target DataFrame is:
id | col
---------
1 | h11
2 | h2
3 | h%
3 | h3
4 | h4
6 | h6
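Here, the rows with id=3 are duplicated. The source DataFrame has three rows with id=3, and the target DataFrame has two. I want to keep only the first common number of rows (i.e. two rows), for example: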
id | col (source) | col (target)
--------------------------------
1 | h1 | h11
2 | h2 | h2
3 | h3 | h%
3 | h33 | h3
4 | h4 | h4
6 | h6 | h6
I tried a simple merge in pandas like this:
pd.concat(source_df , target_df, on="id")
Is there anything else I can do to achieve this logic?
Answer 0 (score: 3)
You can use merge with how='left' or how='inner', as needed, but before that you should group by id and assign a row number within each id group using rank. For example:
import pandas as pd
source_df = pd.DataFrame({'id' : [1,2,3,3,3,4,6] , 'col' : ['h1','h2','h3','h33','h333','h4','h6']})
target_df = pd.DataFrame({'id' : [1,2,3,3,4,6] , 'col' : ['h11', 'h2','h%','h3','h4','h6']})
# Number the rows within each id group (1.0, 2.0, ...) in order of appearance
source_df["rn"] = source_df.groupby('id')['id'].rank(method='first')
target_df["rn"] = target_df.groupby('id')['id'].rank(method='first')
# Match rows on both id and row number; how='left' keeps every target row
new_df = target_df.merge(source_df, on=['id','rn'] , how='left')
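The same idea can be sketched with `groupby(...).cumcount()` and an inner merge, which keeps only the row numbers present in both frames for each id. This is a minimal variant under assumptions: the names `source_numbered`, `target_numbered`, and `paired`, and the `_source`/`_target` suffixes, are illustrative and not from the question.

```python
import pandas as pd

source_df = pd.DataFrame({'id': [1, 2, 3, 3, 3, 4, 6],
                          'col': ['h1', 'h2', 'h3', 'h33', 'h333', 'h4', 'h6']})
target_df = pd.DataFrame({'id': [1, 2, 3, 3, 4, 6],
                          'col': ['h11', 'h2', 'h%', 'h3', 'h4', 'h6']})

# cumcount() numbers the rows inside each id group 0, 1, 2, ... in their existing order
source_numbered = source_df.assign(rn=source_df.groupby('id').cumcount())
target_numbered = target_df.assign(rn=target_df.groupby('id').cumcount())

# An inner merge on (id, rn) keeps only positions that exist in both frames,
# so the third id=3 row of the source has no partner and is dropped
paired = (
    source_numbered
    .merge(target_numbered, on=['id', 'rn'], how='inner',
           suffixes=('_source', '_target'))
    .drop(columns='rn')
)
print(paired)
```

Using `how='inner'` here means that an id with more rows on either side is trimmed to the shorter count, whereas `how='left'` would keep every row of the left frame and fill the unmatched side with NaN.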
Answer 1 (score: 2)
I think you should use the merge() function:
pd.merge(source_df, target_df, on="id", how='inner')
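As a quick check, assuming the sample frames from the question, a merge on `id` alone pairs every source row with every target row that shares the id, so the duplicated id=3 yields 3 x 2 rows rather than two. The variable names `merged` and `rows_for_id_3` are illustrative.

```python
import pandas as pd

source_df = pd.DataFrame({'id': [1, 2, 3, 3, 3, 4, 6],
                          'col': ['h1', 'h2', 'h3', 'h33', 'h333', 'h4', 'h6']})
target_df = pd.DataFrame({'id': [1, 2, 3, 3, 4, 6],
                          'col': ['h11', 'h2', 'h%', 'h3', 'h4', 'h6']})

# Merging on id only: duplicated keys produce every source/target combination
merged = pd.merge(source_df, target_df, on="id", how="inner")

rows_for_id_3 = merged[merged["id"] == 3]
print(len(rows_for_id_3))  # 6 (3 source rows x 2 target rows)
```

Adding a per-id row number to both frames and merging on it as well restricts the result to one pairing per position.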