If I have two lists or DataFrames (pandas) in Python, how do I merge/match/join them?
For example:
List/DF 1:
Table_Name Table_Alias
tab_1 t1
tab_2 t2
tab_3 t3
List/DF 2:
Table_Alias Variable_Name
t1 Owner
t1 Owner_Id
t2 Purchase_date
t3 Maintenance_cost
Desired result:
Table_Name Table_Alias Variable_Name
tab_1 t1 Owner
tab_1 t1 Owner_Id
tab_2 t2 Purchase_date
tab_3 t3 Maintenance_cost
Note: if I were doing this in R, I would use something like:
df3 <- merge(df1, df2, by = 'Table_Alias', all.y = T)
What is the best way to do this in Python?
Answer 0 (score: 2)
You want an 'outer' merge.
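It joins on the columns the two DataFrames have in common and returns the union of the rows from both, with NaN wherever one side has no match.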
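A minimal sketch of such an outer merge, using the two tables from the question. The names df1, df2 and df3 are borrowed from the question's R snippet. No `on` argument is passed, so pandas joins on the only column both frames share (Table_Alias).

```python
import pandas as pd

# The two tables from the question.
df1 = pd.DataFrame({
    "Table_Name": ["tab_1", "tab_2", "tab_3"],
    "Table_Alias": ["t1", "t2", "t3"],
})
df2 = pd.DataFrame({
    "Table_Alias": ["t1", "t1", "t2", "t3"],
    "Variable_Name": ["Owner", "Owner_Id", "Purchase_date", "Maintenance_cost"],
})

# With no `on=`, merge joins on the columns common to both frames.
df3 = df1.merge(df2, how="outer")
print(df3)
```

Every alias here appears in both tables, so the result has the four rows of the desired output and no missing values.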
Answer 1 (score: -1)
I would just use pd.merge(df1, df2, how='outer', on='talias'):
import pandas as pd
df1 = pd.DataFrame({"table_name": ['tab1', 'tab2', 'tab3'], "talias": ['t1', 't2', 't3']})
df2 = pd.DataFrame({"talias": ['t1', 't1', 't2', 't3'], "vname": ['Owner', 'Owner_Id', 'Purchase_date', 'Maintenance_cost']})
pd.merge(df1, df2, how='outer', on='talias')
Out:
table_name talias vname
0 tab1 t1 Owner
1 tab1 t1 Owner_Id
2 tab2 t2 Purchase_date
3 tab3 t3 Maintenance_cost
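The R call in the question uses all.y = T, which keeps every row of the second table; in pandas the closest counterpart is how='right' rather than how='outer'. The two give the same result on the question's data and differ only when the first table has an alias with no variables. The sketch below illustrates this with a hypothetical extra row (tab_4 / t4) that is not in the original question.

```python
import pandas as pd

df1 = pd.DataFrame({
    # tab_4 / t4 is a hypothetical row added only to show the difference.
    "Table_Name": ["tab_1", "tab_2", "tab_3", "tab_4"],
    "Table_Alias": ["t1", "t2", "t3", "t4"],
})
df2 = pd.DataFrame({
    "Table_Alias": ["t1", "t1", "t2", "t3"],
    "Variable_Name": ["Owner", "Owner_Id", "Purchase_date", "Maintenance_cost"],
})

# how="right" keeps every row of df2, like all.y = TRUE in R's merge.
right = pd.merge(df1, df2, how="right", on="Table_Alias")

# how="outer" additionally keeps df1 rows with no match (tab_4 gets NaN).
outer = pd.merge(df1, df2, how="outer", on="Table_Alias")

print(len(right), len(outer))  # prints: 4 5
```

If unmatched tables should still show up in the result, how='outer' is the one to keep; if only rows with a variable are wanted, how='right' mirrors the R call more closely.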