Python: merging 2 lists / SQL JOIN

Time: 2015-07-03 15:12:10

Tags: python join pandas merge

If I have 2 lists or data frames (pandas) in Python, how do I merge/match/join them?

For example:

List / DF 1:

Table_Name  Table_Alias
  tab_1          t1
  tab_2          t2
  tab_3          t3

List / DF 2:

Table_Alias   Variable_Name
    t1            Owner
    t1            Owner_Id
    t2            Purchase_date
    t3            Maintenance_cost

Desired result:

Table_Name   Table_Alias   Variable_Name
   tab_1         t1            Owner
   tab_1         t1            Owner_Id
   tab_2         t2            Purchase_date
   tab_3         t3            Maintenance_cost

Note: if I were doing this in R, I would use something like:

df3 <- merge(df1, df2, by = 'Table_Alias', all.y = T)

What is the best way to do this in Python?

2 answers:

Answer 0: (score: 2)

You want an 'outer' merge:

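This matches on the columns the two DataFrames have in common and returns the union of the rows from both.

Answer 1: (score: -1)

I would just use pd.merge(df1, df2, how='outer', on='talias')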
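
A minimal sketch of such an outer merge, assuming the two tables from the question are loaded as DataFrames. The names `df1`, `df2` and `df3` are illustrative, and the key column is passed explicitly with `on` rather than left to be inferred:

```python
import pandas as pd

# Assumed reconstruction of the two tables from the question.
df1 = pd.DataFrame({
    "Table_Name": ["tab_1", "tab_2", "tab_3"],
    "Table_Alias": ["t1", "t2", "t3"],
})
df2 = pd.DataFrame({
    "Table_Alias": ["t1", "t1", "t2", "t3"],
    "Variable_Name": ["Owner", "Owner_Id", "Purchase_date", "Maintenance_cost"],
})

# Outer merge on the shared key column: keys found in either frame are kept.
df3 = df1.merge(df2, on="Table_Alias", how="outer")
print(df3)
```

If `on` is omitted, `merge` joins on the column names the two frames share, which here is only `Table_Alias`.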

import pandas as pd

df1 = pd.DataFrame({"table_name": ['tab1', 'tab2', 'tab3'], "talias": ['t1', 't2', 't3']})
df2 = pd.DataFrame({"talias": ['t1', 't1', 't2', 't3'], "vname": ['Owner', 'Owner_Id', 'Purchase_date', 'Maintenance_cost']})


pd.merge(df1,df2,how='outer', on='talias')


Out:
   table_name  talias             vname
0        tab1      t1             Owner
1        tab1      t1          Owner_Id
2        tab2      t2     Purchase_date
3        tab3      t3  Maintenance_cost
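
One detail worth checking: the R call in the question uses `all.y = T`, which keeps every row of the second frame. In pandas that corresponds to `how='right'`, not `how='outer'`. On the question's data both give the same rows, so the sketch below adds two hypothetical unmatched rows (`tab4`/`t4` and `t9`/`Unmatched_var`, neither from the original post) to make the difference visible:

```python
import pandas as pd

# tab4/t4 is a hypothetical row with no counterpart in df2.
df1 = pd.DataFrame({
    "table_name": ["tab1", "tab2", "tab3", "tab4"],
    "talias": ["t1", "t2", "t3", "t4"],
})
# t9/Unmatched_var is a hypothetical row with no counterpart in df1.
df2 = pd.DataFrame({
    "talias": ["t1", "t1", "t2", "t3", "t9"],
    "vname": ["Owner", "Owner_Id", "Purchase_date", "Maintenance_cost", "Unmatched_var"],
})

# Right join: keeps every row of df2 (like R's all.y = TRUE); t4 is dropped.
right = pd.merge(df1, df2, how="right", on="talias")

# Outer join: keeps unmatched rows from both sides (like R's all = TRUE).
outer = pd.merge(df1, df2, how="outer", on="talias")

print(right)
print(outer)
```

Unmatched cells are filled with NaN, so `right` has a missing `table_name` for `t9`, while `outer` also contains `t4` with a missing `vname`.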