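Converting multiple CSV files into a single XML

Time: 2019-02-05 11:35:30

Tags: xml python-3.x csv elementtree

I am trying to convert multiple CSV files (two for now) to XML using ElementTree, but I am not getting the expected output. Please point me to a more efficient approach. PS: I am a beginner here.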

import csv
import xml.etree.ElementTree as ET
#from bs4 import BeautifulSoup

root = ET.Element('Policy')

with open("policy.csv","r") as p, open("Att.csv","r") as a, open("rider.csv","r") as r:
  csv_p = csv.reader(p)
  header_p = next(csv_p)
  csv_a = csv.reader(a)
  header_a = next(csv_a) 
  csv_r = csv.reader(r)
  header_r = next(csv_r)
  for row in csv_p:
    pid = row[0]
    print("\n",pid)
    for col in range(len(header_p)):
      ET.SubElement(root, header_p[col]).text = str(row[col])
      for childrow in csv_a:
        if(pid == childrow[0]):
          print("Match found")
          child = ET.SubElement(root,"child")
          for col_a in range(len(header_a)):
            ET.SubElement(child, header_a[col_a]).text = str(childrow[col_a])
            for tailrow in csv_r:
              if(childrow[1] == tailrow[0]):
                print("tail found",tailrow[0])
                tail = ET.SubElement(child,"tail")
                for col_r in range(len(header_r)):
                  ET.SubElement(tail, header_r[col_r]).text = str(tailrow[col_r])  
          r.seek(0)
    a.seek(0)

tree = ET.tostring(root, encoding="UTF-8")
#print(BeautifulSoup(tree, "xml").prettify())

with open("Output.xml", "wb") as f:
    f.write(tree)

with open('Output.xml', 'r') as f:
    print("\n\n",f.read())

The output is shown below, but you can see that some tags are repeated, because they are redundant across the files I am reading:

Policy.csv:

Pid,Name,Date 
101,Life In,3Jan2017
102,Mobile,8Aug2018 

Att.csv:

PId,AId,Name  
101,9001,Pune
101,9002,Mumbai  
102,9003,Delhi

rider.csv:

AId,RID,Name
9001,10001,Ramesh 
9001,10002,Suresh 
9002,10003,Rahul 
9002,10004,Kirti

Output:

<Policy>
    <Pid>101</Pid>
        <child>
            <PId>101</PId>
                <tail><AId>9001</AId>
                        <RID>10001</RID>
                        <Name>Ramesh</Name>
                </tail>
                <tail>
                    <AId>9001</AId>
                    <RID>10002</RID>
                    <Name>Suresh</Name>
                </tail>
                <AId>9001</AId>
                <Name>Pune</Name>
        </child>
        <child>
            <PId>101</PId>
                <tail><AId>9002</AId>
                    <RID>10003</RID>
                    <Name>Rahul</Name>
                </tail>
                <tail><AId>9002</AId>
                    <RID>10004</RID>
                    <Name>Kirti</Name>
                </tail>
                    <AId>9002</AId>
                    <Name>Mumbai</Name>
        </child>
        <Name>Life In</Name>
        <Date>3Jan2017</Date>
</Policy>

Example of the desired output:

<Policy>
    <Pid>101</Pid>
    <child>
      <AId>9001</AId>
        <tail>
          <RID>10001</RID>
          <Name>Ramesh</Name>
        </tail>
        <tail>                    
          <RID>10002</RID>
          <Name>Suresh</Name>
        </tail>          
      <Name>Pune</Name>
    </child>
    <Name>Life In</Name>
  <Date>3Jan2017</Date>
</Policy>

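1 Answer:

Answer 0 (score: 0)

If you are able to use lxml, here is an example of what I was talking about in the comments.

Hopefully I have the logic correct: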

  • policy is based on a row in Policy.csv. It is uniquely identified by Pid.
  • child in policy is based on rows in Att.csv that have a matching PId.
  • tail in child is based on rows in rider.csv that have a matching AId.
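The three relationships above can also be sketched with the standard library alone, as a cross-check of the logic before moving to lxml and XSLT. This is only a minimal sketch: the sample data is copied from the question and read through io.StringIO instead of real files, the helper names (read_rows, group_by, build_tree) are made up for illustration, and the Policies wrapper element is an assumption so that several policies fit under one root.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Sample data copied from the question; io.StringIO stands in for the real files.
POLICY_CSV = "Pid,Name,Date\n101,Life In,3Jan2017\n102,Mobile,8Aug2018\n"
ATT_CSV = "PId,AId,Name\n101,9001,Pune\n101,9002,Mumbai\n102,9003,Delhi\n"
RIDER_CSV = (
    "AId,RID,Name\n9001,10001,Ramesh\n9001,10002,Suresh\n"
    "9002,10003,Rahul\n9002,10004,Kirti\n"
)


def read_rows(text):
    """Return the CSV rows as dicts with whitespace stripped from keys and values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{k.strip(): v.strip() for k, v in row.items()} for row in reader]


def group_by(rows, key):
    """Index rows by the value of one column, so each file is read only once."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return groups


def build_tree(policies, atts, riders):
    atts_by_pid = group_by(atts, "PId")
    riders_by_aid = group_by(riders, "AId")
    root = ET.Element("Policies")  # assumed wrapper element, not in the question
    for pol in policies:
        policy = ET.SubElement(root, "Policy")
        ET.SubElement(policy, "Pid").text = pol["Pid"]
        # child: rows in Att.csv whose PId matches this policy's Pid
        for att in atts_by_pid.get(pol["Pid"], []):
            child = ET.SubElement(policy, "child")
            ET.SubElement(child, "AId").text = att["AId"]
            # tail: rows in rider.csv whose AId matches this child's AId
            for rider in riders_by_aid.get(att["AId"], []):
                tail = ET.SubElement(child, "tail")
                ET.SubElement(tail, "RID").text = rider["RID"]
                ET.SubElement(tail, "Name").text = rider["Name"]
            ET.SubElement(child, "Name").text = att["Name"]
        ET.SubElement(policy, "Name").text = pol["Name"]
        ET.SubElement(policy, "Date").text = pol["Date"]
    return root


root = build_tree(read_rows(POLICY_CSV), read_rows(ATT_CSV), read_rows(RIDER_CSV))
print(ET.tostring(root, encoding="unicode"))
```

Indexing the child and tail rows by their key column up front avoids rewinding and rescanning the files for every policy row, which is where the nested loops in the question go wrong.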

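The first thing I would do is convert the CSV files to a temporary XML format.

Since the header values in the CSV files will be valid element names, I will go ahead and create elements based on those values.

If the header values in your CSV files might not be valid element names, you could use a generic element name and store the header value in an attribute. (I can change the example if needed.)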
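The generic-element variant mentioned above could look roughly like the following. It is a sketch, not part of the answer's example: the element names table, row and field, the function name csv_to_generic_xml, and the sample headers are all made up for illustration, and it uses the standard library's ElementTree rather than lxml so that it runs without extra dependencies.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_generic_xml(text, table_name):
    """Build <table name=...><row><field name=HEADER>VALUE</field>...</row></table>."""
    table = ET.Element("table", name=table_name)
    for row in csv.DictReader(io.StringIO(text)):
        row_elem = ET.SubElement(table, "row")
        for header, value in row.items():
            # The header goes into an attribute, so it may contain spaces or
            # start with a digit, which an element name may not.
            field = ET.SubElement(row_elem, "field", name=header.strip())
            field.text = (value or "").strip()
    return table


# "Policy No" and "1st Date" are made-up headers that are not valid element names.
sample = "Policy No,1st Date\n101,3Jan2017\n"
table = csv_to_generic_xml(sample, "policy")
print(ET.tostring(table, encoding="unicode"))
```

With this layout a stylesheet would select values by attribute, for example field[@name='Policy No'], instead of by element name.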

Then I would transform the temporary XML and handle all of the grouping there. Since lxml only supports XSLT 1.0, we have to use Muenchian Grouping.
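For readers who have not seen Muenchian Grouping before, here is a rough Python analogue of its two parts: an xsl:key builds an index from a key value to the matching nodes, and a first-in-group test keeps only the first node for each key value. This is an illustration of the idea, not XSLT; the rows are assumed to look like the temporary XML rows, and the duplicate 101 row is made up to show what the test filters out.

```python
from collections import defaultdict

# Rows as they might look after the CSV step; the third row is a made-up
# duplicate so that the grouping has something to remove.
rows = [
    {"pid": "101", "name": "Life In"},
    {"pid": "102", "name": "Mobile"},
    {"pid": "101", "name": "Life In (duplicate)"},
]

# Rough equivalent of <xsl:key name="policy" match="policy/row" use="pid"/>:
# an index from each pid to every row that carries it, in document order.
index = defaultdict(list)
for row in rows:
    index[row["pid"]].append(row)

# Rough equivalent of row[count(. | key('policy', pid)[1]) = 1]:
# keep a row only if it is the very first row indexed under its own pid.
first_of_each_group = [row for row in rows if row is index[row["pid"]][0]]

for row in first_of_each_group:
    # Each surviving row represents its group; the index gives the members.
    print(row["pid"], [member["name"] for member in index[row["pid"]]])
```

The identity check (is) plays the role of the node-set union in the XPath expression: the union of a node with the first node of its group has one member only when the two are the same node.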

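Example...

Python

import csv
from os import path
from lxml import etree


def csv2xml(file):
    # The root element is named after the file, e.g. "policy.csv" -> <policy>.
    result = etree.Element(path.splitext(file)[0])
    # newline="" is what the csv module documentation recommends for file objects.
    with open(file, newline="") as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            row_elem = etree.SubElement(result, "row")
            for entry in row:
                # Header names become lowercase element names ("PId" -> <pid>).
                entry_elem = etree.SubElement(row_elem, entry.strip().lower())
                entry_elem.text = row.get(entry).strip()
    return result


# The element names come from these file names, and transform.xsl matches on
# policy, att and rider, so the names are lowercase here.
csv_files = ["policy.csv", "att.csv", "rider.csv"]

# Temporary XML: one child element per CSV file under a common root.
temp_xml = etree.Element("policies")

for csv_file in csv_files:
    xml = csv2xml(csv_file)
    temp_xml.append(xml)

xslt = etree.parse("transform.xsl")

# Apply the stylesheet; all of the grouping happens in the XSLT.
xml_output = etree.ElementTree(temp_xml).xslt(xslt)

print(etree.tostring(xml_output, pretty_print=True).decode())

XSLT (transform.xsl)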

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:key name="policy" match="policy/row" use="pid"/>
  <xsl:key name="att" match="att/row" use="pid"/>
  <xsl:key name="rider" match="rider/row" use="aid"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="/*">
    <xsl:copy>
      <xsl:apply-templates select="policy"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="policy">
      <xsl:for-each select="row[count(.|key('policy', pid)[1])=1]">
        <policy>
          <xsl:apply-templates select="pid"/>
          <xsl:apply-templates select="key('att', pid)"/>
          <xsl:apply-templates select="name|date"/>
        </policy>
      </xsl:for-each>
  </xsl:template>

  <xsl:template match="att/row">
    <child>
      <xsl:apply-templates select="aid"/>
      <xsl:apply-templates select="key('rider', aid)"/>
      <xsl:apply-templates select="name"/>
    </child>
  </xsl:template>

  <xsl:template match="rider/row">
    <tail>
      <xsl:apply-templates select="rid|name"/>
    </tail>
  </xsl:template>

</xsl:stylesheet>

Hope this helps.