How to get specific XML tags and display them in different tables?

Time: 2019-05-01 17:51:08

Tags: python html xml parsing

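Given an XML file, I want to display two tables in the browser: one listing the journals with their years, and another listing the conferences (booktitle) with their years.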

The XML file has the following format:

<dblpperson>
 <r>
  <article>
    <author orcid="0000-0001-6062-7524">Meinard</author>   
    <author>Bryan Pardo</author><author>Gautham</author>
    <author>Vesa</author>
    <title>Recent Advances in Music Signal Processing [From the Guest Editors].</title>
    <year>2019</year>
    <journal>IEEE Signal Process. Mag.</journal>
    <ee>https://doi.org/10.1109/MSP.2018.2876190</ee>
  </article> 
 </r>
 <r>
  <article>
    <author>Müller</author>   
    <author>Vesa</author>
    <author>Patricio</author>
    <title>Automatic Drum Transcription.</title>
    <year>2018</year>
    <booktitle>ICASSP</booktitle>
    <ee>https://doi.org/10.1109/MSP.2018.2876190</ee>
   </article> 
 </r>
...
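In the sample above, the journal entry has a <journal> child and the conference entry has a <booktitle> child, and neither carries a key attribute. A minimal sketch, assuming that rule holds for the whole file, classifies each record by which of the two children it has. The names split_records and SAMPLE are illustrative, not part of the original code.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample above (authors and <ee> omitted for brevity)
SAMPLE = """<dblpperson>
 <r><article>
   <title>Recent Advances in Music Signal Processing [From the Guest Editors].</title>
   <year>2019</year>
   <journal>IEEE Signal Process. Mag.</journal>
 </article></r>
 <r><article>
   <title>Automatic Drum Transcription.</title>
   <year>2018</year>
   <booktitle>ICASSP</booktitle>
 </article></r>
</dblpperson>"""


def split_records(xml_text):
    """Return (journals, conferences) as lists of (year, venue) tuples."""
    journals, conferences = [], []
    root = ET.fromstring(xml_text)
    for r in root.findall('r'):
        for pub in r:  # the publication element inside each <r>
            year = pub.findtext('year', default='')
            journal = pub.findtext('journal')      # None if there is no <journal>
            booktitle = pub.findtext('booktitle')  # None if there is no <booktitle>
            if journal is not None:
                journals.append((year, journal))
            elif booktitle is not None:
                conferences.append((year, booktitle))
    return journals, conferences


journals, conferences = split_records(SAMPLE)
print(journals)     # [('2019', 'IEEE Signal Process. Mag.')]
print(conferences)  # [('2018', 'ICASSP')]
```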

Here is what I have tried so far:

import xml.etree.ElementTree as ET

import bottle


@bottle.route("/authors/<name>/synthesis", method='POST')
...
list_of_journals = []
list_of_conf = []

root = ET.fromstring(data.content)
for publication in root.findall('r'):
    for tags in publication:

        # Separate journals from conferences by the prefix of the "key"
        # attribute ("journals/..." or "conf/..."); .get() returns '' when
        # the attribute is missing instead of raising KeyError
        attribute = tags.get('key', '').split('/')[0]

        if attribute == 'journals':
            titre_j = tags.find('title').text
            ...
            list_of_journals.append([titre_j, année_j, journal_j])
        elif attribute == 'conf':
            titre_c = tags.find('title').text
            ...
            list_of_conf.append([titre_c, année_c, journal_c])
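The code above separates journals from conferences by the prefix of a key attribute, which the sample XML does not show. A sketch of that approach, assuming records carry keys shaped like journals/... and conf/... (the key values below are made up for illustration), uses Element.get with a default so that a record without a key is skipped rather than raising KeyError. The helper key_prefix is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical records: the key values are invented for illustration only
SAMPLE = """<dblpperson>
 <r><article key="journals/example/A19"><year>2019</year>
   <journal>IEEE Signal Process. Mag.</journal></article></r>
 <r><article key="conf/example/B18"><year>2018</year>
   <booktitle>ICASSP</booktitle></article></r>
 <r><article><year>2017</year><journal>No key here</journal></article></r>
</dblpperson>"""


def key_prefix(pub):
    """Return the part of the key before the first '/', or '' if there is no key."""
    return pub.get('key', '').split('/', 1)[0]


journals, conferences = [], []
root = ET.fromstring(SAMPLE)
for r in root.findall('r'):
    for pub in r:
        prefix = key_prefix(pub)
        if prefix == 'journals':
            journals.append((pub.findtext('year'), pub.findtext('journal')))
        elif prefix == 'conf':
            conferences.append((pub.findtext('year'), pub.findtext('booktitle')))
        # A record without a recognised key prefix is skipped silently

print(journals)     # [('2019', 'IEEE Signal Process. Mag.')]
print(conferences)  # [('2018', 'ICASSP')]
```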

        table = """
                 <table style="width:80%">
                 <tr>
                 <th>Journal</th>  
                 <th>Year</th>
                 </tr>
                 <tr>
                 <td> """ + str(list_of_journals[0][0]) + """</td>
                 ...
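The table in the question reads only list_of_journals[0][0], and the string is rebuilt on every pass of the inner loop. A sketch of the alternative, assuming the lists keep the [title, year, journal] layout used above: build the table once, after the loop, with one <tr> per entry. The helper rows_to_table and the sample data are hypothetical; html.escape keeps characters such as & or < in titles from breaking the markup.

```python
from html import escape


def rows_to_table(header, rows):
    """Build one <tr> per row instead of indexing a single entry like [0][0]."""
    lines = ['<table style="width:80%">', '<tr>']
    lines += ['<th>{}</th>'.format(escape(h)) for h in header]
    lines.append('</tr>')
    for row in rows:
        cells = ''.join('<td>{}</td>'.format(escape(str(c))) for c in row)
        lines.append('<tr>{}</tr>'.format(cells))
    lines.append('</table>')
    return '\n'.join(lines)


# Hypothetical data in the question's [title, year, journal] layout
list_of_journals = [
    ['Recent Advances in Music Signal Processing [From the Guest Editors].',
     '2019', 'IEEE Signal Process. Mag.'],
]
# Keep only the journal and year columns, matching the question's table header
table = rows_to_table(['Journal', 'Year'],
                      [[journal, year] for _title, year, journal in list_of_journals])
print(table)
```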

1 Answer:

Answer 0 (score: 0)

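Try this code. It collects the year of each article together with its journal or booktitle and renders each list as an HTML table: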

import xml.etree.ElementTree as ET
from html import escape

XML = '''<dblpperson>
 <r>
  <article>
    <author orcid="0000-0001-6062-7524">Meinard</author>
    <author>Bryan Pardo</author><author>Gautham</author>
    <author>Vesa</author>
    <title>Recent Advances in Music Signal Processing [From the Guest Editors].</title>
    <year>2019</year>
    <journal>IEEE Signal Process. Mag.</journal>
    <ee>https://doi.org/10.1109/MSP.2018.2876190</ee>
    <booktitle>Other ICASSP</booktitle>
  </article>
 </r>
 <r>
  <article>
    <author>Muller</author>
    <author>Vesa</author>
    <author>Patricio</author>
    <title>Automatic Drum Transcription.</title>
    <year>2018</year>
    <journal>IEEE Signal Process. Mag. 123</journal>
    <booktitle>ICASSP</booktitle>
    <ee>https://doi.org/10.1109/MSP.2018.2876190</ee>
   </article>
 </r></dblpperson>'''


def make_table(headers, rows):
    """Render a list of rows as an HTML table, escaping every cell."""
    parts = ['<table>', '<tr>']
    parts += ['<th>{}</th>'.format(escape(h)) for h in headers]
    parts.append('</tr>')
    for row in rows:
        parts.append('<tr>')
        parts += ['<td>{}</td>'.format(escape(d)) for d in row]
        parts.append('</tr>')
    parts.append('</table>')
    return ''.join(parts)


journal_data = []
booktitle_data = []
root = ET.fromstring(XML)
for article in root.findall('.//article'):
    # findtext() returns the child's text, or the default if it is missing;
    # this also avoids Element.getchildren(), which was removed in Python 3.9
    year = article.findtext('year', default='')
    journal = article.findtext('journal')
    booktitle = article.findtext('booktitle')
    # Add a row only when the article actually has that field
    if journal is not None:
        journal_data.append([year, journal])
    if booktitle is not None:
        booktitle_data.append([year, booktitle])

print(make_table(['Year', 'Journal'], journal_data))
print(make_table(['Year', 'Booktitle'], booktitle_data))

Output:

<table><tr><th>Year</th><th>Journal</th></tr><tr><td>2019</td><td>IEEE Signal Process. Mag.</td></tr><tr><td>2018</td><td>IEEE Signal Process. Mag. 123</td></tr></table>
<table><tr><th>Year</th><th>Booktitle</th></tr><tr><td>2019</td><td>Other ICASSP</td></tr><tr><td>2018</td><td>ICASSP</td></tr></table>
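The answer prints the two tables. To show them in the browser from the question's bottle route, the handler only needs to return an HTML string, which bottle sends as the response body. A minimal sketch, assuming the XML text is already available as a string (how the question obtains data is elided); synthesis_page is a hypothetical helper, and the bottle wiring itself is shown only as a comment.

```python
import xml.etree.ElementTree as ET
from html import escape


def make_table(headers, rows):
    """Render headers and rows as an HTML table, escaping every cell."""
    head = ''.join('<th>{}</th>'.format(escape(h)) for h in headers)
    body = ''.join(
        '<tr>{}</tr>'.format(''.join('<td>{}</td>'.format(escape(c)) for c in row))
        for row in rows)
    return '<table><tr>{}</tr>{}</table>'.format(head, body)


def synthesis_page(xml_text):
    """Return a complete HTML page with a journal table and a conference table."""
    journals, conferences = [], []
    for pub in ET.fromstring(xml_text).findall('.//article'):
        year = pub.findtext('year', default='')
        if pub.findtext('journal') is not None:
            journals.append([year, pub.findtext('journal')])
        if pub.findtext('booktitle') is not None:
            conferences.append([year, pub.findtext('booktitle')])
    return ('<html><body>'
            '<h2>Journals</h2>' + make_table(['Year', 'Journal'], journals) +
            '<h2>Conferences</h2>' + make_table(['Year', 'Booktitle'], conferences) +
            '</body></html>')


# Inside the bottle handler the page would simply be returned, e.g.
#     return synthesis_page(xml_text)  # xml_text: however the handler gets the XML
page = synthesis_page(
    '<dblpperson><r><article><year>2018</year>'
    '<booktitle>ICASSP</booktitle></article></r></dblpperson>')
print(page)
```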