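I am extracting metadata from XML files and want to add it to a custom HTML file.
I can pull the relevant information out of the XML, but I cannot get the new information added/appended to my HTML file without overwriting what was written before.
I want to produce one block with the same table layout for each XML file processed. I think this might be an indentation problem.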
import xml.etree.ElementTree as ET

html_head = """
<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title></title>
</head>"""

# Open the output file once and write the header.
fh = open(r'D:\Temp\CSD_HTML\file.html', 'wb')
fh.write(html_head)

XML_List = [r'D:\Temp\file1.xml', r'D:\Temp\file2.xml']

for xml in XML_List:
    print xml + '\n'
    path = xml
    tree = ET.parse(path)
    # Pull each metadata value out of the current XML file.
    for node in tree.findall('.//title'):
        title = node.text
        print 'Title: ' + node.text
    for node in tree.findall('.//westbc'):
        westbc = node.text
        print 'West: ' + node.text
    for node in tree.findall('.//eastbc'):
        eastbc = node.text
        print 'East: ' + node.text
    for node in tree.findall('.//northbc'):
        northbc = node.text
        print 'North: ' + node.text
    for node in tree.findall('.//southbc'):
        southbc = node.text
        print 'South: ' + node.text
    for node in tree.findall('.//geogunit'):
        geogunit = node.text
        print 'Geographic Units: ' + node.text
    for node in tree.findall('.//horizdn'):
        horizdn = node.text
        print 'Projection: ' + node.text
    for node in tree.findall('.//ellips'):
        ellips = node.text
        print 'Ellipsoid: ' + node.text
    # Build one table block for this XML file.
    html_body = """
<body>
<p> </p>
<table width="800" border="0">
<tr>
<td width="309" rowspan="5"><img src="Thumbs/img.jpg" alt="" width="300" height="300" align="left"></td>
<td width="4" rowspan="5"> </td>
<td height="50" colspan="3">Title: """ + title + """</td>
</tr>
<tr>
<td width="150" height="50"> </td>
<td width="165" height="50">North: """ + northbc + """</td>
<td width="150" height="50"> </td>
</tr>
<tr>
<td height="50">West: """ + westbc + """</td>
<td height="50"> </td>
<td height="50">East: """ + eastbc + """</td>
</tr>
<tr>
<td height="50"> </td>
<td height="50">South: """ + southbc + """</td>
<td height="50"> </td>
</tr>
<tr>
<td height="150" colspan="3"><p>Geographic Units: """ + geogunit + """</p>
<p>Projection: """ + horizdn + """</p>
<p>Ellipsoid: """ + ellips + """</p></td>
</tr>
</table>
<p> </p>
</body>"""
    fh = open(r'D:\Temp\CSD_HTML\file.html', 'at') ## Remove this line
    fh.write(html_body)

html_tail = """
</html>"""
fh = open(r'D:\Temp\CSD_HTML\file.html', 'wb') ## Remove this line
fh.write(html_tail)
fh.close()
del tree
Any advice and guidance would be much appreciated.
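Sorry, I was able to answer my own question. The repeated references to the file needed to be removed:
fh = open(r'D:\Temp\CSD_HTML\file.html', 'wb')
The file only needs to be opened once, at the start of the code, for appending to work. Opening it again in 'wb' mode truncates it, which discards everything written so far.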
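For anyone landing here later, the open-once pattern can be sketched roughly as follows. This is only a minimal illustration, assuming Python 3 rather than the Python 2 used in the question. The names `FIELDS`, `metadata_block` and `write_report` are hypothetical, the table layout is simplified to one row per field, and a single `<body>` wraps all the tables instead of one per file.

```python
import xml.etree.ElementTree as ET
from html import escape

HTML_HEAD = """<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title></title>
</head>
<body>
"""

HTML_TAIL = """</body>
</html>
"""

# XML tag names from the question, paired with the label shown in the HTML.
FIELDS = [
    ("title", "Title"),
    ("northbc", "North"),
    ("westbc", "West"),
    ("eastbc", "East"),
    ("southbc", "South"),
    ("geogunit", "Geographic Units"),
    ("horizdn", "Projection"),
    ("ellips", "Ellipsoid"),
]


def metadata_block(xml_path):
    """Return one HTML table (simplified layout) for a single XML file."""
    tree = ET.parse(xml_path)
    rows = []
    for tag, label in FIELDS:
        # findtext returns the default when the element is missing.
        value = tree.findtext(".//" + tag, default="")
        rows.append("<tr><td>%s: %s</td></tr>" % (label, escape(value)))
    return '<table width="800" border="0">\n' + "\n".join(rows) + "\n</table>\n"


def write_report(xml_paths, html_path):
    """Write head, one table per XML file, then tail, using one file handle."""
    # The file is opened exactly once; each write continues where the last ended.
    with open(html_path, "w", encoding="utf-8") as fh:
        fh.write(HTML_HEAD)
        for xml_path in xml_paths:
            fh.write(metadata_block(xml_path))
        fh.write(HTML_TAIL)
```

Calling `write_report([r'D:\Temp\file1.xml', r'D:\Temp\file2.xml'], r'D:\Temp\CSD_HTML\file.html')` would then produce one table per XML file in a single pass. The `with` block also closes the file automatically, so there is no separate `fh.close()` to misplace.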