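Python: looping through XML and parsing dates

Time: 2014-11-26 16:04:32

Tags: python xml for-loop datetime-parsing

I am very new to Python, so I appreciate my approach may be a bit rough and ready, but any help would be most welcome.

I am trying to loop through a file of XML lines and parse the date in one of the tags. I have each piece working on its own: I can read in the file, loop through it and write to an output file, and separately I can take a single line of XML and parse it to extract the date. However, when I try to combine the two by reading the file line by line and parsing each line, I get the following error: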

Traceback (most recent call last):
  File "./sadpy10.py", line 19, in <module>
    DOMTree = xml.dom.minidom.parse(line)
  File "/usr/lib/python2.6/xml/dom/minidom.py", line 1918, in parse
    return expatbuilder.parse(file)
  File "/usr/lib/python2.6/xml/dom/expatbuilder.py", line 922, in parse
    fp = open(file, 'rb')
IOError: [Errno 2] No such file or directory: '<Header><Version>1.0</Version>....<cd:Data>...</Data>.....  <cd:DateReceived>20070620171524</cd:DateReceived>'

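The initial input file (report2.out) looks like the following. The other input file (parseoutput.out) is the same, with the considerable trailing whitespace removed from the end of each line, because I was getting an IO error saying the line was too long: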

<Header><Version>1.0</Version>....<cd:Data>...</Data>.....<cd:DateReceived>20070620171524</cd:DateReceived>             
<Header><Version>1.0</Version>....<cd:Data>...</Data>.....<cd:DateReceived>20140523012300</cd:DateReceived>            
...

My code is here:

#!/usr/bin/python

from xml.dom.minidom import parse
import xml.dom.minidom
import datetime

f = open('report2.out','r')
file = open("parseoutput.out", "w")
for line in f:
    # I had to strip the whitespace from end of each line as I was getting error saying the lines were too long
    line = line.rstrip()
    file.write(line + '\n')
f.close()
file.close()

f = open("parseoutput.out","r")
for line in f:
   DOMTree = xml.dom.minidom.parse(line)
   collection = DOMTree.documentElement

   get_date = collection.getElementsByTagName("cd:DateReceived").item(0).firstChild.nodeValue
   get_date = datetime.datetime.strptime(get_date, "%Y%m%d%H%M%S").isoformat()
   get_date = get_date.replace("T"," ")
   print get_date
f.close()

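Any help is much appreciated.

1 answer:

Answer 0 (score: 1):

xml.dom.minidom.parse accepts a filename or a file (or file-like object) as its first argument. In your loop, line is a string of XML, so parse treats it as a filename and tries to open it, which is the IOError you are seeing. Because parseoutput.out contains a separate XML document on each line, this function is not suitable for you. Instead, use xml.dom.minidom.parseString. It is a shortcut for creating a StringIO object and passing it to parse.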

for line in f:
    DOMTree = xml.dom.minidom.parseString(line)
    collection = DOMTree.documentElement
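
For completeness, here is a minimal sketch of the whole loop using parseString, written for Python 3 (print as a function). The sample lines are assumptions made up for illustration: the real file's elided content is unknown, so the closing </Header> tag and the xmlns:cd declaration (with the placeholder URI urn:example:cd) are hypothetical. The prefix has to be declared somewhere in each line, otherwise the parser is likely to reject it as an unbound prefix. The helper name extract_date is also just illustrative.

```python
import datetime
import xml.dom.minidom

# Hypothetical stand-ins for lines of report2.out. The namespace URI and the
# closing </Header> tag are assumptions; the real file's content is elided.
SAMPLE_LINES = [
    '<Header xmlns:cd="urn:example:cd"><Version>1.0</Version>'
    '<cd:DateReceived>20070620171524</cd:DateReceived></Header>      \n',
    '<Header xmlns:cd="urn:example:cd"><Version>1.0</Version>'
    '<cd:DateReceived>20140523012300</cd:DateReceived></Header>      \n',
]


def extract_date(line):
    """Parse one line holding a complete XML document and return its date."""
    # parseString takes the XML text itself, not a filename.
    dom = xml.dom.minidom.parseString(line.strip())
    node = dom.documentElement.getElementsByTagName("cd:DateReceived").item(0)
    raw = node.firstChild.nodeValue
    parsed = datetime.datetime.strptime(raw, "%Y%m%d%H%M%S")
    # Format directly rather than calling isoformat() and replacing the "T".
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


# Skip blank lines so an empty trailing line does not reach the parser.
dates = [extract_date(line) for line in SAMPLE_LINES if line.strip()]
for date in dates:
    print(date)
```

With a real file you would iterate over the open file object instead of SAMPLE_LINES. Since each line is stripped before parsing, the intermediate parseoutput.out file should not be needed.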