I would like to know how to read a .txt file and produce output in .xml format.
My input file is:
Paper 1 / White Spaces are included
Single Correct Answer Type
1. Text of question 1
a) Option 1.a b) Option 1.b
c) Option 1.c d) Option 1.d
2. Text of question 2
a) This is an example of Option 2.a
b) Option 2.b has a special char α
c) Option 2.c
d) Option 2.d
3. Text of question 3
a) Option 3.a can span multiple
lines.
b) Option 3b
c) Option 3c
d) Option 3d
My code:
from lxml import etree
import csv
root = etree.Element('data')
#f = open('input1.txt','rb')
rdr = csv.reader(open("input1.txt",newline='\n'))
header = next(rdr)
for row in rdr:
    eg = etree.SubElement(root, 'eg')
    for h, v in zip(header, row):
        etree.SubElement(eg, h).text = v
f = open(r"C:\temp\input1.xml", "w")
f.write(etree.tostring(root))
f.close()
The error I get is as follows:
Traceback (most recent call last):
  File "E:\python3.2\input1.py", line 11, in <module>
    etree.SubElement(eg, h).text = v
  File "lxml.etree.pyx", line 2995, in lxml.etree.SubElement (src\lxml\lxml.etree.c:69677)
  File "apihelpers.pxi", line 188, in lxml.etree._makeSubElement (src\lxml\lxml.etree.c:15691)
  File "apihelpers.pxi", line 1571, in lxml.etree._tagValidOrRaise (src\lxml\lxml.etree.c:29249)
ValueError: Invalid tag name ' Paper 1'
I would like it to handle the white space as well. I am using Python 3.2. Any suggestions?
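Answer 0 (score: 1)
You can read this information from the txt file, organize it into objects of your own classes, and then serialize those objects.
How to de/serialize: http://code.activestate.com/recipes/577266-xml-to-python-data-structure-de-serialization/
Example: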
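# Read the raw lines from the text file.
with open('file.txt') as f:
    lines = f.readlines()

# Do something to organize these lines into objects (txtInfoObjs, not shown here).
# serialize() is a placeholder for whatever serialization function you use.
xmlStrings = [serialize(pythonObj) for pythonObj in txtInfoObjs]

# Open the output file for writing ('w'); the default mode is read-only.
with open('file.xml', 'w') as g:
    g.write(xmlStrings[0])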
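
The "organize these lines into objects" step could look roughly like the sketch below. It is only one possible approach, not the recipe linked above: it uses the standard library's xml.etree.ElementTree instead of lxml, and the element names (paper, title, section, question, text, option) as well as the helper names parse_paper and build_xml are made up for illustration. It assumes questions start with "N. " and options are marked "a)" to "d)", as in the sample input. Text that contains spaces is stored as element text, never as a tag name, which avoids the "Invalid tag name" error from the question.

```python
import re
import xml.etree.ElementTree as ET

# Sample input copied from the question.
SAMPLE = """Paper 1 / White Spaces are included
Single Correct Answer Type
1. Text of question 1
a) Option 1.a b) Option 1.b
c) Option 1.c d) Option 1.d
2. Text of question 2
a) This is an example of Option 2.a
b) Option 2.b has a special char α
c) Option 2.c
d) Option 2.d
3. Text of question 3
a) Option 3.a can span multiple
lines.
b) Option 3b
c) Option 3c
d) Option 3d
"""

# Assumption: a question line looks like "12. some text".
QUESTION_RE = re.compile(r"^(\d+)\.\s+(.*)$")
# Assumption: an option marker is "a)" .. "d)" at line start or after whitespace.
OPTION_RE = re.compile(r"(?:^|\s)([a-d])\)\s+")


def parse_paper(lines):
    """Return (header_lines, questions) parsed from an iterable of lines."""
    header, questions = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = QUESTION_RE.match(line)
        if match:
            questions.append(
                {"number": match.group(1), "text": match.group(2), "options": []}
            )
            continue
        if not questions:
            # Everything before the first question is treated as header text.
            header.append(line)
            continue
        current = questions[-1]
        # Splitting yields [text before first marker, label, text, label, text, ...].
        parts = OPTION_RE.split(line)
        lead = parts[0].strip()
        if lead:
            # Text with no marker in front continues the previous option
            # (or the question text if no option has been seen yet).
            if current["options"]:
                label, text = current["options"][-1]
                current["options"][-1] = (label, text + " " + lead)
            else:
                current["text"] += " " + lead
        for label, text in zip(parts[1::2], parts[2::2]):
            current["options"].append((label, text.strip()))
    return header, questions


def build_xml(header, questions):
    """Build an element tree; all element names here are hypothetical."""
    root = ET.Element("paper")
    if header:
        ET.SubElement(root, "title").text = header[0]
    for extra in header[1:]:
        ET.SubElement(root, "section").text = extra
    for q in questions:
        q_el = ET.SubElement(root, "question", number=q["number"])
        ET.SubElement(q_el, "text").text = q["text"]
        for label, text in q["options"]:
            ET.SubElement(q_el, "option", label=label).text = text
    return root


header, questions = parse_paper(SAMPLE.splitlines())
root = build_xml(header, questions)
xml_text = ET.tostring(root, encoding="unicode")
```

To work with real files, pass an open file object to parse_paper instead of SAMPLE.splitlines(), for example open("input1.txt", encoding="utf-8"), and write the result with ET.ElementTree(root).write("input1.xml", encoding="utf-8", xml_declaration=True). The regular expressions are deliberately simple: an option whose own text contains something like " b) " would be split incorrectly, so they would need tightening for messier input.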