Question

I would like to know how to read a .txt file and produce output in .xml format.

My input file is:

        Paper 1 / White Spaces are included
Single Correct Answer Type

1. Text of question 1
  a) Option 1.a    b) Option 1.b
  c) Option 1.c    d) Option 1.d

2. Text of question 2
  a) This is an example of Option 2.a
  b) Option 2.b has a special char α
  c) Option 2.c
  d) Option 2.d

3. Text of question 3
  a) Option 3.a can span multiple
  lines.
  b) Option 3b
  c) Option 3c
  d) Option 3d

My code:

from lxml import etree
import csv

root = etree.Element('data')
#f = open('input1.txt','rb')
rdr = csv.reader(open("input1.txt",newline='\n'))
header = next(rdr)
for row in rdr:
    eg = etree.SubElement(root, 'eg')
    for h, v in zip(header, row):
        etree.SubElement(eg, h).text = v

f = open(r"C:\temp\input1.xml", "w")
f.write(etree.tostring(root))
f.close()

The error I get is:

Traceback (most recent call last):
  File "E:\python3.2\input1.py", line 11, in <module>
    etree.SubElement(eg, h).text = v
  File "lxml.etree.pyx", line 2995, in lxml.etree.SubElement (src\lxml\lxml.etree.c:69677)
  File "apihelpers.pxi", line 188, in lxml.etree._makeSubElement (src\lxml\lxml.etree.c:15691)
  File "apihelpers.pxi", line 1571, in lxml.etree._tagValidOrRaise (src\lxml\lxml.etree.c:29249)
ValueError: Invalid tag name 'ï»¿    Paper 1'

I want whitespace to be taken into account as well. I am using Python 3.2. Any suggestions?

Answer 1

You can read this information from the txt file, organize it into objects of your own classes, and then serialize those objects.
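On the reading step: the `ï»¿` in the error message is a UTF-8 byte order mark (BOM) that was decoded with a single-byte code page, so it ended up inside the first value and then inside the tag name. A minimal sketch of reading the file without it, assuming the input is UTF-8 (the helper name `read_lines` and the temporary demo file are made up for illustration):

```python
import codecs
import os
import tempfile


def read_lines(path):
    """Read a UTF-8 text file, dropping a leading BOM if one is present."""
    # 'utf-8-sig' decodes UTF-8 and skips a BOM at the start of the file.
    with open(path, encoding='utf-8-sig') as f:
        return f.read().splitlines()


# Demonstration with a temporary file that starts with a BOM.
with tempfile.NamedTemporaryFile(delete=False, suffix='.txt') as tmp:
    tmp.write(codecs.BOM_UTF8 + '        Paper 1\n1. Text of question 1\n'.encode('utf-8'))
    demo_path = tmp.name

demo_lines = read_lines(demo_path)
os.remove(demo_path)
print(repr(demo_lines[0]))  # leading spaces are preserved, the BOM is gone
```

Note that `splitlines()` only removes the line endings, so leading whitespace in each line is kept.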

How to de/serialize: http://code.activestate.com/recipes/577266-xml-to-python-data-structure-de-serialization/

Example:

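# 'utf-8-sig' skips the BOM that shows up as 'ï»¿' in the traceback.
with open('file.txt', encoding='utf-8-sig') as f:
    lines = f.readlines()

# Do something to organize these lines into objects (txtInfoObjs).
# That parsing step is not shown here.

# serialize() stands for a serializer such as the one in the recipe linked above.
xmlStrings = [serialize(pythonObj) for pythonObj in txtInfoObjs]

# The file must be opened in write mode ('w') before writing to it.
with open('file.xml', 'w', encoding='utf-8') as g:
    g.write(xmlStrings[0])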
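The "organize the lines into objects" step is left open above. One possible sketch for the input format shown in the question, using only the standard library's `xml.etree.ElementTree` instead of the linked recipe. The element names (`paper`, `heading`, `question`, `text`, `option`), the regular expressions, and the function names are all assumptions made for illustration; the question does not say what the XML should look like.

```python
import re
import xml.etree.ElementTree as ET

# "1. Text of question 1"
QUESTION_RE = re.compile(r'^\s*(\d+)\.\s+(.*)$')
# "a) text", possibly several per line; a label must start the line or follow whitespace.
OPTION_RE = re.compile(r'(?<!\S)([a-d])\)\s+(.*?)(?=\s+[a-d]\)\s|$)')


def parse_questions(lines):
    """Split lines into heading lines and a list of question dicts."""
    headings, questions = [], []
    last_option = None
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue
        m = QUESTION_RE.match(line)
        if m:
            questions.append({'number': m.group(1), 'text': m.group(2).strip(), 'options': []})
            last_option = None
        elif not questions:
            headings.append(line)  # kept verbatim, including leading spaces
        else:
            found = OPTION_RE.findall(stripped)
            if found:
                for label, text in found:
                    last_option = [label, text.strip()]
                    questions[-1]['options'].append(last_option)
            elif last_option is not None:
                last_option[1] += ' ' + stripped  # continuation of an option
            else:
                questions[-1]['text'] += ' ' + stripped  # continuation of the question
    return headings, questions


def to_xml(headings, questions):
    """Build an element tree from the parsed data."""
    root = ET.Element('paper')
    for heading in headings:
        ET.SubElement(root, 'heading').text = heading
    for q in questions:
        q_el = ET.SubElement(root, 'question', number=q['number'])
        ET.SubElement(q_el, 'text').text = q['text']
        for label, text in q['options']:
            ET.SubElement(q_el, 'option', label=label).text = text
    return root


sample = """\
        Paper 1 / White Spaces are included
Single Correct Answer Type

1. Text of question 1
  a) Option 1.a    b) Option 1.b
  c) Option 1.c    d) Option 1.d

3. Text of question 3
  a) Option 3.a can span multiple
  lines.
  b) Option 3b
"""

headings, questions = parse_questions(sample.splitlines())
root = to_xml(headings, questions)
xml_text = ET.tostring(root, encoding='unicode')  # str rather than bytes
print(xml_text)
```

With `encoding='unicode'` the result is a `str`, so it can be written to a file opened with `open('file.xml', 'w', encoding='utf-8')`. Without that argument `tostring` returns `bytes` in Python 3, and the file would have to be opened with `'wb'` instead.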
