Question

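I am parsing an XML file with Python's minidom module. Writing the data to a file raises an error such as UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128), although the output prints perfectly on the command line. Please tell me the solution.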

My XML file is:

    <?xml version="1.0"?>
    <Feature>
        <Word Root="ਨੌਕਰ-ਚਾਕਰ">
            <info Inflection="ਨੌਕਰਾਂ-ਚਾਕਰਾਂ">
                <posinfo gender="Masculine" number="Plural" case="Oblique" />
            </info>
        </Word>
    </Feature>

My Python code is:

import sys

from xml.dom import minidom

file=open("npu.txt","w+")
doc = minidom.parse("NPU.xml")
word = doc.getElementsByTagName("Word")
for each in word:
    # print "root"+each.getAttribute("Root")
    file.write(each.getAttribute("Root")+"\n")
    hh=each.getElementsByTagName("info")

    for each1 in hh:
        # print "inflection"+each1.getAttribute("Inflection")
        file.write(each1.getAttribute("Inflection")+"\t")

        vv=each1.getElementsByTagName("posinfo")
        for each2 in vv:
            # print each2.getAttribute("gender")
            # print each2.getAttribute("number")
            # print each2.getAttribute("case")
            file.write( each2.getAttribute("gender")+",")
            file.write( each2.getAttribute("number")+",")
            file.write(each2.getAttribute("case"))
        file.write("\n")
    file.write("--------\n")

Answer 1

Encode the data while writing:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
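file = open("npu.txt", "w+")
# Encode the unicode string to UTF-8 bytes explicitly before writing
file.write(u"ਨੌਕਰ-ਚਾਕਰ".encode("utf-8"))
file.close()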

Answer 2

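The problem is not the way you parse the XML; it is an encoding problem.

The error comes from the text encoding. Your attribute values are Unicode text (the XML is UTF-8), but a file opened with the plain open() in Python 2 encodes them as ASCII when you write, and ASCII does not include the characters you are using. Printing works because console output uses the terminal's encoding instead.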
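To make the failure concrete, here is a minimal sketch that performs the ASCII encoding step explicitly instead of leaving it to file.write(). The variable names (root, error_span, utf8_bytes) are illustrative and not from the question; the string is the Root attribute value from the XML above.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch: make the implicit ASCII encoding step explicit.
root = u"ਨੌਕਰ-ਚਾਕਰ"  # same text as the Root attribute in the question

try:
    # Roughly what Python 2 attempts when a unicode string is written
    # to a file that was opened with the plain open().
    root.encode("ascii")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    # start/end delimit the first run of characters ASCII cannot represent
    # (end is exclusive).
    error_span = (exc.start, exc.end)

# UTF-8 can represent every character, so this encoding succeeds.
utf8_bytes = root.encode("utf-8")
```

The reported span covers the Gurmukhi characters before the hyphen, which is why the error message in the question mentions position 0-3.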

Try using the codecs module as follows:

import codecs

file = codecs.open("npu.txt", "w+", "utf-8")
file.write("ਨੌਕਰ-ਚਾਕਰ".decode('utf-8'))
file.close()
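Applied to the loop from the question, the same idea might look like the sketch below. It is an assumption-laden illustration rather than the asker's actual fix: it uses io.open (which takes an encoding argument much like codecs.open) instead of codecs.open, the helper name dump_words is hypothetical, the XML is inlined instead of read from NPU.xml, and the output goes to a temporary directory so the example is self-contained.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile
from xml.dom import minidom

# Inline copy of the question's XML (assumption: the real file is UTF-8).
XML = u"""<?xml version="1.0" encoding="UTF-8"?>
<Feature>
    <Word Root="ਨੌਕਰ-ਚਾਕਰ">
        <info Inflection="ਨੌਕਰਾਂ-ਚਾਕਰਾਂ">
            <posinfo gender="Masculine" number="Plural" case="Oblique" />
        </info>
    </Word>
</Feature>
"""


def dump_words(xml_text, out_path):
    """Hypothetical helper: write each Word's attributes as UTF-8 text."""
    doc = minidom.parseString(xml_text.encode("utf-8"))
    # Opening the file with an explicit encoding avoids the ASCII default.
    with io.open(out_path, "w", encoding="utf-8") as out:
        for word in doc.getElementsByTagName("Word"):
            out.write(word.getAttribute("Root") + u"\n")
            for info in word.getElementsByTagName("info"):
                out.write(info.getAttribute("Inflection") + u"\t")
                for pos in info.getElementsByTagName("posinfo"):
                    fields = [pos.getAttribute(name)
                              for name in (u"gender", u"number", u"case")]
                    out.write(u",".join(fields))
                out.write(u"\n")
            out.write(u"--------\n")


out_path = os.path.join(tempfile.mkdtemp(), "npu.txt")
dump_words(XML, out_path)

# Read the file back with the same encoding to check the round trip.
with io.open(out_path, encoding="utf-8") as f:
    written = f.read()
```

With the real file, minidom.parse("NPU.xml") would replace the parseString call and "npu.txt" would replace the temporary path.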

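Edit:

You can also declare the source file's encoding as UTF-8 by adding the special comment # -*- coding: UTF-8 -*- at the beginning of the Python source file. Without it, Python 2 assumes the source is ASCII (7-bit) and rejects non-ASCII literals. The declaration only controls how the source file is read; it does not change the encoding used when writing to files. Note that Python identifiers are still restricted to ASCII characters.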

Python returns an error when writing data to a file (Python 2.7)
