String encoding of win32print results

Time: 2016-01-24 17:42:01

Tags: python string python-2.7 unicode utf-8

I have a non-literal string that is obtained programmatically from the title of a document in the print queue.

When I try to commit it to MongoDB, I get:

bson.errors.InvalidStringData: strings in documents must be valid UTF-8: 'wxPython: Windows Styles and Events Hunter \xab The Mouse Vs. The Python'

The code that retrieves the string:

import win32print

# `printers` is built earlier in the script (not shown here)
for printStats in printers:

    handle = win32print.OpenPrinter(printStats[2])
    queued = win32print.EnumJobs(handle, 0, -1, 1)

    for printJob in queued:

        username = printJob['pUserName']
        computer = printJob['pMachineName']
        document = printJob['pDocument']
        identity = printJob['JobId']
        jobstate = printJob['Status']

print document
> "wxPython: Windows Styles and Events Hunter « The Mouse Vs. The Python"

2 answers:

Answer 0 (score: 0)

The default error mode for encoding is 'strict', which raises an error on any character the codec cannot represent.

>>> s = u"wxPython: Windows Styles and Events Hunter « The Mouse Vs. The Python"
>>> s.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xab' in position 43: ordinal not in range(128)

If your database only accepts ASCII, you have no choice but to use a lossy encoding. In 'ignore' mode, every character that cannot be encoded is skipped:

>>> s.encode('ascii', 'ignore')
'wxPython: Windows Styles and Events Hunter  The Mouse Vs. The Python'

In 'replace' mode, they are replaced with a ? character:

>>> s.encode('ascii', 'replace')
'wxPython: Windows Styles and Events Hunter ? The Mouse Vs. The Python'

Finally, there is 'xmlcharrefreplace', which substitutes an XML character reference:

>>> s.encode('ascii', 'xmlcharrefreplace')
'wxPython: Windows Styles and Events Hunter &#171; The Mouse Vs. The Python'
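
As a side-by-side comparison, here is a minimal sketch that runs the three lossy handlers over one string. The names `title`, `MODES` and `lossy_ascii` are made up for illustration, and the title is shortened. It is written so that it should run on Python 3 as well as 2.7.

```python
# Shortened sample title; u"\xab" is the « character from the question.
title = u"Events Hunter \xab The Mouse"

# The three lossy error handlers discussed above.
MODES = ('ignore', 'replace', 'xmlcharrefreplace')

def lossy_ascii(text):
    """Encode `text` to ASCII once per lossy error handler."""
    return {mode: text.encode('ascii', mode) for mode in MODES}

variants = lossy_ascii(title)
for mode in MODES:
    encoded = variants[mode]
    # ASCII is a subset of UTF-8, so this decode never raises.
    encoded.decode('utf-8')
    print("%-18s %r" % (mode, encoded))
```

Because every ASCII byte string is also valid UTF-8, any of the three results would get past the "must be valid UTF-8" check from the question, at the cost of losing or rewriting the « character.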

Answer 1 (score: 0)

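Based on your comment on the other answer, I can see that the error you are getting is:

bson.errors.InvalidStringData: strings in documents must be valid UTF-8: 'wxPython: Windows Styles and Events Hunter \xab The Mouse Vs. The Python'

Since \xab is the byte for «, the string is probably encoded as ISO-8859-1 (latin-1), ISO-8859-15 or windows-1252. This is most likely tied to your machine's locale.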
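
To illustrate why the \xab byte points at one of these single-byte encodings, here is a small sketch. The byte string is a shortened, hard-coded stand-in for the job title, and `decodes_as_utf8` and `candidates` are names invented for this example. It should run on Python 2.7 and Python 3.

```python
# Shortened stand-in for the raw byte string shown in the error message.
raw = b"Hunter \xab The Mouse"

def decodes_as_utf8(data):
    """Return True if `data` is valid UTF-8, False otherwise."""
    try:
        data.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False

# Python codec names for the encodings mentioned above.
candidates = ('latin-1', 'iso-8859-15', 'cp1252')
decoded = {name: raw.decode(name) for name in candidates}

print(decodes_as_utf8(raw))
for name in candidates:
    print("%s -> %r" % (name, decoded[name]))
```

A lone 0xAB byte is not valid UTF-8, which is what the BSON encoder complains about, while all three candidate codecs map it to the same « character. For this particular string the choice between them makes no difference.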

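You just need to decode it before passing it to MongoDB, which supports Unicode strings (it is not limited to ASCII, as was asserted):

>>> document = printJob['pDocument'].decode("latin-1")
>>> print type(document)
<type 'unicode'>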

You can now pass document to the Python MongoDB driver.
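
Here is a minimal sketch of the decode step that needs neither win32print nor a MongoDB server. `raw_title` is a hard-coded stand-in for `printJob['pDocument']`, and latin-1 is assumed to be the right source encoding.

```python
# Stand-in for printJob['pDocument']: a byte string containing the 0xAB byte.
raw_title = b"wxPython: Windows Styles and Events Hunter \xab The Mouse Vs. The Python"

# Decode the bytes into a Unicode string (assumes latin-1 is the source encoding).
document = raw_title.decode('latin-1')

# A Unicode string can always be encoded as UTF-8, which is what BSON stores.
utf8_bytes = document.encode('utf-8')

print(repr(document[43]))        # the « character
print(repr(utf8_bytes[43:45]))   # its two-byte UTF-8 form
```

After decoding, « is a single code point that UTF-8 represents as the two bytes \xc2\xab, so the string no longer trips the validity check. As far as I know the driver performs the UTF-8 encoding itself when handed a Unicode string, so the explicit `encode` here is only for inspection.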

To make your code portable, you can use the codec alias mbcs (in place of 'latin-1'). mbcs automatically maps to the configured Windows locale (thanks @roeland).
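
One way to sketch this is shown below. The mbcs codec only exists on Windows, so the hypothetical helpers `windows_codec` and `decode_title` fall back to another codec elsewhere, for example when testing the decoding logic on a non-Windows machine. The 'latin-1' fallback is an assumption, not something the driver or win32print requires.

```python
import codecs

def windows_codec(fallback='latin-1'):
    """Return 'mbcs' where that codec exists (Windows), else `fallback`."""
    try:
        codecs.lookup('mbcs')
        return 'mbcs'
    except LookupError:
        # Not on Windows: mbcs is unavailable, so use the assumed fallback.
        return fallback

def decode_title(raw, fallback='latin-1'):
    """Decode a raw job title using the locale-aware codec when available."""
    return raw.decode(windows_codec(fallback))

print(windows_codec())
print(repr(decode_title(b"plain ascii title")))
```

In the loop from the question this would be used as `document = decode_title(printJob['pDocument'])`.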