Question

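I want to save a string to a new txt file.

The string's encoding is 'utf-8' (I think) and it contains some Chinese characters.

But the file ends up as GB2312.

Here is my code, with some parts omitted: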

# -*- coding:utf-8 -*-
# Python 3.4, Windows 7

def getUrl(self, url, coding='utf-8'):
    self.__reCompile = {}
    req = request.Request(url)
    req.add_header('User-Agent','Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 UBrowser/5.5.9703.2 Safari/537.36')
    with request.urlopen(req) as response:
        return response.read().decode(coding)

def saveText(self,filename,content,mode='w'):
    self._checkPath(filename)
    with open(filename,mode) as f:
        f.write(content)

joke= self.getUrl(pageUrl)
#some re transform such as re.sub('<br>','\r\n',joke)
self.saveText(filepath+'.txt',joke,'a')

Sometimes a UnicodeEncodeError is raised:

Answer 1

Your exception is raised in 'saveText', but I can't see how you implemented it, so I'll try to reproduce the error and suggest a fix.

In 'getUrl' you return a decoded string (.decode('utf-8')). My guess is that in 'saveText' you forgot to encode it before writing it to the file.

Reproducing the error

To reproduce the error, I did this (this is Python 2, where a str has a .decode() method):

# String with unicode chars, decoded like in your example
s = 'æøå'.decode('utf-8')

# How saveText could be:
# Encode before write
f = open('test', mode='w')
f.write(s)
f.close()

This gives a similar exception:

---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-36-1309da3ad975> in <module>()
      5 # Encode before write
      6 f = open('test', mode='w')
----> 7 f.write(s)
      8 f.close()

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

Two ways to fix it

You can either do this:

# String with unicode chars, decoded like in your example
s = 'æøå'.decode('utf-8')

# How saveText could be:
# Encode before write
f = open('test', mode='w')
f.write(s.encode('utf-8'))
f.close()

Or you can try using the 'codecs' module to write the file:

import codecs

# String with unicode chars, decoded like in your example
s = 'æøå'.decode('utf-8')

# How saveText could be:
f = codecs.open('test', encoding='utf-8', mode='w')
f.write(s)
f.close()

Hope this helps.

Answer 2

"The string's encoding is 'utf-8' (I think) and it contains some Chinese characters"

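You have decoded the remote server's response using UTF-8. Once it has been decoded into a Python string, it is no longer encoded: it is effectively stored in memory as Unicode code points.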
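The difference between encoded bytes and a decoded string can be seen in a small sketch (Python 3; the sample text is only an illustration, not taken from the question):

```python
# Bytes as they might arrive from the network, encoded as UTF-8.
raw = '中文'.encode('utf-8')

# Decoding turns the bytes into a str, a sequence of Unicode code points.
text = raw.decode('utf-8')

print(len(raw))   # number of bytes (each of these characters takes 3 in UTF-8)
print(len(text))  # number of code points

# A str has no encoding of its own. An encoding is chosen again only when
# the str is turned back into bytes, for example when writing it to a file.
print(text.encode('utf-8') == raw)
```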

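The error you are getting occurs because Python is trying to convert the string to bytes using your code page. Because of your Windows locale, it picked GBK, which does not support all Unicode characters.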
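As a rough way to check this on your own machine, the sketch below prints the encoding that open() falls back to when none is given, and tests whether a string fits into a given encoding. The helper name can_encode and the sample strings are made up for illustration.

```python
import locale

# The encoding open() uses in text mode when no encoding argument is passed.
# On a Chinese-locale Windows machine this is typically a GBK-family code page.
print(locale.getpreferredencoding(False))

def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

sample = '中文 \U0001F600'  # Chinese text followed by an emoji

print(can_encode('中文', 'gbk'))    # common Chinese characters fit in GBK
print(can_encode(sample, 'gbk'))    # the emoji is outside GBK
print(can_encode(sample, 'utf-8'))  # UTF-8 can represent any code point
```

This would also explain why the error only shows up sometimes: it depends on whether a given page contains a character outside the code page.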

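To save it, you just need to open the output file with an explicit encoding, using the encoding parameter of open() (Python 3; in Python 2, use io.open()). In your case, "UTF-8" is probably the appropriate encoding to use.

Your saveText() method needs to be updated to: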

def saveText(self,filename,content,mode='w',encoding="utf-8"):
    self._checkPath(filename)
    with open(filename,mode,encoding=encoding) as f:
        f.write(content)
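A standalone sketch of the same idea, which writes to a temporary file and reads it back to confirm the round trip. The names save_text and load_text and the file name are hypothetical, and the _checkPath step from the question is left out. Note that encoding is passed by keyword, because the third positional argument of open() is buffering.

```python
import os
import tempfile

def save_text(filename, content, mode='w', encoding='utf-8'):
    # encoding must be a keyword argument; open()'s third positional
    # parameter is buffering, not encoding.
    with open(filename, mode, encoding=encoding) as f:
        f.write(content)

def load_text(filename, encoding='utf-8'):
    # Read with the same encoding that was used for writing.
    with open(filename, encoding=encoding) as f:
        return f.read()

path = os.path.join(tempfile.mkdtemp(), 'joke.txt')
save_text(path, '第一行\n', 'w')           # create the file
save_text(path, 'second line æøå\n', 'a')  # append, as in the question
result = load_text(path)
print(result == '第一行\nsecond line æøå\n')
```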

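You may also run into problems with your HTTP data. When decoding the response, you assume the remote content is UTF-8. That is not always the case. You can inspect the HTTP response headers to get the correct encoding, or use the Requests library, which does this for you. Your URL getter would then look like: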

import requests

def getUrl(url):
    headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 UBrowser/5.5.9703.2 Safari/537.36'}
    response = requests.get(url, headers=headers)
    response.raise_for_status() # Throw an exception on errors
    return response.text
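If you would rather stay with urllib, the header-based approach could look roughly like the sketch below. The helper name decode_body and the UTF-8 fallback are assumptions. A hand-built email.message.Message stands in for response.headers so the example runs without a network call.

```python
from email.message import Message

def decode_body(raw, headers, fallback='utf-8'):
    """Decode raw bytes with the charset named in Content-Type, else fallback."""
    # get_content_charset() returns None when the header names no charset.
    charset = headers.get_content_charset() or fallback
    return raw.decode(charset, errors='replace')

# Stand-in for response.headers; urllib responses expose a compatible object.
headers = Message()
headers['Content-Type'] = 'text/html; charset=gbk'
body = '笑话'.encode('gbk')
print(decode_body(body, headers))

# With urllib this would be used roughly as:
#     with request.urlopen(req) as response:
#         text = decode_body(response.read(), response.headers)
```

Some servers send no charset, or the wrong one, so the fallback is only a guess. With errors='replace', undecodable bytes become U+FFFD rather than raising an exception.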

Answer 3

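I think the encoding your terminal is using doesn't support that character. Python handles it fine; I think it is your output encoding that can't handle it.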
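If the terminal really is the limiting factor, one way to check is to look at sys.stdout.encoding and replace whatever it cannot represent before printing. This is only a sketch; the helper name printable is made up.

```python
import sys

def printable(text, encoding):
    """Replace characters the target encoding cannot represent with '?'."""
    return text.encode(encoding, errors='replace').decode(encoding)

# sys.stdout.encoding is what print() uses to turn a str into bytes.
enc = sys.stdout.encoding or 'ascii'
print(enc)
print(printable('joke \U0001F600', enc))
```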


Why can't I save my file in utf-8 format
