How do I create a file with utf-8 in Python?

Asked: 2015-06-18 10:02:16

Tags: python utf-8 character-encoding

I create a new file with open('test.txt', 'w'), and its charset is reported as binary:

>>> open('test.txt', 'w')
<open file 'test.txt', mode 'w' at 0x7f6b973704b0>

$ file -i test.txt
test.txt: inode/x-empty; charset=binary

Next I create the file with a specified charset (e.g. utf-8) using the codecs module. However, the charset is still binary:

>>> codecs.open("test.txt", 'w', encoding='utf-8')
<open file 'test.txt', mode 'wb' at 0x7f6b97370540>

$ file -i test.txt 
test.txt: inode/x-empty; charset=binary

I then write some content to test.txt, and the charset becomes us-ascii:

>>> fp.write("wwwwwwwwwww")
>>> fp.close()

$ file -i test.txt 
test.txt: text/plain; charset=us-ascii

OK, now I write some special characters (e.g. Arènes). However, this fails:

>>> fp = codecs.open("test.txt", 'w', encoding='utf-8')
>>> fp.write("Arènes")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/codecs.py", line 688, in write
    return self.writer.write(data)
  File "/usr/lib/python2.7/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)

More specifically, I want to save the results of a query (using python-mysqldb) to a file. The key part of the source code is:

cur.execute("SELECT * FROM agency")

# Write to a file
with open('test.txt', 'w') as fp :
    for row in cur.fetchall() :
        s = '\t'.join(str(item) for item in row)
        fp.write(s + '\n')

Now the charset of test.txt is iso-8859-1 (the data contains some French characters, e.g. Arènes).

So I used codecs.open('test.txt', 'w', encoding='utf-8') to create the file instead. However, I get the following error:

Traceback (most recent call last):
  File "./overlap_intervals.py", line 26, in <module>
    fp.write(s + '\n')
  File "/usr/lib/python2.7/codecs.py", line 688, in write
    return self.writer.write(data)
  File "/usr/lib/python2.7/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 21: ordinal not in range(128)

How do I create a file with utf-8 in Python?

2 answers:

Answer 0: (score: 2)

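An empty file is always reported as binary: with no bytes to inspect, file has nothing to detect a text encoding from.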

$ touch /tmp/foo
$ file -i /tmp/foo 
/tmp/foo: inode/x-empty; charset=binary

Put something in it and all is well.

$ cat > /tmp/foo 
Rübe
Möhre
Mähne
$ file -i /tmp/foo
/tmp/foo: text/plain; charset=utf-8

Python does exactly the same as cat.

with open("/tmp/foo", "w") as f:
    f.write("Rübe\n")

Check:

$ cat /tmp/foo
Rübe
$ file -i /tmp/foo
/tmp/foo: text/plain; charset=utf-8
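
To illustrate why file reported us-ascii for the "wwwwwwwwwww" content in the question, here is a small sketch (the variable names are made up for illustration). It only shows that the charset is a guess made from the bytes on disk: pure-ASCII text produces the same bytes under ASCII and UTF-8, while an accented character becomes a multi-byte sequence.

```python
# Text samples taken from the question; \u00e8 is the character "è".
ascii_text = u"wwwwwwwwwww"
accented_text = u"Ar\u00e8nes"

ascii_bytes = ascii_text.encode("utf-8")
accented_bytes = accented_text.encode("utf-8")

# Pure-ASCII text yields identical bytes in ASCII and UTF-8,
# so a tool inspecting the file cannot tell the two apart.
same_as_ascii = ascii_bytes == ascii_text.encode("ascii")

# "è" is encoded as two bytes in UTF-8: 0xc3 0xa8.
e_grave_bytes = u"\u00e8".encode("utf-8")
```

So a file containing only ASCII characters is already a valid UTF-8 file, whatever label file prints for it.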

Edit

With Python 2.7, you have to encode the Unicode string yourself.

with open("/tmp/foo", "w") as f:
    f.write(u"Rübe\n".encode("UTF-8"))
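
As a possible alternative to encoding by hand, io.open accepts an encoding argument on both Python 2.7 and Python 3 and expects Unicode strings on write. The UnicodeDecodeError in the question comes from passing a byte string to an encoding writer, which Python 2 first tries to decode implicitly as ASCII. A minimal sketch, assuming a throwaway file in the temp directory (the file name is hypothetical):

```python
import io
import os
import tempfile

# Hypothetical output path, used only for this example.
path = os.path.join(tempfile.gettempdir(), "foo_utf8_example.txt")

# io.open encodes Unicode text to UTF-8 on write.
# \u00fc is the character "ü".
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"R\u00fcbe\n")

# Read the raw bytes back to see what was actually stored.
with io.open(path, "rb") as f:
    raw = f.read()
```

Here raw should start with the bytes for "R", then 0xc3 0xbc for "ü", which is what file -i would detect as utf-8.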

Answer 1: (score: 0)

In Python 3, you should also specify the encoding when opening the file for writing:

with open("filepath", "w", encoding="utf-8") as f:
    f.write("Arènes")
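
Applied to the loop from the question, this might look like the sketch below. The rows list is a hypothetical stand-in for cur.fetchall(), and the output path is made up; it assumes Python 3 and that the driver returns text (str) rather than bytes.

```python
import os
import tempfile

# Hypothetical rows standing in for cur.fetchall(); \u00e8 is "è".
rows = [(1, "Ar\u00e8nes"), (2, "Gare")]

# Hypothetical output path, used only for this example.
path = os.path.join(tempfile.gettempdir(), "agency_utf8_example.txt")

# The encoding is given to open(); write() then accepts plain str.
with open(path, "w", encoding="utf-8") as fp:
    for row in rows:
        fp.write("\t".join(str(item) for item in row) + "\n")

# Read the file back as UTF-8 to confirm the round trip.
with open(path, encoding="utf-8") as fp:
    content = fp.read()
```

If the driver hands back byte strings instead, they would need to be decoded with their actual encoding first, since str(item) would not produce the intended text.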