Question

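Using the NumPy loadtxt and savetxt functions fails whenever non-ASCII characters are involved. These functions are primarily meant for numeric data, but alphanumeric headers/footers are also supported.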

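Both loadtxt and savetxt seem to apply the latin-1 encoding, which I find very much at odds with the rest of Python 3, which is thoroughly Unicode-aware and always seems to use utf-8 as the default encoding.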

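Given that NumPy has not moved to utf-8 as the default encoding, can I at least change the encoding away from latin-1, via some implemented function/attribute or a known hack, either just for loadtxt/savetxt or for NumPy in its entirety?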

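That this is not possible with Python 2 is forgivable, but it really should not be a problem with Python 3. I have found the problem with any combination of Python 3.x and the last several versions of NumPy.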

Example code

Consider the file data.txt with the content

# This is π
3.14159265359

Trying to load it with

import numpy as np
pi = np.loadtxt('data.txt')
print(pi)

fails with a UnicodeEncodeError exception, stating that the latin-1 codec cannot encode the character '\u03c0' (the π character).

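This is frustrating because π appears only in a comment/header line, so there is no reason for loadtxt to even attempt to encode this character.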

I can read the file successfully by explicitly skipping the first row with pi = np.loadtxt('data.txt', skiprows=1), but having to know the exact number of header lines is inconvenient.

The same exception is thrown if I try to write a Unicode character using savetxt:

np.savetxt('data.txt', [3.14159265359], header='# This is π')

To accomplish this, I first have to write the header by other means and then save the data to a file object opened in 'a+b' mode, e.g.

with open('data.txt', 'w') as f:
    f.write('# This is π\n')
with open('data.txt', 'a+b') as f:
    np.savetxt(f, [3.14159265359])

Needless to say, this is both ugly and inconvenient.

Solution

I settled on the solution by hpaulj, which I thought would be worth spelling out in full. Near the top of my program I now do

import numpy as np
asbytes = lambda s: s if isinstance(s, bytes) else str(s).encode('utf-8')
asstr = lambda s: s.decode('utf-8') if isinstance(s, bytes) else str(s)
np.compat.py3k.asbytes = asbytes
np.compat.py3k.asstr = asstr
np.compat.py3k.asunicode = asstr
np.lib.npyio.asbytes = asbytes
np.lib.npyio.asstr = asstr
np.lib.npyio.asunicode = asstr

after which np.loadtxt and np.savetxt handle Unicode correctly.

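Note that for newer versions of NumPy (I can confirm 1.14.3, and probably somewhat older versions as well) this trick is no longer needed, because Unicode now seems to be handled correctly by default.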
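For readers on NumPy 1.14 or later, here is a minimal sketch of the same round trip without any monkeypatching. It relies on the encoding keyword that loadtxt and savetxt accept from that release onward. The temporary directory and the variable names are only illustrative, and the header is passed without a leading '# ' because savetxt adds its default comment prefix itself.

```python
import os
import tempfile

import numpy as np

# Work in a temporary directory so that no existing data.txt is overwritten.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "data.txt")

# savetxt prepends the default comment prefix "# " to the header text.
np.savetxt(path, [3.14159265359], header="This is π", encoding="utf-8")

# Read the header line back as UTF-8 text to check what was written.
with open(path, encoding="utf-8") as f:
    first_line = f.readline().rstrip("\n")

# The comment line is decoded as UTF-8 and skipped; only the number is parsed.
pi = np.loadtxt(path, encoding="utf-8")

print(first_line)
print(float(pi))
```

Passing the encoding explicitly, rather than relying on the default, keeps the behaviour the same across NumPy versions and platforms.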

Answer 1

A couple of hacks:

Open the file in binary mode and pass the open file object to loadtxt:

In [12]: cat data.txt
# This is π
3.14159265359

In [13]: with open('data.txt', 'rb') as f:
    ...:     result = np.loadtxt(f)
    ...:     

In [14]: result
Out[14]: array(3.14159265359)

Open the file with latin1 encoding and pass the open file object to loadtxt:

In [15]: with open('data.txt', encoding='latin1') as f:
    ...:     result = np.loadtxt(f)
    ...:     

In [16]: result
Out[16]: array(3.14159265359)
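A plausible reason the latin1 hack works is that latin1 maps every byte value to exactly one character. Decoding the UTF-8 file as latin1 therefore never fails, and re-encoding with latin1 reproduces the original bytes. The sketch below illustrates this with plain Python strings only. It assumes that NumPy's internal conversion is the latin1 encode/decode pair quoted in Answer 2, and it does not call NumPy itself.

```python
# The bytes of data.txt as they sit on disk (UTF-8 encoded).
raw = '# This is π\n3.14159265359\n'.encode('utf-8')

# Opening the file with encoding='latin1' amounts to this decode step.
# It cannot fail, but π turns into two unrelated characters.
as_latin1_text = raw.decode('latin1')

# Encoding back with latin1 restores the original bytes exactly.
round_tripped = as_latin1_text.encode('latin1')
print(round_tripped == raw)

# The mangled characters live only in the comment line, which is discarded,
# so the numeric line still parses as expected.
data_line = as_latin1_text.split('\n')[1]
print(float(data_line))
```

The comment text is garbled under this hack, which is harmless when loading but means it is not a fix for writing headers with savetxt.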

Answer 2

At least for savetxt, the encoding is handled in

Signature: np.lib.npyio.asbytes(s)
Source:   
    def asbytes(s):
        if isinstance(s, bytes):
            return s
        return str(s).encode('latin1')
File:      /usr/local/lib/python3.5/dist-packages/numpy/compat/py3k.py
Type:      function

Signature: np.lib.npyio.asstr(s)
Source:   
    def asstr(s):
        if isinstance(s, bytes):
            return s.decode('latin1')
        return str(s)
File:      /usr/local/lib/python3.5/dist-packages/numpy/compat/py3k.py
Type:      function

The header is written to the wb file with

        header = header.replace('\n', '\n' + comments)
        fh.write(asbytes(comments + header + newline))
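To see why the header write fails, the asbytes helper quoted above can be reproduced on its own. The sketch below copies it under the name asbytes_latin1 and compares it with asbytes_utf8, a hypothetical variant that encodes with UTF-8. Both names are illustrative and are not part of NumPy.

```python
def asbytes_latin1(s):
    """Copy of the asbytes helper quoted above, with latin1 hard-coded."""
    if isinstance(s, bytes):
        return s
    return str(s).encode('latin1')


def asbytes_utf8(s):
    """Hypothetical variant that encodes with UTF-8 instead."""
    if isinstance(s, bytes):
        return s
    return str(s).encode('utf-8')


header_line = '# This is π\n'

# latin1 has no code point for π, so the encode step raises.
try:
    asbytes_latin1(header_line)
    latin1_ok = True
except UnicodeEncodeError as exc:
    latin1_ok = False
    print('latin1 failed:', exc.reason)

# UTF-8 can represent π, so the same header encodes without error.
utf8_bytes = asbytes_utf8(header_line)
print(utf8_bytes)
```

Replacing the module-level helper with a UTF-8 variant is the idea the question's Solution section spells out.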

Write numpy unicode array to a text file has some of my earlier explorations. There I focused on characters in the data, not in the header.

Specify encoding using NumPy loadtxt/savetxt
