Question

I want a version of buffer that points into a bytearray and is mutable. I want to pass it to I/O functions such as io.BufferedIOBase.readinto() without allocating memory inside the loop.

import sys, struct

ba = bytearray(2000)
lenbuf = bytearray(8)

with open(sys.argv[1]) as fp:
  while True:
    fp.readinto(lenbuf)  # efficient version of fp.read(8)
    dat_len, = struct.unpack("Q", lenbuf)  # unpack returns a 1-tuple
    buf = buffer(ba, 0, dat_len)
    fp.readinto(buf)  # efficient version of fp.read(dat_len), but
                      # yields TypeError: must be read-write buffer, not buffer
    my_parse(buf)

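I also tried buf = memoryview(buffer(ba, 0, length)) but got (essentially) the same error.

I don't think that using Python should be synonymous with paying little attention to runtime performance.

I have Python 2.6, which is installed by default on CentOS 6, but I can switch to 2.7 or 3.x if it is really necessary.

Thanks!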

UPDATE <- No, this is not the way to go

~~I was confused about the behavior of slicing into a bytearray. The transcript below suggests that I can simply take a slice out of a bytearray:~~

>>> x = bytearray(10**8)
>>> cProfile.run('x[10:13]="abc"')
         2 function calls in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

>>> x.count(b'\x00')
3999999997
>>> len(x)
4000000000
>>> cProfile.run('x[10:13]="abcd"')  # intentionally try an inefficient case
         2 function calls in 0.750 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.750    0.750    0.750    0.750 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

>>> len(x)
4000000001

However, the "mutable slice" does not work as expected when assigning a single byte:

>>> x = bytearray(4*10**9)
>>> x = bytearray(10)
>>> x[2] = 0xff
>>> x.count(b'\x00')
9
>>> x[3:5][0] = 0xff
>>> x.count(b'\x00')
9  # WHAT

~~I won't be using single-byte assignment in my application, but I'm worried that I have some fundamental misunderstanding.~~

Answer 1

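You can let it read more data than the current record needs, and then simply use up all the excess data already in the bytearray before reading anything more from the file.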
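A rough sketch of that first idea is below. It is only an illustration, assuming Python 3 and the record layout from the question (an 8-byte "Q" length followed by the payload). The names iter_records and bufsize are made up for this example, and a record that does not fit in the buffer is simply rejected.

```python
import io
import struct

HEADER = struct.Struct("Q")  # 8-byte length prefix, as in the question


def iter_records(fp, bufsize=2000):
    """Yield each payload as a memoryview into one reusable bytearray.

    Hypothetical helper: over-reads into a fixed buffer and consumes
    every complete record in it before touching the file again.
    """
    buf = bytearray(bufsize)
    view = memoryview(buf)
    start = end = 0  # buf[start:end] holds bytes read but not yet consumed
    while True:
        # Use up every complete record that is already in the buffer.
        while end - start >= HEADER.size:
            (dat_len,) = HEADER.unpack_from(buf, start)
            if HEADER.size + dat_len > bufsize:
                raise ValueError("record larger than buffer")
            if end - start < HEADER.size + dat_len:
                break  # payload not fully read yet
            body = start + HEADER.size
            yield view[body:body + dat_len]
            start = body + dat_len
        # Move the unconsumed tail to the front, then top the buffer up.
        leftover = end - start
        if start:
            buf[:leftover] = buf[start:end]
        n = fp.readinto(view[leftover:])
        if not n:
            if leftover:
                raise EOFError("truncated record")
            return
        start, end = 0, leftover + n
```

Each yielded memoryview is only valid until the next iteration, because the leftover bytes are moved to the front of the buffer before the next read. Copy it with bytes(...) if a payload has to outlive the loop body. Moving the tail does copy a few bytes, but no new buffer the size of a record is allocated per iteration.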

Otherwise you can use numpy:

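import sys, struct
import numpy as np

buf = np.zeros(2000, dtype=np.uint8)  # reusable read-write buffer
lenbuf = bytearray(8)

with open(sys.argv[1], "rb") as fp:  # binary mode
    while fp.readinto(lenbuf) == len(lenbuf):  # stop at end of file
        dat_len, = struct.unpack("Q", lenbuf)  # unpack returns a 1-tuple
        chunk = buf[:dat_len]  # a view into buf, not a copy
        fp.readinto(chunk)
        my_parse(chunk)  # my_parse is the asker's own function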

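numpy creates the read-write buffer you need, and indexing with [:dat_len] returns a "view" of a subset of the data rather than a copy. Since numpy arrays conform to the buffer protocol, you can go on to use them with struct.unpack() as if they were bytearrays/buffers.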
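If numpy is not an option, something similar can probably be done with the standard library alone. This is a different technique from the numpy one above, not part of the original suggestion: on Python 3, slicing a memoryview of a bytearray gives a writable view, whereas slicing the bytearray itself gives a copy (which is why x[3:5][0] = 0xff in the question changed nothing). The sketch below assumes Python 3 and the same 8-byte "Q" length prefix; read_records and bufsize are made-up names.

```python
import io
import struct


def read_records(fp, bufsize=2000):
    """Yield each payload as a writable memoryview slice of one bytearray."""
    buf = bytearray(bufsize)  # allocated once, reused for every record
    view = memoryview(buf)    # slicing this gives views, not copies
    lenbuf = bytearray(8)
    while True:
        n = fp.readinto(lenbuf)
        if not n:
            return  # clean end of file
        if n < len(lenbuf):
            raise EOFError("truncated length prefix")
        (dat_len,) = struct.unpack("Q", lenbuf)
        if dat_len > bufsize:
            raise ValueError("record larger than buffer")
        chunk = view[:dat_len]
        if fp.readinto(chunk) != dat_len:
            raise EOFError("truncated payload")
        yield chunk  # only valid until the next record is read


# The single-byte case from the question, side by side:
x = bytearray(10)
x[3:5][0] = 0xff              # writes into a temporary copy; x is unchanged
memoryview(x)[3:5][0] = 0xff  # writes through to x[3]
```

The short-read checks assume a buffered file object (for example one returned by open(path, "rb")), whose readinto() normally fills the buffer unless the file ends first. An unbuffered raw stream may return fewer bytes and would need a loop around readinto().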

Python: How do I get a mutable slice pointing into a bytearray?
