Question

I have a text file containing a sequence of bits, stored as ASCII characters:

cat myFile.txt
0101111011100011001...

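I want to write this to another file in binary mode so that I can read it in a hex editor. How can I achieve that? I have already tried to convert it with the following code: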

f2=open(fileOut, 'wb')
    with open(fileIn) as f:
      while True:
            c = f.read(1)
            byte = byte+str(c)
            if not c:
                print "End of file"
                break
            if count % 8 is 0:
                count = 0 
                print hex(int(byte,2))
                try:
                    f2.write('\\x'+hex(int(byte,2))[2:]).zfill(2)
                except:
                     pass
                byte = ''
            count += 1

But this does not do what I intended. Do you have any hints?

Answer 1

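Reading and writing one byte at a time is painfully slow. Simply reading more data from the file on each call to f.read and f.write makes your code roughly 40 times faster: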
```
|------------------+--------------------|
| using_loop_20480 | 8.34 msec per loop | 
| using_loop_8     | 354 msec per loop  |
|------------------+--------------------|
```
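using_loop is the code shown at the bottom of this post. using_loop_20480 is that code with chunksize = 1024 * 20, which means 20480 bytes are read from the file at a time. using_loop_8 is the same code with chunksize = 8.
About count % 8 is 0: do not use the is operator to compare numeric values; use == instead. Here are some examples where is can give you the wrong result (perhaps not in the code you posted, but in general is is not appropriate here):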
```
In [5]: 1L is 1
Out[5]: False

In [6]: 1L == 1
Out[6]: True

In [7]: 0.0 is 0
Out[7]: False

In [8]: 0.0 == 0
Out[8]: True
```
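The session above is Python 2 (1L is a long literal). As a minimal sketch of the same point under Python 3, the helper names same_value and same_object below are made up for illustration: == compares values, while is only checks whether both names refer to the same object.

```python
def same_value(a, b):
    """True when a and b compare equal (what a numeric test needs)."""
    return a == b


def same_object(a, b):
    """True only when a and b are the very same object in memory."""
    return a is b


zero_float = 0.0
zero_int = 0

print(same_value(zero_float, zero_int))   # True: 0.0 equals 0
print(same_object(zero_float, zero_int))  # False: a float and an int are different objects
```

So for the modulo test in the question, count % 8 == 0 is the comparison that says what you mean.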
Instead of
```
struct.pack('{n}B'.format(n = len(bytes)), *bytes)
```
you can use
```
bytearray(bytes)
```
Not only is it less typing, it is also slightly faster.
```
|------------------------------+--------------------|
|             using_loop_20480 | 8.34 msec per loop |
| using_loop_with_struct_20480 | 8.59 msec per loop |
|------------------------------+--------------------|
```
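To check that the two spellings really produce the same bytes, here is a small sketch written for Python 3. The list values is illustrative data, not something from the question.

```python
import struct

values = [97, 98, 99]  # illustrative byte values, each in range(256)

# One 'B' (unsigned 8-bit) field per value, as in the struct version.
packed = struct.pack('{n}B'.format(n=len(values)), *values)

# The bytearray version takes the list directly.
as_bytearray = bytearray(values)

print(packed)                  # b'abc'
print(packed == as_bytearray)  # True
```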
A bytearray is a good fit for this job because it bridges the gap between treating the data as a string and treating it as a sequence of numbers.
```
In [16]: bytearray([97,98,99])
Out[16]: bytearray(b'abc')

In [17]: print(bytearray([97,98,99]))
abc
```
As shown above, bytearray(bytes) lets you define a bytearray by passing in a sequence of ints (each in range(256)), and lets you write it out as though it were a string: g.write(bytearray(bytes)).
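A short sketch of both properties, assuming Python 3. Here io.BytesIO stands in for a file opened with 'wb', and the name buffer is just for illustration.

```python
import io

buffer = io.BytesIO()  # in-memory stand-in for open(output, 'wb')

# A bytearray built from ints can be written like any binary string.
buffer.write(bytearray([0, 127, 255]))
print(buffer.getvalue())  # b'\x00\x7f\xff'

# Values outside range(256) are rejected when the bytearray is built.
try:
    bytearray([256])
except ValueError as exc:
    print("rejected:", exc)
```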

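def using_loop(output, chunksize):
    # filename (the input path) is expected to be defined at module level
    with open(filename, 'r') as f, open(output, 'wb') as g:
        while True:
            chunk = f.read(chunksize)
            if chunk == '':
                # an empty read means end of file
                break
            # turn each group of 8 ASCII bits into one integer in range(256)
            bytes = [int(chunk[i:i+8], 2)
                     for i in range(0, len(chunk), 8)]
            g.write(bytearray(bytes))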

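Make sure chunksize is a multiple of 8; otherwise a group of 8 bits gets split across two reads and is converted into the wrong bytes.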
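For anyone on Python 3, here is a hedged sketch of the same chunked approach that also enforces the multiple-of-8 rule. The names bits_to_bytes and convert_file are my own, and it assumes the only non-bit character in the input is a trailing newline.

```python
import os
import tempfile


def bits_to_bytes(bits):
    """Pack a string of '0'/'1' characters into bytes, 8 bits per byte."""
    if len(bits) % 8 != 0:
        raise ValueError("number of bits must be a multiple of 8")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def convert_file(src, dst, chunksize=1024 * 20):
    """Read ASCII bits from src in chunks and write the packed bytes to dst."""
    if chunksize % 8 != 0:
        raise ValueError("chunksize must be a multiple of 8")
    with open(src, "r") as f, open(dst, "wb") as g:
        while True:
            chunk = f.read(chunksize)
            if not chunk:
                break
            # Assumption: the only non-bit character is a trailing newline.
            g.write(bits_to_bytes(chunk.strip()))


# Demo on a throwaway file: 01001000 01101001 are the ASCII codes of "Hi".
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "bits.txt")
    dst = os.path.join(tmp, "bits.bin")
    with open(src, "w") as f:
        f.write("0100100001101001\n")
    convert_file(src, dst, chunksize=8)
    with open(dst, "rb") as f:
        print(f.read())  # b'Hi'
```

Raising on a bad chunksize is a design choice on my part; the original function silently produces wrong output in that case.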

Here is the code I used to create the tables. Note that prettytable does something similar, and I recommend using its code rather than my hack: table.py

Here is the module I used to time the code: utils_timeit.py. (It uses table.py.)

Here is the code I used to time using_loop (and other variants): timeit_bytearray_vs_struct.py

Answer 2

Use struct:

import struct
...
f2.write(struct.pack('b', int(byte,2))) # signed 8 bit int

or

f2.write(struct.pack('B', int(byte,2))) # unsigned 8 bit int
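The choice between the two formats matters once the top bit is set. A small sketch, assuming Python 3, with value as an illustrative name: 'B' accepts 0 to 255, while 'b' only accepts -128 to 127, so a bit pattern such as 11111111 can only be packed with 'B'.

```python
import struct

value = int('11111111', 2)  # 255, the largest 8-bit pattern

# Unsigned format: every 8-bit pattern fits.
print(struct.pack('B', value))  # b'\xff'

# Signed format: 255 is out of range and raises struct.error.
try:
    struct.pack('b', value)
except struct.error as exc:
    print("signed format rejected 255:", exc)

# Patterns with the top bit clear work with either format.
print(struct.pack('b', int('01111111', 2)))  # b'\x7f'
```

For raw bit strings like the ones in the question, 'B' is therefore the safer pick.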
