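I built a Huffman encoder in Python, but because I store the bits (which represent the characters) as strings, the encoded text is larger than the original. How can I use actual bits to compress the text properly?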
Answer 0 (score: 1)
You can convert a str of 1s and 0s to an int like this:
>>> int('10110001',2)
177
And you can convert an int back to a str of 1s and 0s like this:
>>> format(177,'b')
'10110001'
Also, note that you can write int literals in binary using a leading 0b, like this:
>>> foo = 0b10110001
>>> foo
177
Now, before you say "No, I asked for bits, not ints!" think about that for a second. An int variable isn't stored in the computer's hardware as a base-10 representation of the number; it's stored directly as bits.
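To make the size difference concrete, here is a minimal sketch that compares a bit string stored as text with the same bits stored as raw bytes. It assumes the built-in int.to_bytes method is an acceptable way to get the bytes out; the sample code string is made up for illustration and deliberately starts with a 1.

```python
# Hypothetical encoder output: 32 characters of '0'/'1', starting with a 1.
code = '10110001' * 4

# Stored as text, every bit costs one full byte.
as_text = code.encode('ascii')

# Stored as an int and written out as bytes, every bit costs one bit.
value = int(code, 2)
n_bytes = (len(code) + 7) // 8            # round up to whole bytes
as_bytes = value.to_bytes(n_bytes, 'big')

print(len(as_text), len(as_bytes))        # 32 4
```

Writing as_bytes to a file opened in binary mode ('wb') is what actually shrinks the output; the int itself is only the in-memory stepping stone.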
EDIT: Stefan Pochmann points out that this will drop leading zeros. Consider:
>>> code = '000010110001'
>>> bitcode = int(code, 2)
>>> format(bitcode, 'b')
'10110001'
So how do you keep the leading zeros? There are a few ways. Which one you choose will likely depend on whether you want to convert each character's code to an int first and then concatenate them, or concatenate the strings of 1s and 0s before converting the whole thing to an int. The latter will probably be much simpler. One way that works well for the latter is to store the length of the code and then use it with this syntax:
>>> format(bitcode, '012b')
'000010110001'
where '012b' tells the format function to pad the left of the string with enough zeros to ensure a minimum length of 12. So you can use it in this way:
>>> code = '000010110001'
>>> code_length = len(code)
>>> bitcode = int(code, 2)
>>> format(bitcode, '0{}b'.format(code_length))
'000010110001'
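Putting the pieces together, one possible way to carry the stored length alongside the bits is sketched below. The helper names pack_bits and unpack_bits and the 4-byte length header are assumptions made for this illustration, not part of any library.

```python
def pack_bits(code: str) -> bytes:
    """Pack a string of '0'/'1' into bytes, prefixed by a 4-byte bit count.

    The 4-byte big-endian header is an arbitrary choice for this sketch.
    """
    n_bits = len(code)
    n_bytes = (n_bits + 7) // 8              # round up to whole bytes
    value = int(code, 2) if code else 0      # int('', 2) would raise ValueError
    return n_bits.to_bytes(4, 'big') + value.to_bytes(n_bytes, 'big')


def unpack_bits(data: bytes) -> str:
    """Reverse pack_bits, restoring any leading zeros from the stored length."""
    n_bits = int.from_bytes(data[:4], 'big')
    if n_bits == 0:
        return ''
    value = int.from_bytes(data[4:], 'big')
    return format(value, '0{}b'.format(n_bits))


code = '000010110001'
packed = pack_bits(code)
print(len(code), len(packed))   # 12 6
print(unpack_bits(packed))      # 000010110001
```

For a code this short the header dominates, so the packed form is only half the size of the text; for a realistic Huffman output the 4 header bytes become negligible and the result approaches one eighth of the string's size.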
Finally, if that {} and second format is unfamiliar to you, read up on string formatting.
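As a quick reference, here are a few equivalent spellings of the same padding, shown as a sketch; the f-string form assumes Python 3.6 or later, and the variable names are only for illustration.

```python
bitcode = 0b10110001
code_length = 12

# Build the format spec first, then apply it (the form used above).
a = format(bitcode, '0{}b'.format(code_length))

# A single str.format call with a nested replacement field for the width.
b = '{:0{}b}'.format(bitcode, code_length)

# An f-string with the width nested inside the format spec.
c = f'{bitcode:0{code_length}b}'

# Format without padding, then left-pad with zeros using str.zfill.
d = format(bitcode, 'b').zfill(code_length)

print(a, b, c, d)
```

All four produce the same 12-character string, so the choice is mostly a matter of readability.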