I'm trying to implement the Lempel-Ziv-Welch algorithm in Python, but I'm having trouble writing the output as a binary file.
action = sys.argv[3]
if action == "compress":
    # initialize dictionary
    dictionary = {}
    for i in range(0,256):
        # for single characters, the value is the same as the key
        # in the compressed file, these would appear as is
        dictionary[chr(i)] = i
    input_file = open(sys.argv[1], 'rb+')
    output_file = open(sys.argv[2], 'wb')
    data = input_file.read()
    # current_data is one byte
    current_data = input_file.read(1)
    i = 0
    j = 1
    current_data = data[i:j]
    # look for the shortest string not in the dictionary
    while i < len(data) - 2:
        while current_data in dictionary.keys():
            if j < len(data) + 1:
                j = j + 1
                current_data = data[i:j]
            else:
                break
        # once the shortest string is found, add it to the dictionary
        if current_data not in dictionary.keys():
            dictionary[current_data] = len(dictionary)
            thing_to_write = dictionary[current_data[:-1]]
            i = j - 1
            current_data = data[i:j]
        else:
            thing_to_write = dictionary[current_data]
            i = i + 1
            j = i + 1
        # then write to the output file the found string - one character from the end (the longest string that is in the dictionary)
        mylist = []
        thing_to_write = format(thing_to_write,'x')
        thing_to_write = thing_to_write
        for char in thing_to_write:
            mylist.append(char.encode('hex'))
        for elem in mylist:
            output_file.write(elem)
    input_file.close()
    output_file.close()
    print >> sys.stderr, "The size of " + sys.argv[1] + " is " + str(os.path.getsize(sys.argv[1])) + " bytes." + "\n" + "The size of " + sys.argv[2] + " is " + str(os.path.getsize(sys.argv[2])) + " bytes."
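I've tried writing the values in many different formats, such as hex and binary, but I think I'm only ever writing them out as 8-bit text characters. How can I write raw binary data to the file?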
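To see why the output only ever contains text characters, here is a minimal Python 3 sketch that mimics the question's hex-writing loop for a single code and compares it with packing the same value as raw bytes. The helper names hex_text_like_question and raw_two_bytes are hypothetical, and .encode('ascii').hex() stands in for Python 2's char.encode('hex').

```python
import struct


def hex_text_like_question(code):
    # Mimics the question's loop: format the code as hex digits, then
    # hex-encode each digit character. The result is ASCII text, not binary.
    digits = format(code, 'x')  # 300 -> '12c'
    return ''.join(c.encode('ascii').hex() for c in digits).encode('ascii')


def raw_two_bytes(code):
    # Packs the code as one little-endian unsigned 16-bit integer.
    return struct.pack('<H', code)


code = 300
print(hex_text_like_question(code))  # b'313263' (6 bytes of text)
print(raw_two_bytes(code))           # b',\x01' (2 raw bytes: 0x2C, 0x01)
```

The text version grows with the number of hex digits and then doubles again, which is why the "compressed" file ends up larger than the input.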
Answer 0 (score: 0)
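It isn't clear exactly what you want to write. The codes you produce can end up larger than 255 (too big for a single byte), so I'm assuming you want to write 2-byte unsigned integers to the output file?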
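If that's the case, I suggest looking into Python's struct.pack function, which is designed to convert Python values into their binary representation. If your values were byte-sized, you could simply use output_file.write(chr(x)) to write each one as a single character.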
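A short Python 3 sketch of the struct.pack approach is below. It uses io.BytesIO as a stand-in for a file opened with 'wb', and the explicit '<H' format (little-endian, unsigned, always 2 bytes, range 0..65535). Little-endian is an assumption; any fixed byte order works as long as the reader unpacks with the same format. It also shows bytes([x]), the Python 3 counterpart of Python 2's chr(x) for writing a single byte.

```python
import io
import struct

codes = [65, 66, 256, 300, 65535]

buf = io.BytesIO()  # stands in for open(path, 'wb')
for code in codes:
    # '<H': little-endian unsigned short, exactly 2 bytes per code
    buf.write(struct.pack('<H', code))

data = buf.getvalue()
print(len(data))  # 10: two bytes per code

# Reading back: unpack 2 bytes at a time with the same format string.
decoded = [struct.unpack('<H', data[k:k + 2])[0] for k in range(0, len(data), 2)]
print(decoded == codes)  # True

# For values that fit in one byte, bytes([x]) plays the role of chr(x).
print(bytes([65]))  # b'A'

# Values outside 0..65535 do not fit in 'H' and raise struct.error.
try:
    struct.pack('<H', 70000)
except struct.error:
    print('70000 does not fit in an unsigned 16-bit integer')
```

Packing the whole list at once with struct.pack('<%dH' % len(codes), *codes) produces the same bytes as the loop.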
The following version uses Python's struct.pack():
import os
os.chdir(os.path.dirname(os.path.abspath(__file__)))
import sys
import struct

# Python 2 code: data read in 'rb' mode is a str, so its slices match the chr(i) keys
action = sys.argv[3]
if action == "compress":
    # initialize dictionary
    dictionary = {}
    for i in range(0, 256):
        # for single characters, the value is the same as the key
        # in the compressed file, these would appear as is
        dictionary[chr(i)] = i
    input_file = open(sys.argv[1], 'rb')
    output_file = open(sys.argv[2], 'wb')
    data = input_file.read()
    i = 0
    j = 1
    # current_data starts as one byte
    current_data = data[i:j]
    # look for the shortest string not in the dictionary
    while i < len(data):
        while current_data in dictionary:
            if j < len(data) + 1:
                j = j + 1
                current_data = data[i:j]
            else:
                break
        # once the shortest string is found, add it to the dictionary
        if current_data not in dictionary:
            # 'H' only holds codes 0..65535, so stop adding entries once the dictionary is full
            if len(dictionary) < 65536:
                dictionary[current_data] = len(dictionary)
            # write the longest prefix that is in the dictionary (one character shorter)
            thing_to_write = dictionary[current_data[:-1]]
            i = j - 1
            current_data = data[i:j]
        else:
            # the rest of the input is already in the dictionary: write it and stop
            thing_to_write = dictionary[current_data]
            i = len(data)
        # convert each code into a 2-byte little-endian unsigned integer
        output_file.write(struct.pack('<H', thing_to_write))
    input_file.close()
    output_file.close()
    print >> sys.stderr, "The size of " + sys.argv[1] + " is " + str(os.path.getsize(sys.argv[1])) + " bytes." + "\n" + "The size of " + sys.argv[2] + " is " + str(os.path.getsize(sys.argv[2])) + " bytes."
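For reference, here is a minimal Python 3 sketch of a complete LZW round trip with fixed 16-bit codes, written independently of the code above rather than as a line-by-line port. The names lzw_compress, lzw_decompress and MAX_CODES are hypothetical. The sketch assumes little-endian '<H' codes and simply stops adding dictionary entries once 65536 codes exist; real LZW tools often use variable-width codes or reset the dictionary instead.

```python
import struct

MAX_CODES = 1 << 16  # fixed 16-bit codes: 0..65535


def lzw_compress(data: bytes) -> bytes:
    # Start with all 256 single-byte strings; each code equals the byte value.
    dictionary = {bytes([i]): i for i in range(256)}
    out = bytearray()
    w = b''
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc  # keep extending the longest known match
        else:
            out += struct.pack('<H', dictionary[w])
            if len(dictionary) < MAX_CODES:  # stop growing once 16 bits are used up
                dictionary[wc] = len(dictionary)
            w = bytes([byte])
    if w:
        out += struct.pack('<H', dictionary[w])  # flush the final match
    return bytes(out)


def lzw_decompress(packed: bytes) -> bytes:
    dictionary = {i: bytes([i]) for i in range(256)}
    codes = [c for (c,) in struct.iter_unpack('<H', packed)]
    if not codes:
        return b''
    w = dictionary[codes[0]]
    out = bytearray(w)
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            entry = w + w[:1]  # code refers to the entry being built right now
        else:
            raise ValueError('bad LZW code %d' % code)
        out += entry
        if len(dictionary) < MAX_CODES:  # mirror the compressor's growth exactly
            dictionary[len(dictionary)] = w + entry[:1]
        w = entry
    return bytes(out)


sample = b'TOBEORNOTTOBEORTOBEORNOT'
packed = lzw_compress(sample)
print(len(sample), len(packed))          # 24 32
print(lzw_decompress(packed) == sample)  # True
```

The sample produces 16 codes, so at 2 bytes each the output is larger than the 24-byte input; fixed-width codes only pay off on longer, repetitive data. To write the result, open the output file with 'wb' and call write(lzw_compress(data)) once.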