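In class we are covering RLE (run-length encoding), and our professor showed us the following code. I tried to understand it, but I don't quite get it, so I would be very grateful if someone could explain how RLE works in this example. I do understand how the data compression itself is supposed to work, but I don't understand how this program implements it. My questions are in the code comments.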
// Example implementation of a simple variant of
// run-length encoding and decoding of a byte sequence
#include <iostream>
#include <cassert>
// PRE: 0 <= value <= 255
// POST: returns true if value is first byte of a tuple, otherwise false
bool is_tuple_start(const unsigned int value)
{
    assert(0 <= value && value <= 255);
    return value >= 128; //Why is it: value>=128 for first Byte of tuple?
}
// PRE: 1 <= runlength <= 127 //Why must runlength be in this range?
// POST: returns encoded runlength byte
unsigned int make_tuple_start(const unsigned int run_length)
{
    assert(1 <= run_length && run_length <= 127);
    return run_length + 128; //Why do I add 128?
}
// PRE: n/a
// POST: returns true if value equals the maximal run-length
bool is_max_runlength(const unsigned int value)
{
    return value == 127; //same question: why is max. range 127?
}
// PRE: 128 <= value <= 255 //Why this range for value?
// POST: returns runlength of tuple
unsigned int get_runlength(const unsigned int value)
{
    assert(128 <= value && value <= 255);
    return value - 128; //Why -128?
}
// PRE: n/a
// POST: outputs value and adds a newline
void out_byte(const unsigned int value)
{
    std::cout << value << "\n";
}
// PRE: 1 <= runlength <= 127 and 0 <= value <= 255
// POST: outputs run length encoded bytes of tuple
void output(const unsigned int run_length, const unsigned int value)
{
    assert(1 <= run_length && run_length <= 127);
    assert(0 <= value && value <= 255); //Why is value now between 0 and 255?
    if (run_length == 1 && !is_tuple_start(value))
    {
        out_byte(value);
    }
    else
    {
        out_byte(make_tuple_start(run_length));
        out_byte(value);
    }
}
// PRE: n/a
// POST: returns true if 0 <= value <= 255, otherwise false
bool is_byte(const int value)
{
    return 0 <= value && value <= 255;
}
// PRE: n/a
// POST: outputs error if value does not indicate end of sequence
void check_end_of_sequence(const int value)
{
    if (value != -1)
    {
        std::cout << "error\n";
    }
}
// PRE: n/a
// POST: reads byte sequence and outputs encoded bytes
void encode()
{
    std::cout << "--- encoding: enter byte sequence, terminate with -1\n";
    int value;
    std::cin >> value;
    if (is_byte(value))
    {
        int prev_value = value; //When/Where does value Change?
        unsigned int run_length = 1;
        while(true)
        {
            // read next byte, stop if invalid or end of sequence
            std::cin >> value;
            if (!is_byte(value))
            { break; }
            // output if value has changed or maximal runlength is reached
            // otherwise increase length of current run
            if (value != prev_value || is_max_runlength(run_length))
            {
                output(run_length, prev_value);
                run_length = 1;
                prev_value = value;
            }
            else { ++run_length; }
        }
        output(run_length, prev_value);
    }
    // output "error" if sequence terminated incorrectly
    check_end_of_sequence(value);
}
// PRE: n/a
// POST: reads byte sequence and outputs decoded bytes
void decode()
{
    std::cout << "--- decoding: enter byte sequence, terminate with -1\n";
    int value;
    while(true) {
        // read next byte, stop if invalid or end of sequence
        std::cin >> value; //is value only a Byte? Or the whole sequence?
        if (!is_byte(value))
        { break; }
        // if this is a tuple output read next byte, otherwise output directly
        if (is_tuple_start(value))
        {
            unsigned int run_length = get_runlength(value);
            // next must be a valid byte, otherwise this is an error
            std::cin >> value;
            if (!is_byte(value))
            {
                value = 0;
                // trigger error in case value = -1
                break;
            }
            // output uncompressed tuple
            // (unsigned counter avoids a signed/unsigned comparison with run_length)
            for(unsigned int i = 0; i < run_length; ++i)
            {
                out_byte(value);
            }
        }
        else { out_byte(value); }
    }
    // output "error" if sequence terminated incorrectly
    check_end_of_sequence(value);
}
int main(const int argc, const char* argv[])
{
    std::cout << "--- select mode: 0 = encode / 1 = decode\n";
    unsigned int mode;
    std::cin >> mode;
    if (mode == 0)
    {
        encode();
    }
    else if (mode == 1)
    {
        decode();
    }
    else
    {
        std::cout << "--- unknown mode, must be 0 (encode) or 1 (decode)\n";
    }
}
I hope to get answers to my questions. The code should be readable; it is basically copied and pasted from my lecture notes.
Answer 0 (score: 2)
This encoding works by storing a run of repeated values as:
<length> <value>
while a non-repeated value is stored simply as:
<value>
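But when you see a number in the encoded sequence, how do you know whether it is the length part of the first format or just a non-repeated value? The encoding settles this with the rule that 128 is added to the length before it is written. So any number >= 128 is the <length> byte that starts the first format. (Since a run has at least length 1, the length bytes actually produced lie between 129 and 255.)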
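But what if a non-repeated value is itself 128 or higher? The solution is to use the first format for such large values as well, treating them as a repeated value with runlength = 1.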
This should answer most of your questions, which are all about the ranges and about adding and subtracting 128.
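To make the scheme concrete, here is a minimal sketch of the same encoding rule applied to a vector instead of std::cin. The function name encode_sketch is my own and not part of the lecture code; it is only meant to show which numbers end up in the output.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the lecture code): encodes a byte
// sequence with the "<length + 128> <value>" scheme described above.
std::vector<unsigned int> encode_sketch(const std::vector<unsigned int>& input)
{
    std::vector<unsigned int> out;
    std::size_t i = 0;
    while (i < input.size())
    {
        const unsigned int value = input[i];
        assert(value <= 255);
        // Count how often value repeats, capped at 127 so that
        // run_length + 128 still fits into one byte.
        unsigned int run_length = 1;
        while (i + run_length < input.size()
               && input[i + run_length] == value
               && run_length < 127)
        {
            ++run_length;
        }
        if (run_length == 1 && value < 128)
        {
            out.push_back(value);            // plain, non-repeated small value
        }
        else
        {
            out.push_back(run_length + 128); // marker byte: length + 128
            out.push_back(value);            // the value that is repeated
        }
        i += run_length;
    }
    return out;
}
```

For the input 5 5 5 200 7 this gives 131 5 129 200 7: three fives become the pair 131 5, the single 200 has to be wrapped as 129 200 because it is not below 128, and the 7 is written as it is.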
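Why must runlength be in this range?
Everything is stored as a byte between 0 and 255. If the length were greater than 127, adding 128 to it would give a number greater than 255, which does not fit in a byte.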
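The arithmetic can be checked with a small sketch. The helper names below are made up for illustration; to_marker and from_marker just mirror what make_tuple_start and get_runlength do in your code.

```cpp
#include <cassert>

// Hypothetical helper: true if run_length + 128 still fits into one byte.
bool marker_fits_in_byte(const unsigned int run_length)
{
    return run_length + 128 <= 255;
}

// Hypothetical helper mirroring make_tuple_start: length -> marker byte.
unsigned int to_marker(const unsigned int run_length)
{
    assert(1 <= run_length && run_length <= 127);
    return run_length + 128;
}

// Hypothetical helper mirroring get_runlength: marker byte -> length.
unsigned int from_marker(const unsigned int marker)
{
    assert(128 <= marker && marker <= 255);
    return marker - 128;
}
```

With these, to_marker(127) is 255, the largest value a byte can hold, while a length of 128 would need 256 and so marker_fits_in_byte(128) is false. Since 128 is 0x80, adding it to a number below 128 is the same as setting the highest bit of the byte, which is another way to read the value >= 128 test.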
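Is value only a byte? Or the whole sequence?
It is declared as int value;, so it holds just a single number. Each time cin >> value; runs, it reads the next byte of the sequence into it, which is also where value changes.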
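If it helps to see this in isolation, here is a small sketch that reads numbers one at a time in the same way. It uses std::istringstream in place of std::cin so the input can be given as a string, and read_bytes is a name I made up; unlike the lecture code it also stops when extraction fails.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: extracts one integer per loop iteration, like the
// loops in encode() and decode(), and stops at the first value outside
// 0..255 (for example the terminating -1) or when extraction fails.
std::vector<int> read_bytes(const std::string& text)
{
    std::istringstream in(text);
    std::vector<int> bytes;
    int value;
    while (in >> value && 0 <= value && value <= 255)
    {
        bytes.push_back(value); // value holds exactly one number here
    }
    return bytes;
}
```

Calling read_bytes("131 5 129 200 7 -1") collects the five numbers before the -1, one per extraction.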
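Why is value now between 0 and 255?
A value is always allowed to be any full byte; only lengths are limited to 127, because 128 is added to them. See the explanation above: high values are always encoded as a tuple with the length first.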
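The decoding direction shows why this is safe. Below is a minimal sketch of the decoding rule on a vector; decode_sketch is my own name, and unlike the lecture code it silently stops on a truncated tuple instead of printing "error".

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the lecture code): expands an encoded
// sequence back into the original bytes.
std::vector<unsigned int> decode_sketch(const std::vector<unsigned int>& encoded)
{
    std::vector<unsigned int> out;
    std::size_t i = 0;
    while (i < encoded.size())
    {
        const unsigned int byte = encoded[i];
        if (byte < 128)
        {
            out.push_back(byte); // plain value, copy it through
            ++i;
            continue;
        }
        if (i + 1 >= encoded.size())
        {
            break; // marker without a value; the lecture code reports an error
        }
        const unsigned int run_length = byte - 128; // undo the +128
        const unsigned int value = encoded[i + 1];  // may be anything in 0..255
        for (unsigned int k = 0; k < run_length; ++k)
        {
            out.push_back(value);
        }
        i += 2;
    }
    return out;
}
```

Decoding 131 5 129 200 7 gives back 5 5 5 200 7. The byte after a marker is never tested against 128, so a value such as 200 comes through unchanged.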