将数组压缩为最小的可能字符串

时间:2016-09-17 19:51:45

标签: javascript jquery underscore.js

可以使用javascript,下划线或Jquery函数来回答这个问题。

给出4个数组:

[17,17,17,17,17,18,18,18,18,18,19,19,19,19,19,20,20,20,20,20]  => x coordinate of unit
[11,12,13,14,15,11,12,13,14,15,11,12,13,14,15,11,12,13,14,15]  => y coordinate of unit
[92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92]  => x (unit moves to this direction x)
[35,36,37,38,39,35,36,37,38,39,35,36,37,38,39,35,36,37,38,39]  => y (unit moves to this direction y)

他们彼此非常相关。 例如,所有数组的第一个元素:[17,11,92,35]是单位x / y坐标以及此单位目标的x / y坐标。 所以这里总共有5 * 4 = 20个单位。每个单位的统计数据略有不同。 这4个x / y单位阵列在视觉上看起来像一个20单位“x”的军队(目标是“o”):

xxxxx                  o
xxxxx                  o
xxxxx                  o
xxxxx                  o

总会有4个阵列。即使1个单元,也会有4个阵列,但每个大小为1.这是最简单的情况,也是最常见的。 在实际情况中,每个单元总共有20个不同的统计数据(键),14个键大多与其他单元组完全相同 - 全部14个键。 所以他们被归为具有相同统计数据的军队。差异仅是单位的坐标以及单位目标的坐标。

我需要将所有这些数据压缩成尽可能小的数据,以后可以解压缩。

当所有这14个键意外相同时,也可能存在更复杂的情况,但坐标与模式完全不同。例如:

[17,17,17,17,17,18,18,18,  215,  18,18,19,19,19,19,19,20,20,20,20,20]  => x coordinate of unit
[11,12,13,14,15,11,12,13,  418,  14,15,11,12,13,14,15,11,12,13,14,15]  => y coordinate of unit
[92,92,92,92,92,92,92,92,  -78,  92,92,92,92,92,92,92,92,92,92,92,92]  => x (unit moves to this direction x)
[35,36,37,38,39,35,36,37,  -887, 38,39,35,36,37,38,39,35,36,37,38,39]  => y (unit moves to this direction y)

在这种情况下,我需要为2个不同的军队提取这个数组。
当军队中少于3个单位时,我只是简单地编写这些单位而不是 模式 - 如[215,418,-78,-887],[..]如果有超过2个单位的军队,我需要一个带有模式的压缩字符串,可以在以后解压缩。在这个例子中有21个单位。它只需要分成1个单位的军队和(5x4 = 20)非军队。

2 个答案:

答案 0 :(得分:0)

假设每个n单位都有一个模式, 使用

编码单位
n: sequence units count
ssx: start of source x
dsx: difference of source x
ssy: start of source y
dsy: difference of source y
stx: start of target x
dtx: difference of target x
sty: start of target y
dty: difference of target y

数组:[n,ssx,dsx,ssy,dsy,stx,dtx,sty,dty]

以便单位:

[17,17,17,17,17],
[11,12,13,14,15],
[92,92,92,92,92],
[35,36,37,38,39]

编码:

[5,17,0,11,1,92,0,35,1]

当然,如果您事先知道,例如y目标对于这样的序列总是相同的,您可以放弃差值参数,以便: [n,ssx,dsx,ssy,---,stx,dtx,sty,---] => [n,ssx,dsx,ssy,stx,dtx,sty],等等。

为了中断您在上一个示例中提到的模式,您可以使用其他“额外”数组,然后将它们插入序列中,并使用:

exsx: extra unit starting x
exsy: extra unit starting y
extx: extra unit target x
exty: extra unit target y
m: insert extra unit at

以便特殊情况编码:

{
patterns:[
   [5,17,0,11,1,92,0,35,1],
   [5,18,0,11,1,92,0,35,1],
   [5,19,0,11,1,92,0,35,1],
   [5,17,0,11,1,92,0,35,1]
],
extras: [
   [215,418,-78,-887,8] // 8 - because you want to insert this unit at index 8
]
}

同样,这是一般编码。模式的任何特定属性可以进一步减少编码表示。

希望这有帮助。

答案 1 :(得分:0)

使用比特流进行高压缩

您可以将值集编码为位流,以便删除未使用的位。您显示的数字不大于-887(忽略负数),这意味着您可以将所有数字组合成10位,每个数字节省54位(Javascript使用64位数字)。

长度压缩

您还有许多重复的数字,您可以使用运行长度压缩。您在比特流中设置一个标志,指示以下一组位表示重复的数字序列,然后您有重复的次数和要重复的值。对于随机数序列,您只需将它们保持原样。

如果使用行程压缩,则在位流中创建块类型结构,这样就可以嵌入进一步的压缩。由于您有许多低于128的数字,因此许多数字可以编码为7位,甚至更少。对于较小的开销(在这种情况下,每块2位),您可以选择最小的位大小来打包该块中的所有数字。

可变位深度数

我创建了一个数字类型值,表示用于在块中存储数字的位数。每个块都有一个数字类型,块中的所有数字都使用该类型。有4种数字类型可以编码为2位。

  • 00 = 4位数。范围0-15
  • 01 = 5位数。范围0-31
  • 10 = 7位数。范围0-127
  • 11 = 10位数。范围0-1023

比特流

为了简化这一过程,您需要一个比特流读/写。它允许您轻松地从位流中读取和读取位。

// Simple unsigned bit stream 
// Read and write to and from a bit stream.
// Numbers are stored as Big endian
// Does not comprehend sign so wordlength should be less than 32 bits
// methods
// eof(); // returns true if read pos > buffer bit size
// read(numberBits); // number of bits to read as one number. No sign so < 32
// write(value,numberBits); // value to write, number of bits to write < 32
// getBuffer(); // return object with buffer and array of numbers, and bitLength the total number of bits
// setBuffer(buffer,bitLength); // the buffers as an array of numbers, and bitLength the total number of bits
// Properties
// wordLength; // read only length of a word.
function BitStream(){
    var buffer = [];
    var pos = 0;
    var numBits = 0;
    const wordLength = 16;
    this.wordLength = wordLength;
    // read a single bit
    var readBit = function(){
        var b = buffer[Math.floor(pos / wordLength)];  // get word
        b = (b >> ((wordLength - 1) - (pos % wordLength))) & 1;
        pos += 1;
        return b;
    }
    // write a single bit. Will fill bits with 0 if wite pos is moved past buffer length
    var writeBit = function(bit){
        var rP = Math.floor(pos / wordLength);
        if(rP >= buffer.length){  // check that the buffer has values at current pos.
            var end = buffer.length;  // fill buffer up to pos with zero

            while(end <= rP){
                buffer[end] = 0;
                end += 1;
            }
        }
        var b = buffer[rP];
        bit &= 1; // mask out any unwanted bits
        bit <<= (wordLength - 1) - (pos % wordLength);
        b |= bit;
        buffer[rP] = b;
        pos += 1;
    }
    // returns true is past eof
    this.eof = function(){
        return pos >= numBits;
    }
    // reads number of bits as a Number
    this.read = function(bits){
        var v = 0;
        while(bits > 0){
            v <<= 1;
            v |= readBit();
            bits -= 1;
        }
        return v;
    }
    // writes value to bit stream 
    this.write = function(value,bits){
        var v;
        while(bits > 0){
            bits -= 1;
            writeBit( (value >> bits) & 1 );
        }
    }
    // returns the buffer and length
    this.getBuffer = function(){
        return {
            buffer : buffer,
            bitLength : pos,
        };                    
    }
    // set the buffer and length and returns read write pos to start
    this.setBuffer = function(_buffer,bitLength){
        buffer = _buffer;
        numBits = bitLength;
        pos = 0;
    }
}    

您的号码格式

现在设计格式。从流中读取的第一个位是序列标志,如果为0,则后面的块将是重复值,如果1,则后面的块将是随机数序列。

块位:描述;

重复阻止持有重复的数字

  • 位0:Val 0 =重复
  • 位1:Val 0 = 4位重复计数或1 = 5位重复计数

然后

  • 位2,3,4,5:4位重复次数 - 1
  • 位6,7:2位数字类型

  • 位2,3,4,5,6:5位重复次数 - 1
  • 位7,8:2位数字类型

其次是

然后根据数字类型

重复一个值

阻止结束

序列块包含一系列随机数

  • 位0:Val 1 =序列
  • 位1:Val 0 =正序Val 1 =负序
  • 位2,3,4,5:序列中的4位数字 - 1
  • 位6,7:2位数字类型

然后是数字格式的数字序列

阻止结束。

继续读取块直到文件结束。

编码器和解码器

以下对象将对平面数字数组进行编码和解码。它只处理长达10位的数字,因此没有超过1023或低于-1023的值。

如果您想要更大的数字,则必须更改使用的数字类型。为此,请更改数组

const numberSize = [0,0,0,0,0,1,2,2,3,3,3]; // the number bit depth          
const numberBits = [4,5,7,10]; // the number bit depth lookup; 

如果您希望最大数字为12位-4095到4095(符号位采用块编码)。我还将7位数字类型更改为8.第一个数组用于查找位深度,如果我有一个3位数字,则获得带有numberSize[bitcount]的数字类型以及用于存储的位数号码numberBits[numberSize[bitCount]]

const numberSize = [0,0,0,0,0,1,2,2,2,3,3,3,3]; // the number bit depth          
const numberBits = [4,5,8,12]; // the number bit depth lookup; 



function ArrayZip(){
    var zipBuffer = 0;
    const numberSize = [0,0,0,0,0,1,2,2,3,3,3]; // the number bit depth lookup; 
    const numberBits = [4,5,7,10]; // the number bit depth lookup; 
    this.encode = function(data){ // encodes the data
        var pos = 0;
        function getRepeat(){  // returns the number of repeat values
            var p = pos + 1;
            if(data[pos] < 0){
                return 1;  // ignore negative numbers
            }
            while(p < data.length && data[p] === data[pos]){
                p += 1;
            }
            return p - pos;
        }
        function getNoRepeat(){  // returns the number of non repeat values
                                 // if the sequence has negitive numbers then 
                                 // the length is returned as a negative
            var p = pos + 1;
            if(data[pos] < 0){ // negative numbers
                while(p < data.length && data[p] !== data[p-1] && data[p] < 0){
                    p += 1;
                }
                return -(p - pos);
            }
            while(p < data.length && data[p] !== data[p-1] && data[p] >= 0){
                p += 1;
            }
            return p - pos;
        }
        function getMax(count){
            var max = 0;
            var p = pos;
            while(count > 0){
                max = Math.max(Math.abs(data[p]),max);
                p += 1;
                count -= 1;
            }
            return max;
        }
        var out = new BitStream();
        while(pos < data.length){
            var reps = getRepeat();
            if(reps > 1){
                var bitCount = numberSize[Math.ceil(Math.log(getMax(reps) + 1) / Math.log(2))];
                if(reps < 16){
                    out.write(0,1);  // repeat header
                    out.write(0,1);  // use 4 bit repeat count;
                    out.write(reps-1,4); // write 4 bit number of reps
                    out.write(bitCount,2); // write 2 bit number size 
                    out.write(data[pos],numberBits[bitCount]);
                    pos += reps;
                }else {
                    if(reps > 32){  // if more than can fit in one repeat block split it 
                        reps = 32;
                    }
                    out.write(0,1);  // repeat header
                    out.write(1,1);  // use 5 bit repeat count;
                    out.write(reps-1,5); // write 5 bit number of reps
                    out.write(bitCount,2); // write 2 bit number size 
                    out.write(data[pos],numberBits[bitCount]);
                    pos += reps;
                }
            }else{
                var seq = getNoRepeat(); // get number no repeats
                var neg = seq < 0 ? 1 : 0; // found negative numbers
                seq = Math.min(16,Math.abs(seq));
                // check if last value is the start of a repeating block
                if(seq > 1){
                    var tempPos = pos;
                    pos += seq; 
                    seq -= getRepeat() > 1 ? 1 : 0;
                    pos = tempPos;
                }
                // ge the max bit count to hold numbers
                var bitCount = numberSize[Math.ceil(Math.log(getMax(seq) + 1) / Math.log(2))];
                out.write(1,1);  // sequence header
                out.write(neg,1); // write negative flag
                out.write(seq - 1,4); // write sequence length;
                out.write(bitCount,2); // write 2 bit number size 
                while(seq > 0){
                    out.write(Math.abs(data[pos]),numberBits[bitCount]);
                    pos += 1;
                    seq -= 1;
                }
            }
        }
        // get the bit stream buffer
        var buf = out.getBuffer();
        // start bit stream with number of trailing bits. There are 4 bits used of 16 so plenty
        // of room for aulturnative encoding flages.
        var str = String.fromCharCode(buf.bitLength % out.wordLength);
        // convert bit stream to charcters
        for(var i = 0; i < buf.buffer.length; i ++){
            str += String.fromCharCode(buf.buffer[i]);
        }
        // return encoded string
        return str;
    }

    this.decode = function(zip){
        var count,rSize,header,_in,i,data,endBits,numSize,val,neg;
        data = []; // holds character codes
        decompressed = []; // holds the decompressed array of numbers
        endBits = zip.charCodeAt(0); // get the trailing bits count
        for(i = 1; i < zip.length; i ++){  // convert string to numbers
            data[i-1] = zip.charCodeAt(i);
        }
        _in = new BitStream(); // create a bitstream to read the bits
        // set the buffer data and length 
        _in.setBuffer(data,(data.length - 1) * _in.wordLength + endBits);
        while(!_in.eof()){  // do until eof
            header = _in.read(1);  // read header bit
            if(header === 0){ // is repeat header
                rSize = _in.read(1); // get repeat count size
                if(rSize === 0){
                    count = _in.read(4); // get 4 bit repeat count                    
                }else{
                    count = _in.read(5); // get 5 bit repeat count    
                }
                numSize = _in.read(2); // get 2 bit number size type
                val = _in.read(numberBits[numSize]);  // get the repeated value
                while(count >= 0){  // add to the data count + 1 times
                    decompressed.push(val);
                    count -= 1;
                }                
            }else{
                neg = _in.read(1); // read neg flag
                count = _in.read(4); // get 4 bit seq count                    
                numSize = _in.read(2); // get 2 bit number size type
                while(count >= 0){
                    if(neg){  // if negative numbers convert to neg
                        decompressed.push(-_in.read(numberBits[numSize]));
                    }else{
                        decompressed.push(_in.read(numberBits[numSize]));
                    }
                    count -= 1;
                }
            }
        }
        return decompressed;
    }
}

存储比特流的最佳方式是字符串。 Javascript具有Unicode字符串,因此我们可以将16位打包到每个字符

结果和使用方法。

您需要展平阵列。如果你需要添加额外的信息来恢复多维/维数组,只需将其添加到数组中,然后让压缩器将其与其他数据一起压缩。

// flatten the array
var data = [17,17,17,17,17,18,18,18,18,18,19,19,19,19,19,20,20,20,20,20,11,12,13,14,15,11,12,13,14,15,11,12,13,14,15,11,12,13,14,15,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,35,36,37,38,39,35,36,37,38,39,35,36,37,38,39,35,36,37,38,39];

var zipper = new ArrayZip();

var encoded = zipper.encode(data);  // packs the 80 numbers in data into 21 characters.
                                    // compression rate of the data array 5120 bits to 336 bits 
                                    // 93% compression.
                                    // or as a flat 7bit ascii string as numbers 239 charcters (including ,)
                                    // 239 * 7 bits = 1673 bits to 336 bits 80% compression.
var decoded = zipper.decode(encoded);

我最初没有注意到负数,因此压缩与负值不相符。

var data = [17,17,17,17,17,18,18,18,  215, 18,18,19,19,19,19,19,20,20,20,20,20, 11,12,13,14,15,11,12,13,  418,  14,15,11,12,13,14,15,11,12,13,14,15, 92,92,92,92,92,92,92,92,  -78,  92,92,92,92,92,92,92,92,92,92,92,92, 35,36,37,38,39,35,36,37,  -887, 38,39,35,36,37,38,39,35,36,37,38,39]

var encoded = zipper.encode(data);  // packs the 84 numbers in data into 33 characters.
                                    // compression rate of the data array 5376 bits to 528 bits
var decoded = zipper.decode(encoded);

<强>摘要

正如您所看到的,这会导致非常高的压缩率(几乎是LZ压缩的两倍)。代码远非最优,您可以轻松实现具有各种设置的多通道压缩器(在编码字符串的开头有12个备用位,可用于选择许多选项以改善压缩。)

此外我没有看到负数,直到我回来发布所以负面修复不好,所以你可以通过修改bitStream来了解更多的东西(即使用&gt;&gt;&gt ;运营商)