Question

Here is my code:

    // requires: import java.util.HashMap;
    public static String compress(final String input)
    {
      // Seed the dictionary with every single-character string (codes 0-255).
      HashMap<String, Integer> codes = new HashMap<String, Integer>();
      for (int i = 0; i < 256; i++)
      {
        codes.put((char) i + "", i);
      }

      StringBuilder outputString = new StringBuilder();

      int max_code = 32767;
      int next_code = 257;
      String currentString = "";
      char c;

      for (int i = 0; i < input.length(); i++)
      {
        c = input.charAt(i);
        currentString = currentString + c;
        if (!codes.containsKey(currentString))
        {
          // Unknown phrase: register it while there is still room for new codes.
          if (next_code <= max_code)
          {
            codes.put(currentString, next_code++);
          }
          // Emit the code of the longest known prefix, then restart from c.
          currentString = currentString.substring(0, currentString.length() - 1);
          outputString.append(codes.get(currentString));
          currentString = c + "";
        }
      }
      // Flush the code of whatever phrase is still pending.
      outputString.append(codes.get(currentString));

      return outputString.toString();
    }

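I got this working by following this article: http://marknelson.us/2011/11/08/lzw-revisited/

I have read some articles saying that this approach is naive and very slow: https://code.google.com/p/algorithms-and-datastructures-course/source/browse/trunk/AD_exercise_4/src/ad_exercise_4/controller/LZWNaive.java?r=38

How can I speed up the algorithm? It currently takes 21 seconds to compress 3 MB. Could someone provide pseudocode I should follow to get faster results, for example 1-2 seconds to compress 3 MB?

I think the !HashMap.containsKey() check is the line that takes most of the time: 16 of the 21 seconds.

Regards.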

Answer 1

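One thing to note: the String class is immutable in Java. In other words, appending to a String with the + operator actually creates a new String. All those string assignments leave unreferenced String objects behind and trigger garbage collection, which slows you down considerably.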

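At a minimum, I suggest switching to StringBuffer. You should see an immediate performance gain without many changes to the logic. But StringBuffer is still not the most memory-efficient way to handle binary data, because it is tuned for handling information in different character sets. For compression/decompression you don't care about character sets, only about bits.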
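A minimal sketch of the in-place editing this suggestion relies on. It uses StringBuilder, the unsynchronized sibling of StringBuffer, and the class and method names are made up for illustration. Keep in mind that StringBuilder does not override equals() or hashCode(), so it cannot replace the String key of the HashMap directly; a lookup would still need toString().

```java
// Sketch only: InPlaceBuffer and appendAndTrim are illustrative names,
// not part of the code in the question.
final class InPlaceBuffer {
    // Appends each char of input and drops the last char again whenever the
    // buffer grows past maxLen, mimicking the append/trim pattern of compress().
    static String appendAndTrim(String input, int maxLen) {
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            current.append(input.charAt(i));              // mutates in place, no new String
            if (current.length() > maxLen) {
                current.setLength(current.length() - 1);  // removes the last char in place
            }
        }
        return current.toString();                        // a single copy, at the very end
    }
}
```

Calling `InPlaceBuffer.appendAndTrim("abcdef", 3)` yields `"abc"` without allocating an intermediate String per character.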

ByteBuffer from the java.nio package (Java 6) would be a huge leap forward.
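As a rough sketch of what that could look like for the output side: since max_code in the question is 32767, every code fits in a 16-bit short, so the codes could be written as two bytes each instead of being appended as decimal text. The CodePacker class and its methods are hypothetical names, and fixed-width 16-bit codes are an assumption made here, not something the question's code does.

```java
import java.nio.ByteBuffer;

// Sketch only: CodePacker is an illustrative name, not from the question.
final class CodePacker {
    // Packs each code into two bytes (big-endian, ByteBuffer's default order).
    static byte[] pack(int[] codes) {
        ByteBuffer buf = ByteBuffer.allocate(2 * codes.length);
        for (int code : codes) {
            buf.putShort((short) code);   // assumes 0 <= code <= 32767
        }
        return buf.array();
    }

    // Reads the two-byte codes back in the same order.
    static int[] unpack(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int[] codes = new int[bytes.length / 2];
        for (int i = 0; i < codes.length; i++) {
            codes[i] = buf.getShort();
        }
        return codes;
    }
}
```

A fixed width also makes the output decodable: appending decimal numbers with no separator, as the question's code does, cannot distinguish the codes 65 and 66 from a single code 6566.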

Answer 2

Some of the operations performed on currentString are very expensive, especially as currentString grows.

The statement:

    currentString = currentString + c;

loops over all the characters in the string and copies the complete string plus the new character.

The line:

    if (!codes.containsKey(currentString))

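uses the hash code of currentString. Because currentString is a new string every time, its hash code has to be computed by looping over the entire string (which defeats the purpose of hashing if it has to be recomputed every time).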
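To make the cost concrete: String.hashCode() is documented as s[0]*31^(n-1) + ... + s[n-1], so the hash of a string extended by one char can be derived from the hash of the shorter string in constant time. The helper below is a hypothetical illustration of that property, not something the question's code has access to through HashMap<String, Integer>.

```java
// Sketch only: IncrementalHash is an illustrative name.
final class IncrementalHash {
    // Returns the String.hashCode() of (s + c), given only s.hashCode() and c.
    static int extend(int hashOfPrefix, char c) {
        return 31 * hashOfPrefix + c;
    }
}
```

For example, `IncrementalHash.extend("ab".hashCode(), 'c')` equals `"abc".hashCode()`, with no loop over the prefix.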

Finally, the line:

    currentString = currentString.substring(0, currentString.length() - 1);

also has to loop over the entire string and make a new copy of it.

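If you want this program to run fast, you need to stop looping over the same data again and again. Instead of creating a new String every time you add or remove a char, use some kind of buffer to which you can add and from which you can remove characters at both ends. Also consider an alternative hash code scheme, so that you don't need to recompute the full hash (by walking the entire string) just because you extended currentString with one char.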
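One common way to follow this advice, sketched here as an assumption rather than as the answer's own method, is to stop storing phrases as strings at all. Every dictionary phrase is a known phrase plus one char, so it can be identified by the pair (code of the prefix, next char) packed into a single int. Lookups then cost the same regardless of phrase length. The FastLzw class, the charCode helper, and the List<Integer> return type are hypothetical choices; the sketch assumes every input char is below 256, like the question's initial dictionary.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: FastLzw is an illustrative name, not from the question.
final class FastLzw {
    static final int MAX_CODE = 32767;
    static final int FIRST_CODE = 257;   // same starting value as next_code in the question

    // Returns the LZW codes for input; assumes every char is < 256.
    static List<Integer> compress(String input) {
        List<Integer> output = new ArrayList<>();
        if (input.isEmpty()) {
            return output;
        }
        // Key: (code of current phrase << 8) | next char. Value: code of the longer phrase.
        Map<Integer, Integer> codes = new HashMap<>();
        int nextCode = FIRST_CODE;
        int current = charCode(input.charAt(0));   // a single char is its own code

        for (int i = 1; i < input.length(); i++) {
            int c = charCode(input.charAt(i));
            int key = (current << 8) | c;
            Integer extended = codes.get(key);
            if (extended != null) {
                current = extended;                // phrase + c is known, keep growing
            } else {
                output.add(current);               // emit the longest known phrase
                if (nextCode <= MAX_CODE) {
                    codes.put(key, nextCode++);    // register phrase + c
                }
                current = c;                       // restart from the new char
            }
        }
        output.add(current);                       // flush the pending phrase
        return output;
    }

    private static int charCode(char ch) {
        if (ch > 255) {
            throw new IllegalArgumentException("char out of range: " + (int) ch);
        }
        return ch;
    }
}
```

For the input `"ABABABA"` this produces the codes 65, 66, 257, 259, the same sequence the question's string-keyed loop emits, but without rebuilding or rehashing the phrase on each step.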

Java LZW compression is very slow: 20 seconds for 3 MB
