Question

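I'm trying to verify that the content wkhtmltopdf generates is the same from run to run, but every time I run wkhtmltopdf I get a different hash/checksum for the same page. We're talking about something really basic, like this HTML page: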

<html>
<body>
<p> This is some text</p>
</body>
</html>

Every time I run wkhtmltopdf, I get a different md5 or sha256 hash:

./wkhtmltopdf example.html ~/Documents/a.pdf

Using a Python hasher:

import hashlib

def shasum(filename):
    sha = hashlib.sha256()
    with open(filename, 'rb') as f:
        # Read in multiples of the hash block size until EOF.
        for chunk in iter(lambda: f.read(128 * sha.block_size), b''):
            sha.update(chunk)
    return sha.hexdigest()

Or the md5 version, which just swaps sha256 for md5.

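Why does wkhtmltopdf generate a file different enough to produce a different checksum, and is there any way to prevent it? Is there a command-line option I can pass in?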

I have tried --default-header, --no-pdf-compression, and --disable-smart-shrinking.

This is on Mac OS X, but I have also generated these PDFs on other machines and downloaded them, with the same result.

wkhtmltopdf version = 0.10.0 rc2

Answer 1

I tried this and opened the resulting PDF in emacs. wkhtmltopdf embeds a "/CreationDate" field in the PDF. It differs on every run, which changes the hash between runs.

I didn't see an option to disable the "/CreationDate" field, but it would be simple to strip it from the file before computing the hash.
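As a rough illustration of stripping the field before hashing, here is a minimal Python sketch. It assumes the date appears uncompressed as "/CreationDate (D:...)" in the file, as observed above; the function names and the regular expression are my own, not part of wkhtmltopdf.

```python
import hashlib
import re

# Assumption: the field is written as "/CreationDate (D:...)" in plain bytes.
CREATION_DATE = re.compile(rb"/CreationDate\s*\(D:[^)]*\)")

def sha256_without_creation_date(data):
    """Hash PDF bytes after blanking out the /CreationDate value."""
    normalized = CREATION_DATE.sub(b"/CreationDate ()", data)
    return hashlib.sha256(normalized).hexdigest()

def shasum_ignoring_creation_date(filename):
    """Read a PDF from disk and hash it with the creation date blanked."""
    with open(filename, "rb") as f:
        return sha256_without_creation_date(f.read())
```

This reads the whole file into memory, which should be fine for small test documents. If other fields also vary between runs, they would need the same treatment.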

Answer 2

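I wrote a method that copies the creation date from the expected output into the newly generated file. It's in Ruby, and the arguments are anything that walks and quacks like an IO: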

def copy_wkhtmltopdf_creation_date(to, from)
  to_current_pos, from_current_pos = [to.pos, from.pos]
  to.pos = from.pos = 74
  to.write(from.read(14))
  to.pos, from.pos = [to_current_pos, from_current_pos]
end

Answer 3

Inspired by Carlos, I wrote a solution that doesn't use a hard-coded index, because in my documents the index differed from Carlos's 74.

Also, my files are not already open, so this version takes paths. I also handle returning early when no CreationDate is found.

def copy_wkhtmltopdf_creation_date(to, from)
  index, date = File.foreach(from).reduce(0) do |acc, line|
    if line.index("CreationDate")
      break [acc + line.index(/\d{14}/), $~[0]]
    else
      acc + line.bytesize
    end
  end

  if date # IE, yes this is a wkhtmltopdf document
    File.open(to, "r+") do |to|
      to.pos = index
      to.write(date)
    end
  end
end
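For readers following along with the Python hasher from the question, the same idea (locate the date by searching rather than by a fixed offset, then overwrite it with the reference date) might look roughly like this. It is only a sketch working on in-memory bytes; the function name and pattern are hypothetical, and it assumes a 14-digit date directly after "D:".

```python
import re

# Assumption: the date is stored as "/CreationDate (D:" followed by 14 digits.
DATE = re.compile(rb"/CreationDate\s*\(D:(\d{14})")

def copy_creation_date(target, reference):
    """Return target with its 14-digit creation date replaced by reference's."""
    src = DATE.search(reference)
    dst = DATE.search(target)
    if src is None or dst is None:
        # No recognizable creation date in one of the inputs; leave target as is.
        return target
    start, end = dst.span(1)
    return target[:start] + src.group(1) + target[end:]
```

Because both dates are 14 digits long, the result has the same length as the target, so byte offsets elsewhere in the PDF are not shifted.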

Answer 4

We solved the problem by stripping the creation date with a simple regular expression.

preg_replace("/\\/CreationDate \\(D:.*\\)\\n/uim", "", $file_contents, 1);

After doing this, we get a consistent checksum every time.

wkhtmltopdf generates a different checksum on every run
