How do I truncate the data in a hash so that the generated JSON is no longer than n bytes?

Time: 2011-07-21 14:47:20

Tags: ruby algorithm hash

I have a hash that looks like this:

{ :a => "some string", :b => "another string", :c => "yet another string" }

I ultimately want to call to_json on it, but the resulting JSON string must not be longer than n bytes.

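If the string is too long, :c should be truncated first. If that is not enough, :b should be truncated, and finally :a. The strings can also contain multi-byte characters such as German umlauts, and the Ruby version is 1.8.7. (An umlaut initially takes 2 bytes, but as JSON it is 5 bytes long.)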
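To make the byte counts concrete, here is a small sketch that measures one umlaut before and after serialization. It assumes a modern Ruby with the standard json library and uses its ascii_only generator option to force \uXXXX escapes; the 1.8.7 setup described above may escape differently, so the exact numbers are illustrative only.

```ruby
# encoding: UTF-8
require 'json'

raw = "ü"
# Size of the umlaut as raw UTF-8.
raw_bytes = raw.bytesize

# The same value inside a one-key hash, serialized without and with
# \uXXXX escaping. The only difference between the two outputs is how
# the umlaut itself is written.
plain_json   = JSON.generate({ :a => raw })
escaped_json = JSON.generate({ :a => raw }, :ascii_only => true)

puts raw_bytes
puts plain_json.bytesize
puts escaped_json.bytesize
```

The gap between the last two numbers is the extra room a single escaped character needs, which is why truncation has to be measured on the JSON output rather than on the raw strings.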

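What I wrote is a loop that converts the hash with to_json and checks the length. If the length is less than or equal to n, the result is returned; otherwise I concatenate the values of :a + :b + :c and shorten them by half. If the new hash is too big (too small), I shorten (extend) it by 1/4, 1/8, 1/16 of the original string, and so on. In the end I get a length of hash.as_json == n.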
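The following is not the original code, just a minimal sketch of that halving idea as a binary search, assuming Ruby 1.9 or later and the standard json library. The helper names keep_prefix and fit_json are made up for illustration. It searches for the largest number of characters that can be kept from a + b + c, so :c loses characters first, then :b, then :a.

```ruby
# encoding: UTF-8
require 'json'

# Hypothetical helper: keep only the first `keep` characters of a + b + c,
# so that :c is shortened first, then :b, then :a.
def keep_prefix(values, keep)
  result = {}
  [:a, :b, :c].each do |key|
    chars = values[key].chars.to_a
    result[key] = chars.first(keep).join
    keep = [keep - chars.length, 0].max
  end
  result
end

# Hypothetical helper: binary search for the largest number of kept
# characters whose JSON still fits into `limit` bytes.
# Returns nil if even three empty strings do not fit.
def fit_json(values, limit)
  return nil if keep_prefix(values, 0).to_json.bytesize > limit
  low = 0
  high = [:a, :b, :c].inject(0) { |sum, key| sum + values[key].length }
  while low < high
    mid = (low + high + 1) / 2
    if keep_prefix(values, mid).to_json.bytesize <= limit
      low = mid       # mid characters fit, try to keep more
    else
      high = mid - 1  # too long, keep fewer
    end
  end
  keep_prefix(values, low).to_json
end

puts fit_json({ :a => "Qué te", :b => "parece", :c => "eh?" }, 30)
# {"a":"Qué te","b":"p","c":""}
```

Because the JSON only grows when more characters are kept, the search is well defined, and measuring the serialized output on every step keeps it correct no matter how the encoder writes multi-byte characters.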

It all feels very hackish, and although all tests pass, I'm not sure it is stable.

Does anyone have a good suggestion for how to solve this properly?

1 Answer:

Answer 0: (score: 1)

How about:

# encoding:UTF-8

require 'rubygems'
require 'json'

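# Returns the JSON for { :a, :b, :c }, truncating :c, then :b, then :a
# until it fits into `limit` bytes. Raises if it still does not fit.
def constrained_json(limit, a, b, c)
  output, size, hash = nil, 0, { :a => a, :b => b, :c => c }
  # :a is listed twice so that a final pass re-measures the JSON after
  # :a has been truncated.
  [:c, :b, :a, :a].each do |key|
    output = hash.to_json
    size = output.bytesize
    break if size <= limit
    # limit - size is negative here, so the slice drops that many trailing
    # characters, which removes at least that many bytes.
    # on 1.9:
    # hash[key] = hash[key][0...(limit - size)]
    # on 1.8.7
    hash[key] = hash[key].unpack("U*")[0...(limit - size)].pack("U*")
  end
  raise "Size exceeds limit even after truncation" if size > limit
  output
end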

38.downto(21) do |length|
  puts "# #{constrained_json(length, "Qué te", "parece", "eh?")}"
end

# {"a":"Qué te","b":"parece","c":"eh?"}
# {"a":"Qué te","b":"parece","c":"eh"}
# {"a":"Qué te","b":"parece","c":"e"}
# {"a":"Qué te","b":"parece","c":""}
# {"a":"Qué te","b":"parec","c":""}
# {"a":"Qué te","b":"pare","c":""}
# ...
# {"a":"","b":"","c":""}
# test.rb:14:in `constrained_json': Size exceeds limit even after truncation (RuntimeError)
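
As a sanity check, here is a sketch of the same idea for Ruby 1.9 or later, using the character-based slice from the commented-out line and verifying that every result stays within its limit. The name constrained_json_19 is made up for this sketch.

```ruby
# encoding: UTF-8
require 'json'

# Variant for Ruby 1.9+, where String#[] slices by character.
# The method name is hypothetical, chosen for this sketch only.
def constrained_json_19(limit, a, b, c)
  output, size, hash = nil, 0, { :a => a, :b => b, :c => c }
  [:c, :b, :a, :a].each do |key|
    output = hash.to_json
    size = output.bytesize
    break if size <= limit
    # limit - size is negative here, so this drops that many trailing characters.
    hash[key] = hash[key][0...(limit - size)]
  end
  raise "Size exceeds limit even after truncation" if size > limit
  output
end

# Every limit from 38 down to 22 must yield JSON that fits.
38.downto(22) do |length|
  json = constrained_json_19(length, "Qué te", "parece", "eh?")
  raise "too long: #{json}" if json.bytesize > length
end
```

Note that the slice counts characters while the limit counts bytes, so a value with multi-byte characters may be cut slightly shorter than strictly necessary, but never too little.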