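I need to parse a text file in the following format and convert it into a Hash, which will then be converted to JSON.
The text file has the following format: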
HD040008000415350110XXXXXXXXXX0208XXXXXXXX0302EN0403USA0502EN0604000107014
EM04000800030010112TME001205IQ50232Blue Point Coastal Cuisine. INC.06145655th Avenue0805921010909SAN DIEGO1008Downtown1102CA1203USA
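Each line is a set of segments in a key-length-value format. For example, in the second line, EM is the key, 04 is the length of the value (including spaces), and 0008 is the value. Broken apart, it looks like EM 04 0008. The following segment keys are numeric: they start at 00 and increment until the end of the line, then start over on the next line. I need to loop through every line in the text file.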
I need to be able to turn this into a Ruby hash, which in turn gets converted to JSON in an API response.
Currently the format is:
EM0400080003001
It needs to be parsed into:
{"EM" => "0008", "00" => "001"}
Answer 0 (score: 2)
This is a very common kind of encoding called Type-Length-Value (or Tag-Length-Value), for reasons I think are obvious. As with many tasks like this in Ruby, String#unpack is a good fit:
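def decode(data)
  return {} if data.empty?
  # Split off a 2-character key, a 2-character length, and everything else.
  key, len, rest = data.unpack("a2 a2 a*")
  # Remove the value from the front of rest; what remains is the next segment.
  val = rest.slice!(0, len.to_i)
  { key => val }.merge(decode(rest))
end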
p decode("HD040008000415350110XXXXXXXXXX0208XXXXXXXX0302EN0403USA0502EN0604000107014")
# => {"HD"=>"0008", "00"=>"1535", "01"=>"XXXXXXXXXX", "02"=>"XXXXXXXX", "03"=>"EN", "04"=>"USA", "05"=>"EN", "06"=>"0001", "07"=>"4"}
p decode("EM04000800030010112TME001205IQ50232Blue Point Coastal Cuisine. INC.0614565 5th Avenue0805921010909SAN DIEGO1008Downtown1102CA1203USA")
# => {"EM"=>"0008", "00"=>"001", "01"=>"TME001205IQ5", "02"=>"Blue Point Coastal Cuisine. INC.", "06"=>"565 5th Avenue", "08"=>"92101", "09"=>"SAN DIEGO", "10"=>"Downtown", "11"=>"CA", "12"=>"USA"}
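To see what the unpack format does on its own, here is a quick sketch of a single step on the short example from the question. It only uses String#unpack and String#slice!, the same calls as decode above:

```ruby
# "a2" takes 2 bytes as a string; "a*" takes everything that remains.
key, len, rest = "EM0400080003001".unpack("a2 a2 a*")
p [key, len, rest]   # => ["EM", "04", "00080003001"]

# len is still a String, so convert it before slicing off the value.
val = rest.slice!(0, len.to_i)
p val                # => "0008"
p rest               # => "0003001" (next segment: key "00", length "03", value "001")
```

Each recursive call to decode repeats this step on whatever is left in rest.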
If you want to read the whole file and return an array of JSON objects, something like this should be enough:
#!/usr/bin/env ruby -n
BEGIN {
  require "json"
  def decode(data)
    # ... (same body as decode above)
  end
  arr = []
}
arr << decode($_.chomp)
END { puts arr.to_json }
Then (assuming the script is called script.rb and is executable):
$ cat data.txt | ./script.rb > out.json
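One caveat: on some systems (Linux in particular, as far as I know) everything after the interpreter path in a shebang is passed as a single argument, so `ruby -n` may not be found. If that bites you, a plain script along these lines should behave the same. This is only a sketch; `lines_to_json` is a helper name I made up, and skipping blank lines is my own assumption about what you want:

```ruby
require "json"
require "stringio"

# Same recursive decoder as above.
def decode(data)
  return {} if data.empty?
  key, len, rest = data.unpack("a2 a2 a*")
  val = rest.slice!(0, len.to_i)
  { key => val }.merge(decode(rest))
end

# Hypothetical helper: decodes every non-blank line of an IO-like object
# and returns the records as a JSON array string.
def lines_to_json(io)
  records = io.each_line.map(&:chomp).reject(&:empty?).map { |line| decode(line) }
  JSON.generate(records)
end

# StringIO stands in for a real file here so the example runs on its own.
sample = StringIO.new("EM0400080003001\nHD040008\n")
puts lines_to_json(sample)
# => [{"EM":"0008","00":"001"},{"HD":"0008"}]
```

For real input, passing ARGF or an open File to lines_to_json should work the same way, since both respond to each_line.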
Answer 1 (score: 1)
Assuming keys are 2 characters and lengths are 2 digits:
line = "EM04000800030010112TME001205IQ50232Blue Point Coastal Cuisine. INC.06145655th Avenue0805921010909SAN DIEGO1008Downtown1102CA1203USA"
hsh = {}
arr = line.chars
until arr.empty?
  key = arr.shift(2).join
  length = arr.shift(2).join.to_i
  value = arr.shift(length).join
  hsh[key] = value
end
hsh
=> {"EM"=>"0008", "00"=>"001", "01"=>"TME001205IQ5", "02"=>"Blue Point Coastal Cuisine. INC.", "06"=>"5655th Avenue0", "80"=>"21010909SAN DIEGO1008Downtown1102CA1203USA"}
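The result looks a bit funky: the 06 segment declares a length of 14, but "5655th Avenue" is only 13 characters, so everything after it is read one character out of step.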
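If you would rather fail loudly than get a skewed hash from a line whose lengths don't add up, a stricter loop could look something like this. It is only a sketch: `parse_strict` is a name I made up, and raising ArgumentError is just one way to report the problem. Note that it only catches a mismatch once the declared length runs past the end of the line or the length field stops being numeric:

```ruby
# Hypothetical strict variant of the loop above: raises instead of
# silently returning a hash that has drifted out of sync.
def parse_strict(line)
  hsh = {}
  pos = 0
  while pos < line.length
    key = line[pos, 2]
    len_field = line[pos + 2, 2]
    # The length field must be exactly two digits.
    unless len_field&.match?(/\A\d{2}\z/)
      raise ArgumentError, "bad length field #{len_field.inspect} after key #{key.inspect}"
    end
    length = len_field.to_i
    value = line[pos + 4, length].to_s
    # A short value means the declared length runs past the end of the line.
    if value.length < length
      raise ArgumentError, "segment #{key.inspect} declares #{length} characters but only #{value.length} remain"
    end
    hsh[key] = value
    pos += 4 + length
  end
  hsh
end

p parse_strict("EM0400080003001") == { "EM" => "0008", "00" => "001" }  # => true

begin
  parse_strict("EM0400080005001")  # segment 00 claims 5 characters, only 3 follow
rescue ArgumentError => e
  puts e.message  # => segment "00" declares 5 characters but only 3 remain
end
```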
Edit - to step through the file line by line:
File.foreach(filename) do |line|
  # do stuff with line here, e.g. parse line.chomp
end