Question

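I have developed this class Directory, which emulates a directory using a hash. I'm having a hard time figuring out how to implement the serialize and parse methods. The string returned by serialize should look like this: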

2:README:19:string:Hello world!spec.rb:20:string:describe RBFS1:rbfs:4:0:0:

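Now to explain what exactly this means. This is the main directory. The leading 2 is the number of files, then comes the file name README, followed by 19, the length of the file's contents, which are represented as a string I get from the parse method of the other class in the module. Then comes the second file. Note that the two files are not separated by a :, and we don't need one because we know the string length. So, a little more readably:

<file count><file1_data><file2_data>1:rbfs:4:0:0:, where <file1_data> contains the name, the length, and the content.

Now 1:rbfs:4:0:0: means we have one subdirectory named rbfs, 4 is the length of its contents as a string, and 0:0: says that it is empty: no files and no subdirectories. Here is another example:

0:1:directory1:40:0:1:directory2:22:1:README:9:number:420: is equivalent to: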

.
`-- directory1
    `-- directory2
        `-- README
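To make the length fields concrete, here is a small sketch that rebuilds the second example from the inside out and checks the numbers 22 and 40. The variable names are purely illustrative and not part of the original code.

```ruby
# Illustrative check of the length fields in the second example.
# Each entry is "<name>:<length>:<content>"; each directory body is
# "<file count>:<files><directory count>:<directories>".
readme    = "README:9:number:42"   # "number:42" is 9 characters long
dir2_body = "1:#{readme}0:"        # one file, then zero subdirectories
dir1_body = "0:1:directory2:#{dir2_body.length}:#{dir2_body}"
root      = "0:1:directory1:#{dir1_body.length}:#{dir1_body}"

puts dir2_body.length
puts dir1_body.length
puts root
```

Note that the root directory is the only one written without a name and length in front of it.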

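I have no problem with the file part, and I know how to get the number of directories and their names, but I have no idea how to do the rest. I know recursion is the best answer, but I don't know what the bottom of that recursion should be or how to implement it. Solving this would also help me figure out how to implement the parse method by reverse engineering.

The code is as follows: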

module RBFS
  class File
    ... # here I have working `serialize` and `parse` methods for `File`
  end

  class Directory
    attr_accessor :content

    def initialize
      @content = {}
    end

    def add_file(name, file)
      @content[name] = file
    end

    def add_directory(name, subdirectory = nil)
      if subdirectory
        @content[name] = subdirectory
      else
        @content[name] = RBFS::Directory.new
      end
    end

    def serialize
      ...?
    end

    def self.parse(string)
      ...?
    end
  end
end

PS: I use the is_a? method to check the type of the values in the hash.

Another example, for @Jordan:

2:file1:17:string:Test test?file2:10:number:4322:direc1:34:0:1:dir2:22:1:README:9:number:420:direc2::1:README2:9:number:33:0

...which should be this structure (if I have worked it out correctly):

. ->file1,file2
`-- direc1,.....................................direc2 -> README2
    `-- dir2(subdirectory of direc1) -> README

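direc1 contains only directories and no files, while direc2 contains only files. You can see that the main directory does not specify its string length, while all the other directories do.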

Answer 1

OK, let's work through this iteratively, starting with your example:

str = "2:README:19:string:Hello world!spec.rb:20:string:describe RBFS1:rbfs:4:0:0:"
entries = {} # No entries yet!

The first thing we need to know is how many files there are, and we know that is the number before the first ":":

num_entries, rest = str.split(':', 2)
num_entries = Integer(num_entries)
# num_entries is now 2
# rest is now "README:19:string:Hello world!spec.rb:20:string:describe RBFS1:rbfs:4:0:0:"

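(The second argument to split says "I only want 2 pieces," so it stops splitting after the first ":".) We use Integer(n) instead of n.to_i because it is stricter. (to_i would convert "10xyz" to 10; Integer raises an error, which is what we want here.)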
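The difference between the two conversions, and the effect of the limit argument to split, can be seen in a few lines. This is only an illustration; the variable names are made up for the example.

```ruby
# String#to_i silently ignores trailing junk; Kernel#Integer rejects it.
lenient = "10xyz".to_i

strict_error = begin
  Integer("10xyz")
  nil
rescue ArgumentError => e
  e.class
end

# With a limit of 2, split stops after the first ":" and leaves the
# remainder of the string untouched, colons included.
count, rest = "2:README:19:string".split(":", 2)

p lenient        # 10
p strict_error   # ArgumentError
p Integer(count) # 2
p rest           # "README:19:string"
```

Failing loudly on a malformed count is useful here, because a wrong count would otherwise shift every later read in the string.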

Now we know we have two files. We don't know anything else yet, but this is what is left of our string:

README:19:string:Hello world!spec.rb:20:string:describe RBFS1:rbfs:4:0:0:

The next thing we can get is the name and length of the first file.

name, len, rest = rest.split(':', 3)
len = Integer(len)
# name = "README"
# len  = 19
# rest = "string:Hello world!spec.rb:20:string:describe RBFS1:rbfs:4:0:0:"

Cool, now we have the name and length of the first file, so we can get its contents:

content = rest.slice!(0, len)
# content = "string:Hello world!"
# rest = "spec.rb:20:string:describe RBFS1:rbfs:4:0:0:"
entries[name] = content
# entries = { "README" => "string:Hello world!" }

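We used rest.slice!, which removes the first len characters from the string and returns them, so content is exactly what we want (string:Hello world!) and rest is everything after it. Then we add it to the entries hash. One file down, one to go!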
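As a small standalone illustration of that step (not part of the walkthrough's running state), the snippet below shows slice! both returning the removed prefix and shrinking the receiver. The second string, tricky, is a made-up example.

```ruby
# .dup gives a mutable copy, in case string literals are frozen.
rest = "string:Hello world!spec.rb:20:string:describe RBFS".dup

content = rest.slice!(0, 19) # removes and returns the first 19 characters

p content # "string:Hello world!"
p rest    # "spec.rb:20:string:describe RBFS"

# Because the content is read by length, a ":" inside it is harmless.
tricky = "a:b:cnext".dup
head = tricky.slice!(0, 5)
p head    # "a:b:c"
p tricky  # "next"
```

Keep in mind that slice! mutates its receiver. In the walkthrough that is safe, because rest is always a fresh string returned by split.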

For the second file, we do exactly the same thing:

name, len, rest = rest.split(':', 3)
len = Integer(len)
# name = "spec.rb"
# len = 20
# rest = "string:describe RBFS1:rbfs:4:0:0:"

content = rest.slice!(0, len)
# content = "string:describe RBFS"
# rest  = "1:rbfs:4:0:0:"
entries[name] = content
# entries = { "README" => "string:Hello world!",
#             "spec.rb" => "string:describe RBFS" }

Since we are doing exactly the same thing twice, we should obviously do it in a loop! But before we write that, we need to get organized. So far we have two discrete steps: first, get the number of files; second, get those files' contents. We also know we will need to get the number of directories and the directories themselves. Here is a guess at how this will look:

def parse(serialized)
  files, rest = parse_files(serialized)
  # `files` will be a Hash of file names and their contents and `rest` will be
  # the part of the string we haven't parsed yet

  directories, rest = parse_directories(rest)
  # `directories` will be a Hash of directory names and their contents

  files.merge(directories)
end

def parse_files(serialized)
  # Get the number of files from the beginning of the string
  num_entries, rest = serialized.split(':', 2)
  num_entries = Integer(num_entries)
  entries = {}

  # `rest` now starts with the first file (e.g. "README:19:...")
  num_entries.times do
    name, len, rest = rest.split(':', 3) # get the file name and length
    len = Integer(len)
    content = rest.slice!(0, len) # get the file contents from the beginning of the string
    entries[name] = content # add it to the hash
  end

  [ entries, rest ]
end

def parse_directories(serialized)
  # TBD...
end

That parse_files method is a bit long for my taste, though, so how about we split it up?

def parse_files(serialized)
  # Get the number of files from the beginning of the string
  num_entries, rest = serialized.split(':', 2)
  num_entries = Integer(num_entries)
  entries = {}

  # `rest` now starts with the first file (e.g. "README:19:...")
  num_entries.times do
    name, content, rest = parse_file(rest)
    entries[name] = content # add it to the hash
  end

  [ entries, rest ]
end

def parse_file(serialized)
  name, len, rest = serialized.split(':', 3) # get the name and length of the file
  len = Integer(len)
  content = rest.slice!(0, len) # use the length to get its contents
  [ name, content, rest ]
end

Clean!

Now, I'm going to give you a big spoiler: because the serialization format is reasonably well designed, we don't actually need a parse_directories method, because it would do exactly the same thing as parse_files. The only difference is that after this line:

name, content, rest = parse_file(rest)

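...we want to do something different if we are parsing directories instead of files. In particular, we want to call parse(content), which will do all of this over again on the directory's contents. Since it is now pulling double duty, we should change its name to something more generic like parse_entries, and we also need to give it another argument to tell it when to recurse.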
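One possible shape for that generalized method is sketched below. This is not the code from the Gist. It is a minimal sketch that assumes a keyword argument named recurse (a hypothetical name) and returns plain nested Hashes, with file contents left as raw "type:value" strings rather than RBFS objects.

```ruby
# Hypothetical sketch: parses the format into nested Hashes.
# File contents stay as raw strings; directories become Hashes.
def parse(serialized)
  files, rest       = parse_entries(serialized, recurse: false)
  directories, _rest = parse_entries(rest, recurse: true)
  files.merge(directories)
end

# Reads "<count>:" followed by that many "<name>:<length>:<content>" entries.
# Returns the entries and whatever is left of the string.
def parse_entries(serialized, recurse:)
  count, rest = serialized.split(":", 2)
  entries = {}
  Integer(count).times do
    name, len, rest = rest.split(":", 3)
    content = rest.slice!(0, Integer(len))
    # For directories, the content is itself a serialized directory.
    entries[name] = recurse ? parse(content) : content
  end
  [entries, rest]
end

first  = parse("2:README:19:string:Hello world!spec.rb:20:string:describe RBFS1:rbfs:4:0:0:")
second = parse("0:1:directory1:40:0:1:directory2:22:1:README:9:number:420:")
p first
p second
```

The recursion needs no explicit base case. A directory whose subdirectory count is 0 runs the loop zero times, so an empty directory ("0:0:") simply yields an empty Hash and the recursion stops there.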

Rather than post more code here, I have posted my "finished" product over in this Gist.

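Now, I know this doesn't help with the serialize part, but hopefully it will help you get started. serialize is the easier part, because there are plenty of questions and answers about recursively iterating over a Hash.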
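For completeness, here is a minimal sketch of what a recursive serialize could look like. Everything in it is an assumption made for illustration: the Sketch module, its stand-in File class (the real RBFS::File is elided in the question), and the add and serialize_group helpers are all hypothetical names.

```ruby
# Hypothetical stand-ins; only the shape of Directory#serialize matters here.
module Sketch
  class File
    def initialize(data)
      @data = data
    end

    # Assumed "type:value" format, as seen in the question's examples.
    def serialize
      type = @data.is_a?(Numeric) ? "number" : "string"
      "#{type}:#{@data}"
    end
  end

  class Directory
    def initialize
      @content = {}
    end

    def add(name, entry)
      @content[name] = entry
      self
    end

    # Files first, then subdirectories, each group prefixed by its count.
    def serialize
      files, dirs = @content.partition { |_name, entry| entry.is_a?(File) }
      serialize_group(files) + serialize_group(dirs)
    end

    private

    # "<count>:" followed by "<name>:<length>:<serialized entry>" per entry.
    def serialize_group(pairs)
      body = pairs.map do |name, entry|
        data = entry.serialize # recurses when entry is a Directory
        "#{name}:#{data.length}:#{data}"
      end
      "#{pairs.size}:#{body.join}"
    end
  end
end

root = Sketch::Directory.new
root.add("README", Sketch::File.new("Hello world!"))
root.add("spec.rb", Sketch::File.new("describe RBFS"))
root.add("rbfs", Sketch::Directory.new)
puts root.serialize
```

As with parsing, the recursion bottoms out by itself: an empty directory has nothing to map over and serializes to "0:0:". The name and length of a directory are written by its parent, which is why the root has neither.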

Custom serialization and parsing methods in Ruby
