Question

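I have an XML file that is too large. To make it smaller, I want to replace all tag and attribute names with shorter versions of the same names.

So, I implemented this: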

string.gsub!(/<(\w+) /) do |match|
    case match
    when 'Image' then 'Img'
    when 'Text'  then 'Txt'
    end
end

puts string

This deletes all the opening tags, but doesn't do much else.

What am I doing wrong here?

Answer 1

Here is another way:

class String
  def minimize_tags!
    {"image" => "img", "text" => "txt"}.each do |from,to|
      gsub!(/<#{from}\b/i,"<#{to}")
      gsub!(/<\/#{from}>/i,"<\/#{to}>")
    end
    self
  end
end

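This is probably easier to maintain, because the replacement patterns are all in one place. For strings of any significant size, it may also be a lot faster than Kevin's way. I ran a quick test of both methods using the HTML source of this Stack Overflow page itself as the test string, and my way was about 6 times faster...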
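
A rough way to repeat that kind of comparison is sketched below with Ruby's standard `Benchmark` module. It rests on several assumptions: the thread doesn't show which code "Kevin's way" is, so a block-based `gsub` (like the one in Answer 3) stands in for it; the input string is made up; and `per_tag_gsub` and `block_gsub` are hypothetical helper names. Timings will vary by machine and input, so no numbers are claimed here.

```ruby
require 'benchmark'

# Made-up test input: a small XML fragment repeated many times.
SAMPLE = '<Image ImagePath="a.png">caption</Image><Text TextSize="9">body</Text>'
xml = SAMPLE * 10_000

# One pair of gsub! calls per tag name, as in the answer above
# (works on a copy so the input is left untouched).
def per_tag_gsub(str)
  out = str.dup
  { 'image' => 'img', 'text' => 'txt' }.each do |from, to|
    out.gsub!(/<#{from}\b/i, "<#{to}")
    out.gsub!(/<\/#{from}>/i, "</#{to}>")
  end
  out
end

# A single gsub with a block that inspects every tag name
# (assumed stand-in for the approach being compared against).
def block_gsub(str)
  str.gsub(/(<\/?)(\w+)/) do |match|
    tag_mark = $1 # save it first: the regexes in `when` overwrite $1
    case $2
    when /^image$/i then "#{tag_mark}img"
    when /^text$/i  then "#{tag_mark}txt"
    else match
    end
  end
end

per_tag_time = Benchmark.realtime { per_tag_gsub(xml) }
block_time   = Benchmark.realtime { block_gsub(xml) }

puts format('per-tag gsub!: %.4fs', per_tag_time)
puts format('block gsub:    %.4fs', block_time)
```

Both helpers emit lowercase names here so that their outputs can be compared directly before trusting the timings.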

Answer 2

Here's the benefit of using a parser such as Nokogiri:

It lets you manipulate selected tags (nodes) and their attributes:

require 'nokogiri'

xml = <<EOT
<xml>
  <Image ImagePath="path/to/image">image comment</Image>
  <Text TextFont="courier" TextSize="9">this is the text</Text>
</xml>
EOT

doc = Nokogiri::XML(xml)
doc.search('Image').each do |n| 
  n.name = 'img' 
  n.attributes['ImagePath'].name = 'path'
end
doc.search('Text').each do |n| 
  n.name = 'txt'
  n.attributes['TextFont'].name = 'font'
  n.attributes['TextSize'].name = 'size'
end
print doc.to_xml
# >> <?xml version="1.0"?>
# >> <xml>
# >>   <img path="path/to/image">image comment</img>
# >>   <txt font="courier" size="9">this is the text</txt>
# >> </xml>

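If you need to iterate through every node, perhaps to apply a generic transformation to the tag names, you can use doc.search('*').each. That is slower than searching for individual tags, but it may result in less code if you need to change every tag.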

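The benefit of using a parser is that it keeps working even if the layout of the XML changes, because it doesn't care about whitespace, and it keeps working even if the attribute order changes, which makes the code more robust.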

Answer 3

Try this:

string.gsub!(/(<\/?)(\w+)/) do |match|
  tag_mark = $1
  case $2
  when /^image$/i
    "#{tag_mark}Img"
  when /^text$/i
    "#{tag_mark}Txt"
  else
    match
  end
end
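
To spell out why the code in the question deletes the tags: when `gsub` is given a block, the block receives the whole matched text (for example `"<Image "`), not the capture group. Neither `when` branch matches, so the `case` returns `nil` and the match is replaced with an empty string. The sketch below reproduces that and shows a hash-based variant of the fix. The sample string and the `SHORT_NAMES` table are made up for illustration, and unlike the code above this variant is case-sensitive.

```ruby
string = '<Image src="a.png"> <Text size="9">'

# Reproduces the problem: `match` is "<Image " or "<Text ", never "Image",
# so the case expression yields nil, which gsub turns into "".
broken = string.gsub(/<(\w+) /) do |match|
  case match
  when 'Image' then 'Img'
  when 'Text'  then 'Txt'
  end
end
puts broken # => src="a.png"> size="9">

# Hypothetical lookup table: long tag name => short tag name.
SHORT_NAMES = { 'Image' => 'Img', 'Text' => 'Txt' }.freeze

# Use the capture groups ($1 = "<" or "</", $2 = tag name) and fall back
# to the original name when there is no short form.
fixed = string.gsub(/(<\/?)(\w+)/) { "#{$1}#{SHORT_NAMES.fetch($2, $2)}" }
puts fixed # => <Img src="a.png"> <Txt size="9">
```

The fallback in `fetch` plays the same role as the `else match` branch above: tags without a short name pass through unchanged instead of being deleted.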

How do I replace every occurrence of a pattern in a string using Ruby?
