Question

I have a simple HTML document:

<div should-not-be-replaced=":smile:">
  Hello :smile:!
</div>

How can I replace the text :smile: with <img src="smile.png">, while leaving the first :smile: (the one inside the attribute) unchanged, so that I get this:

<div should-not-be-replaced=":smile:">
  Hello <img src="smile.png">!
</div>

I tried this, but Nokogiri escapes my HTML and inserts it as plain text:

doc = Nokogiri::HTML::DocumentFragment.parse(html)
doc.traverse do |x|
  next unless x.text?
  x.content = x.text.gsub(':smile:', '<img src="smile.png">')
end

Answer 1

I think this may be what you want. It also handles any string between two colons, so :something: produces "something.png" as well.

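doc = Nokogiri::HTML::DocumentFragment.parse(html)
doc.traverse do |x|
  if x.text? && x.content =~ /:\w+:/
    # Strip the first :name: from the text; sub sets $1 to the captured name.
    x.content = x.content.sub(/:(\w+):/, '')
    # Build the image as a fragment so it is inserted as markup, not escaped text.
    img = Nokogiri::HTML::DocumentFragment.parse('<img src="' + $1 + '.png">')
    # Note: the image lands after the whole text node, not at the match position.
    x.add_next_sibling(img)
  end
end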

Answer 2

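My solution is very similar to Ku's, although I have tried to handle the case where the text to be replaced appears several times in the source text, by replacing the text content entirely with an HTML document fragment: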

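doc = Nokogiri::HTML::DocumentFragment.parse(DATA.read)  # HTML is read from after __END__
doc.traverse do |x|
  next unless x.text?
  if x.text.match(%r{:(\w+):})
    # Use the \1 backreference so each match gets its own name; "#{$1}" would be
    # interpolated once, before gsub runs, and reuse the first match everywhere.
    replace_text = x.text.gsub(%r{:(\w+):}, "<img src='\\1.png'>")
    x.content = ""
    x.add_next_sibling replace_text  # a String argument is parsed as markup
  end
end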

Answer 3

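You're making this too hard. Using traverse is slow because it forces Nokogiri to walk every node in the document, which is expensive on a large page.

Instead, take advantage of selectors to find the specific node you want: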

require 'nokogiri'

doc = Nokogiri::HTML(<<EOT)
<div parm=":smile:">
  Hello :smile:!
</div>
EOT

div = doc.at('div[parm=":smile:"]') 
div.inner_html = div.text.sub(/:smile:/, '<img src="smile.png">')
puts doc.to_html

Which outputs:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<div parm=":smile:">
  Hello <img src="smile.png">!
</div>
</body></html>

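I'm using at, which finds the first match. If you need to process several nodes, use search instead. search returns a NodeSet, which behaves like an array, so you need to iterate over it. There are countless examples of doing this on Stack Overflow and elsewhere.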

Answer 4

Do you mean that it returns &lt; or &gt;?

I suggest wrapping the output with the CGI.unescape_html method.

Try:

require 'cgi'
CGI::unescape_html(doc.to_s)
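
To show what that call does, here is a small standalone sketch. The escaped string is a made-up stand-in for the kind of output the question describes, and I call CGI.unescapeHTML, of which unescape_html is an alias.

```ruby
require 'cgi'

# Hypothetical serialized text in which the tag was escaped.
escaped = 'Hello &lt;img src="smile.png"&gt;!'

# Turns &lt; and &gt; (and other basic entities) back into characters.
unescaped = CGI.unescapeHTML(escaped)
puts unescaped
# prints: Hello <img src="smile.png">!
```

Keep in mind that this unescapes every entity in the string, including ones that were escaped on purpose, so it is probably only safe when you control the whole document.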
