Question

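I have a string that has break tags in it.

Unfortunately, they are irregular.

 etc...

I am using Nokogiri, but I don't know how to tell it to split the string at each break tag...

Thanks.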

Answer 1

If you can split on a regular expression, use the following delimiter:

<\s*[Bb][Rr]\s*\/*>

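Explanation:

One left angle bracket, zero or more whitespace characters, B or b, R or r, zero or more whitespace characters, zero or more forward slashes, and one right angle bracket.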

To learn how to use regular expressions in Ruby, look here:
http://www.regular-expressions.info/ruby.html
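
To make the explanation concrete, here is a small sketch that checks the pattern against a few tag spellings. The sample list and the `match_table` helper are hypothetical names made up for this illustration; only the pattern itself comes from the answer.

```ruby
# The delimiter pattern from the answer above.
BR_TAG = /<\s*[Bb][Rr]\s*\/*>/

# Illustrative tag spellings (assumed, not taken from the question).
SAMPLES = ['<br>', '<Br>', '<BR/>', '< br />', '<br//>', '<br / >', '<br class="x">'].freeze

# Hypothetical helper: map each sample to whether the pattern matches it.
def match_table(samples)
  samples.map { |s| [s, BR_TAG.match?(s)] }.to_h
end

match_table(SAMPLES).each { |tag, matched| puts "#{tag.inspect} -> #{matched}" }
```

The first five samples match. The last two do not: the pattern allows no whitespace after the slash and no attributes, so tags written that way would need a looser pattern.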

Answer 2

So, to implement iftrue's response:

a = 'a<Br>b<BR>c<br/>d<BR/>e<br />f'
a.split(/<\s*[Bb][Rr]\s*\/*>/)
# => ["a", "b", "c", "d", "e", "f"]

...which leaves you with an array of the strings between the HTML breaks.
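
One thing to watch for with this approach: if the string starts with a break tag, `split` returns an empty string as the first element, and text around the tags may carry stray whitespace. A minimal sketch of one way to tidy that up, assuming you want neither; `split_on_br` is a hypothetical helper name, not something from the answers:

```ruby
BR_TAG = /<\s*[Bb][Rr]\s*\/*>/

# Hypothetical helper: split on break tags, trim each piece,
# and drop pieces that end up empty.
def split_on_br(str)
  str.split(BR_TAG).map(&:strip).reject(&:empty?)
end

p '<Br>this<BR>is'.split(BR_TAG)  # the leading tag yields a leading ""
p split_on_br('<Br>this<BR>is<br/>a<BR/>text<br />string')
```

Whether empty pieces should be dropped depends on the data; if consecutive breaks are meaningful (blank lines, say), leave out the `reject` step.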

Answer 3

Pesto is 99% of the way there, but Nokogiri supports creating a document fragment, which doesn't wrap the text in the declaration:

require 'nokogiri'

fragment = Nokogiri::HTML::DocumentFragment.parse('<Br>this<BR>is<br/>a<BR/>text<br />string')
# Keep only the text nodes; the <br> elements are dropped.
text = fragment.children.select(&:text?).map(&:content)
puts text
# >> this
# >> is
# >> a
# >> text
# >> string

Answer 4

If you parse the string with Nokogiri, you can then scan through it and ignore anything other than text elements:

require 'nokogiri'
doc = Nokogiri::HTML.parse('a<Br>b<BR>c<br/>d<BR/>e<br />f')
text = []
doc.search('p').first.children.each do |node|
  text << node.content if node.text?
end
p text  # => ["a", "b", "c", "d", "e", "f"]

Note that you have to search for the first p tag, because Nokogiri wraps the whole thing in <!DOCTYPE blah blah><html><body><p>YOUR TEXT</p></body></html>.

How do I split a string on its tags?

4 answers: