Question

My HTML code is:

<h3>Head1</h3>
<p>text before link<a href="http://www.google.com" title="http://www.google.com" target="_blank">Link 1</a>text after link</p>
<h3>Head2</h3>
<p>text before link<a href="http://www.google.com" title="http://www.google.com" target="_blank">Link 2</a>text after link</p>
<h3>Head3</h3>
<p>text before link<a href="http://www.google.com" title="http://www.google.com" target="_blank">Link 3</a>text after link</p>

I'm using Nokogiri to parse the HTML. In the case above, assume the HTML is stored in @text:

require "nokogiri"
@page_data = Nokogiri::HTML(@text)
@headings = @page_data.css('h3')
@desc = @page_data.css('p')

But @desc only gives me the text; the links for "Link 1", "Link 2", and "Link 3" are not preserved.

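Because the links sit in the middle of the text, I can't pull them out separately. In this case, how do I get the text in the "p" tags together with their links?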

Answer 1

It's not very clear from your question what you're trying to accomplish. If by this...

In this case, how do I get the text in the "p" tags together with their links?

...you mean "How do I get the HTML contents of each <p> element?", then this will work:

require "nokogiri"
frag = Nokogiri::HTML.fragment(my_html)
frag.css('h3').each do |header|
  puts header.text
  para = header.next_element
  puts para.inner_html
end
#=> Head1
#=> text before link<a href="http://www.google.com" title="http://www.google.com" target="_blank">Link 1</a>text after link
#=> Head2
#=> text before link<a href="http://www.google.com" title="http://www.google.com" target="_blank">Link 2</a>text after link
#=> Head3
#=> text before link<a href="http://www.google.com" title="http://www.google.com" target="_blank">Link 3</a>text after link

If you mean "How do I get the text of the anchor in each paragraph?", then you can do this:

frag.css('h3').each do |header|
  anchor = header.next_element.at_css('a')
  puts "#{header.text}: #{anchor.text}"
end
#=> Head1: Link 1
#=> Head2: Link 2
#=> Head3: Link 3

...or you could do this:

frag.xpath('.//p/a').each do |anchor|
  puts anchor.text
end
#=> Link 1
#=> Link 2
#=> Link 3

If none of these is what you want, please edit your question to explain more clearly the end result you're after.

Nokogiri: Can't access "p" tag text along with its "a" tag links
