Question

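I am new to Ruby and am currently practicing web scraping with Nokogiri. I want to collect the details of the "deals" on a random group-buying site. I have been able to scrape a site successfully, but I am having trouble parsing the output. I tried the solution suggested here, and I also tried regular expressions. So far I have failed.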

I am trying to parse the following title/description from this page:

Frosty Frappes starting at P100 for P200 worth at Café Tavolo – up to 55% off

This is what I get:

FrostyFrappes starting at P100 for P200 worth at Caf Tavolo  up to 55% off

Here is the relevant snippet from my code:

require 'open-uri'  # provides URI.open for fetching a URL; plain 'uri' does not
require 'nokogiri'

html = URI.open(url)  # url holds the address of the deal page
doc = Nokogiri::HTML(html.read)
doc.encoding = "utf-8"
title = doc.at_xpath('/html/body/div/div[9]/div[2]/div/div/div/h1/a')
puts title.content.to_s.strip.gsub(/[^0-9a-z%&!\n\/(). ]/i, '')

Please let me know if I have missed anything. Thanks.

Answer 1

Your XPath is too rigid, and your regular expression is removing characters you want to keep. Here is how I would do it:
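To see why the regular expression is the culprit, here is a minimal sketch that runs it on a hard-coded string instead of the scraped page. The sample string is an assumption: the tab between "Frosty" and "Frappes" is only a guess at what the page's markup contains, chosen because it reproduces the output shown in the question.

```ruby
# Hypothetical sample text standing in for the scraped title.
# The tab (\t) between "Frosty" and "Frappes" is an assumption.
raw = "Frosty\tFrappes starting at P100 for P200 worth at Caf\u00E9 Tavolo \u2013 up to 55% off"

# The allow-list regex from the question: every character NOT in the
# set is deleted. The tab, the accented e, and the en dash are all
# outside the set, so they disappear.
stripped = raw.gsub(/[^0-9a-z%&!\n\/(). ]/i, '')

puts stripped
```

With this input, the tab, the "é", and the "–" are deleted, which joins "Frosty" and "Frappes" and leaves two adjacent spaces where the dash used to be.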

title = doc.at('div#contentDealTitle h1 a').text.strip.gsub(/\s+/,' ')

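That takes the text of the first a element inside an h1 inside div#contentDealTitle, strips it (removes leading and trailing whitespace), and replaces every sequence of one or more whitespace characters with a single space.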
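The whitespace handling can be tried on its own, without fetching the page. The following is a minimal sketch; the helper name normalize_whitespace and the sample string (including where its newlines and tab fall) are assumptions made for illustration, not the page's actual content.

```ruby
# Hypothetical helper wrapping the strip + gsub chain from the answer.
def normalize_whitespace(text)
  # strip removes leading/trailing whitespace; gsub collapses each run
  # of whitespace (spaces, tabs, newlines) into a single space.
  text.strip.gsub(/\s+/, ' ')
end

# Assumed sample text standing in for the result of .text on the node.
raw = "\n   Frosty\tFrappes starting at P100\n   for P200 worth at Caf\u00E9 Tavolo \u2013 up to 55% off  \n"

title = normalize_whitespace(raw)
puts title
```

Because nothing is deleted except surplus whitespace, the "é" and the "–" survive, and the words stay separated by single spaces.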

Some spaces are removed when scraping a string with Nokogiri
