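I'm trying to scrape the details of my favourite food. I can get the name and the price, but the description is proving a challenge because its text is nested inside several spans. I have tried combining selectors, but none of them returns the description.
Any help would be appreciated: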
menu = doc.css('.menu-index-page__item-content').map do |menu|
meal_name = menu.at_css('.menu-index-page__item-title span[1]').text.strip
meal_price = menu.at_css('.menu-index-page__item-price').text.strip
meal_des = menu.css('p.menu-index-page__item-desc span[3]').text.strip.to_s
Event1.new meal_name, meal_price, meal_des
end
pp menu
This returns:
#<struct Event1
meal_name="chicken Burger",
meal_price="£3.95",
meal_des="">,
The HTML I'm trying to scrape looks like this:
<div class="menu-index-page__item-content" data-reactid=".1446l8bfnk0.3.5.0.4:$648324.2.$4885079.$menuItemContent">
<h6 class="menu-index-page__item-title" data-reactid=".1446l8bfnk0.3.5.0.4:$648324.2.$4885079.$menuItemContent.0"><span data-reactid=".1446l8bfnk0.3.5.0.4:$648324.2.$4885079.$menuItemContent.0.1">Chicken Burger</span></h6>
<p class="menu-index-page__item-desc" data-reactid=".1446l8bfnk0.3.5.0.4:$648324.2.$4885079.$menuItemContent.1"><span data-reactid=".1446l8bfnk0.3.5.0.4:$648324.2.$4885079.$menuItemContent.1.0">
<span data-reactid=".1446l8bfnk0.3.5.0.4:$648324.2.$4885079.$menuItemContent.1.0"><span data-reactid=".1446l8bfnk0.3.5.0.4:$648324.2.$4885079.$menuItemContent.1.0.0"><span data-reactid=".1446l8bfnk0.3.5.0.4:$648324.2.$4885079.$menuItemContent.1.0.0.0:$0">Chargrilled chicken thigh with</span>
<br data-reactid=".1446l8bfnk0.3.5.0.4:$648324.2.$4885079.$menuItemContent.1.0.0.0:$0br"><span data-reactid=".1446l8bfnk0.3.5.0.4:$648324.2.$4885079.$menuItemContent.1.0.0.$1">fresh herb olive oil mayonnaise.</span></span>
<span style="position:fixed;visibility:hidden;top:0;left:0;" data-reactid=".1446l8bfnk0.3.5.0.4:$648324.2.$4885079.$menuItemContent.1.0.2">…</span></span></p>
<span class="menu-index-page__item-price" data-reactid=".1446l8bfnk0.3.5.0.4:$648324.2.$4885079.$menuItemContent.2">£3.55</span>
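I'm trying to get "Chargrilled chicken thigh with fresh herb olive oil mayonnaise." as the description, and I'm not sure why span[3] doesn't give me any result.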
Answer 0 (score: 0)
I don't know Nokogiri well, so I can't say what the best way to do this is, but this is how I would extract the text from the given HTML:
# I assume you can get this HTML by doing `menu.css('p.menu-index-page__item-desc').something`
desc_html = '<span data-reactid=".1446l8bfnk0.3.5.0.4:$648324.2.$4885079.$menuItemContent.1.0">
<span data-reactid=".1446l8bfnk0.3.5.0.4:$648324.2.$4885079.$menuItemContent.1.0">
<span data-reactid=".1446l8bfnk0.3.5.0.4:$648324.2.$4885079.$menuItemContent.1.0.0">
<span data-reactid=".1446l8bfnk0.3.5.0.4:$648324.2.$4885079.$menuItemContent.1.0.0.0:$0">Chargrilled chicken thigh with</span>
<br data-reactid=".1446l8bfnk0.3.5.0.4:$648324.2.$4885079.$menuItemContent.1.0.0.0:$0br">
<span data-reactid=".1446l8bfnk0.3.5.0.4:$648324.2.$4885079.$menuItemContent.1.0.0.$1">fresh herb olive oil mayonnaise.</span>
</span>
<span style="position:fixed;visibility:hidden;top:0;left:0;" data-reactid=".1446l8bfnk0.3.5.0.4:$648324.2.$4885079.$menuItemContent.1.0.2"></span>
</span>
</span>'
doc = Nokogiri.parse(desc_html)
doc.children.text
=> "\n \n \n Chargrilled chicken thigh with\n \n fresh herb olive oil mayonnaise.\n \n \n \n \n"
doc.children.text.strip
=> "Chargrilled chicken thigh with\n \n fresh herb olive oil mayonnaise."
doc.children.text.strip.gsub(/\W{2,}/, ' ')
=> "Chargrilled chicken thigh with fresh herb olive oil mayonnaise."
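One caveat about the `gsub(/\W{2,}/, ' ')` step above: `\W` matches any non-word character, so a run such as a comma followed by a space is replaced as well. A minimal sketch of a whitespace-only alternative follows; `squish_whitespace` is a hypothetical helper name and the second sample string is made up purely for illustration.

```ruby
# Raw text as returned by `doc.children.text` in the answer above.
raw = "\n \n \n Chargrilled chicken thigh with\n \n fresh herb olive oil mayonnaise.\n \n \n \n \n"

# Hypothetical helper (not part of Nokogiri): collapse every run of
# whitespace into a single space, then trim both ends.
def squish_whitespace(text)
  text.gsub(/\s+/, ' ').strip
end

clean = squish_whitespace(raw)
puts clean

# Made-up description with punctuation, to compare the two patterns.
sample = "Chicken, chips\n and salad"
by_non_word = sample.gsub(/\W{2,}/, ' ') # the ", " run is replaced too
by_space = squish_whitespace(sample)     # the comma is kept

puts by_non_word
puts by_space
```

For the description in the question both patterns give the same string; the difference only shows up when punctuation sits next to whitespace.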
Answer 1 (score: 0)
This worked for me:
# ...
menu = doc.css('.menu-index-page__item-content')
menu.map { |m| m.css('span')[2].text }
# => ["Chargrilled chicken thigh with\nfresh herb olive oil mayonnaise.\n…"]