I have scraped this array from the web. It looks like this:
[["formatted_sum_fees", "£5.60"],
["formatted_price", "£46.50"],
["formatted_sum_fees", "£4.50"],
["formatted_price", "£37.50"],
["formatted_sum_fees", "£3.30"],
["formatted_price", "£27.50"],
["formatted_sum_fees", "£3.30"],
["formatted_price", "£27.50"],
["formatted_sum_fees", "£4.50"],
["formatted_price", "£37.50"],
["formatted_sum_fees", "£4.50"],
["formatted_price", "£37.50"],
["formatted_sum_fees", "£4.50"],
["formatted_price", "£37.50"],
["formatted_sum_fees", "£5.60"],
["formatted_price", "£46.50"],
["formatted_sum_fees", "£4.50"],
["formatted_price", "£37.50"],
["formatted_sum_fees", "£5.60"],
["formatted_price", "£46.50"],
["formatted_sum_fees", "£4.50"],
["formatted_price", "£37.50"],
["formatted_sum_fees", "£3.30"],
["formatted_price", "£27.50"]]
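As you can see, the entries repeat. I would like to reduce the array above to the following (the values will be different every time, so the duplicates need to be removed generically):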
[["formatted_sum_fees", "£5.60"],
["formatted_price", "£46.50"],
["formatted_sum_fees", "£4.50"],
["formatted_price", "£37.50"],
["formatted_sum_fees", "£3.30"],
["formatted_price", "£27.50"]]
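Anything that appears after this is a duplicate.
All I need are the fees and the prices so that I can save them to the database :)
Thanks, Sam
Extra: here is the rake task.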
require "nokogiri"
require "open-uri"

namespace :task do
  task test: :environment do
    ticketmaster_url = "http://www.ticketmaster.co.uk/derren-brown-miracle-glasgow-04-07-2016/event/370050789149169E?artistid=1408737&majorcatid=10002&minorcatid=53&tpab=-1"
    # URI.open comes from open-uri; a bare Kernel#open no longer accepts URLs on Ruby 3.0+
    doc = Nokogiri::HTML(URI.open(ticketmaster_url))
    event_name = nil
    ticket_price = nil
    # Look through the inline scripts for the one that holds the pricing data
    doc.xpath("//script[@type='text/javascript']/text()").each do |text|
      if text.content =~ /more_options_on_polling/
        # Collect [key, value] pairs for every formatted_price / formatted_sum_fees entry
        ticket_price = text.to_s.scan(/\"(formatted_(?:price|sum_fees))\":\"(.+?)\"/)
        byebug
      end
    end
  end
end
Answer 0: (score: 1)
You can use uniq:
[["formatted_sum_fees", "£5.60"],
["formatted_price", "£46.50"],
["formatted_sum_fees", "£4.50"],
["formatted_price", "£37.50"],
["formatted_sum_fees", "£3.30"],
["formatted_price", "£27.50"],
["formatted_sum_fees", "£3.30"],
["formatted_price", "£27.50"],
["formatted_sum_fees", "£4.50"],
["formatted_price", "£37.50"],
["formatted_sum_fees", "£4.50"],
["formatted_price", "£37.50"],
["formatted_sum_fees", "£4.50"],
["formatted_price", "£37.50"],
["formatted_sum_fees", "£5.60"],
["formatted_price", "£46.50"],
["formatted_sum_fees", "£4.50"],
["formatted_price", "£37.50"],
["formatted_sum_fees", "£5.60"],
["formatted_price", "£46.50"],
["formatted_sum_fees", "£4.50"],
["formatted_price", "£37.50"],
["formatted_sum_fees", "£3.30"],
["formatted_price", "£27.50"]].uniq
The result is then:
[["formatted_sum_fees", "£5.60"], ["formatted_price", "£46.50"], ["formatted_sum_fees", "£4.50"], ["formatted_price", "£37.50"], ["formatted_sum_fees", "£3.30"], ["formatted_price", "£27.50"]]
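This works because Array#uniq compares the inner arrays by value and keeps the first occurrence of each, so the original order is preserved. A minimal sketch on a shortened copy of the data (the variable name ticket_price is borrowed from the rake task in the question; unique_prices is just an illustrative name):

```ruby
# Shortened sample of the scraped [key, value] pairs, with a repeated tier at the end.
ticket_price = [
  ["formatted_sum_fees", "£5.60"], ["formatted_price", "£46.50"],
  ["formatted_sum_fees", "£4.50"], ["formatted_price", "£37.50"],
  ["formatted_sum_fees", "£5.60"], ["formatted_price", "£46.50"]
]

# uniq returns a new array; ticket_price itself is left unchanged.
unique_prices = ticket_price.uniq

unique_prices.each { |key, value| puts "#{key}: #{value}" }
```

If you would rather modify the array in place, uniq! does that, but note that it returns nil when nothing was removed.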
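Answer 1: (score: 1)
You just need to call the uniq method on the array you get from the web scrape. It returns the unique values from that array, and you can then easily iterate over the result to store the values in the database.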
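One thing to watch for when iterating: calling uniq on the flat list treats fees and prices independently, so if two different prices ever shared the same fee, the second fee entry would be dropped and the pairs would no longer line up. A possible sketch that groups each fee with its price before deduplicating, assuming the scrape always yields a fee entry immediately followed by its matching price entry (scraped and tiers are illustrative names, and puts stands in for whatever database write you use):

```ruby
# Sample of scraped [key, value] pairs; each fee is assumed to be followed by its price.
scraped = [
  ["formatted_sum_fees", "£5.60"], ["formatted_price", "£46.50"],
  ["formatted_sum_fees", "£4.50"], ["formatted_price", "£37.50"],
  ["formatted_sum_fees", "£4.50"], ["formatted_price", "£37.50"],
  ["formatted_sum_fees", "£5.60"], ["formatted_price", "£46.50"]
]

# Take the pairs two at a time, turn each group into a hash, then drop repeated hashes.
tiers = scraped.each_slice(2).map(&:to_h).uniq

tiers.each do |tier|
  # Replace this line with the actual save to your database.
  puts "fee=#{tier["formatted_sum_fees"]} price=#{tier["formatted_price"]}"
end
```

Each element of tiers is a hash holding one fee together with its price, which tends to map more directly onto a database row than the flat list does.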