require 'open-uri'
require 'nokogiri'
def scrap(url)
  html = open(url).read
  nokogiri_doc = Nokogiri::HTML(html)
  final_array = []
  nokogiri_doc.search("a").each do |element|
    element = element.text
    final_array << element
  end
  final_array.each_with_index do |index|
    puts "#{index}"
  end
end
scrap('http://www.infranetsol.com/')
In this case I only get the <a> tags, but I need to get the email IDs and phone numbers into an Excel file.
Answer 0 (score: 0)
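All you have is text. So what you can do is keep only the strings that look like an email address or a phone number.
For instance, if you save the result in an array: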
a = scrap('http://www.infranetsol.com/')
You can get the elements containing an email address (strings with an '@'):
a.select { |s| s.match(/.*@.*/) }
You can get the elements containing a phone number (strings with at least 5 consecutive digits):
a.select{ |s| s.match(/\d{5}/) }
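To see what the two filters keep and drop, here is a minimal sketch run against a hypothetical array of link texts. The sample strings are made up for illustration and are not taken from the site; `links` stands in for whatever `scrap` returns.

```ruby
# Hypothetical link texts, standing in for the array returned by scrap.
links = ['Home', 'info@example.com', 'Call us: 04012345678', 'Contact']

# Keep strings containing '@' (a loose heuristic, not real email validation).
emails = links.select { |s| s.match?(/@/) }

# Keep strings with at least 5 consecutive digits (a loose phone heuristic).
phones = links.select { |s| s.match?(/\d{5}/) }

puts emails.inspect
puts phones.inspect
```

Note that `/\d{5}/` needs five digits in a row, so a number written in short groups such as "040 123 456" would not be matched by this filter.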
The whole code:
require 'open-uri'
require 'nokogiri'

def scrap(url)
  # URI.open is needed on Ruby 3+, where open-uri no longer patches Kernel#open
  html = URI.open(url).read
  nokogiri_doc = Nokogiri::HTML(html)
  # Collect the text of every <a> element
  final_array = nokogiri_doc.search('a').map(&:text)
  final_array.each { |text| puts text }
  final_array # return the array so the caller can filter it
end
a = scrap('http://www.infranetsol.com/')
email = a.select { |s| s.match(/.*@.*/) }
phone = a.select { |s| s.match(/\d{5}/) }
# In your example, you will have the email addresses in email
# and, unfortunately, a complex string in phone.
# You can use scan to extract the phone numbers from the text, and flat_map
# to get a flat array without sub-arrays.
# But keep in mind it will only work with this particular text.
phone.flat_map { |elt| elt.scan(/\d[\d ]*/) }
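
The question also asks for the results in an Excel file, which the code above does not cover. One possible approach, sketched here as an assumption rather than the only way, is to write a CSV file, which Excel can open. The helper name `write_contacts_csv`, the column headers, and the sample values are all hypothetical. The `csv` library ships with Ruby, though from Ruby 3.4 it is a bundled gem and may need to be listed in a Gemfile.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical helper: writes a header row, then one row per index,
# pairing emails[i] with phones[i]. Missing values become empty cells.
def write_contacts_csv(path, emails, phones)
  CSV.open(path, 'w') do |csv|
    csv << ['email', 'phone']
    row_count = [emails.length, phones.length].max
    row_count.times { |i| csv << [emails[i], phones[i]] }
  end
end

# Sample values for illustration only; in practice pass the filtered arrays.
csv_path = File.join(Dir.tmpdir, 'contacts.csv')
write_contacts_csv(csv_path, ['info@example.com'], ['040 1234 5678', '98765 43210'])
```

Pairing by index is only a layout choice: nothing in the scraped text says which phone number belongs to which email, so the two columns should be read as independent lists.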