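I want to search a web page for sentences containing 'small business', and do the same for every link on that page, going three or four levels deep.
My attempt looks like this: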
require 'nokogiri'
require 'open-uri'

# Collect sentences containing "small business" from the start page and from
# the .html pages it links to, following links `depth` levels deep.
def get_sentences(start_url = "http://www.brampton.ca/EN/Business/Pages/top-links.aspx", depth = 3)
  regex = /[^.]*small business[^.]*\./i
  sentences = []
  urls = [start_url]
  (depth + 1).times do
    next_urls = []
    urls.each do |url|
      doc = Nokogiri::HTML(URI.open(url))
      # search() takes CSS/XPath, not a Regexp, so scan the page text instead
      sentences.concat(doc.text.scan(regex))
      # Follow only .html links, resolving relative hrefs against this page
      doc.css('a[href]').each do |a|
        next_urls << URI.join(url, a['href']).to_s if a['href'] =~ /\.html$/
      end
    rescue StandardError => e
      warn "skipping #{url}: #{e.message}"
    end
    urls = next_urls.uniq
  end
  sentences
end
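A more general sketch of the same depth-limited crawl, assuming each page should be fetched only once even when several pages link to it. `crawl` is a hypothetical helper, not part of Nokogiri: the block you pass does the fetching and parsing and returns `[sentences, links]`, so the traversal logic can be tried without network access.

```ruby
require 'set'

# Hypothetical helper: breadth-first crawl from start_url, visiting each URL
# at most once. max_depth = 0 visits only the start page; max_depth = 3
# visits the start page plus three levels of links.
# The block receives a URL and must return [sentences, links] for that page.
def crawl(start_url, max_depth)
  found = []
  visited = Set.new([start_url])
  frontier = [start_url]
  (max_depth + 1).times do
    next_frontier = []
    frontier.each do |url|
      sentences, links = yield(url)
      found.concat(sentences)
      links.each do |link|
        # Set#add? returns nil when the link was already seen, so each URL is queued once
        next_frontier << link if visited.add?(link)
      end
    end
    frontier = next_frontier
    break if frontier.empty?
  end
  found
end

# Tiny fake site: page => [sentences on the page, links on the page]
site = {
  "a" => [["A small business."], ["b", "c"]],
  "b" => [["B small business."], ["a", "d"]],
  "c" => [[], ["d"]],
  "d" => [["D small business."], ["e"]],
  "e" => [["E small business."], []]
}
calls = []
result = crawl("a", 2) { |u| calls << u; site.fetch(u) }
p result # => ["A small business.", "B small business.", "D small business."]
```

For the real site, the block could build a document with `Nokogiri::HTML(URI.open(url))` and return `[doc.text.scan(regex), hrefs]`, with each href resolved through `URI.join(url, href)`. Rescuing errors inside the block keeps one broken link from stopping the whole crawl.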
Edit: narrowed it down to this, lol:
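require 'nokogiri'
require 'open-uri'

@sentences = []
doc = Nokogiri::HTML(URI.open("https://en.wikipedia.org/wiki/Small_business"))
regex = /[^.]*small business[^.]*\./i
# traverse visits every node; checking only text nodes avoids collecting the
# same sentence again for every ancestor element (p, div, body, html) that contains it.
# Note: a sentence split across inline tags (e.g. a link) will not match here.
doc.traverse do |x|
  @sentences.concat(x.text.scan(regex)) if x.text?
end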
It works!
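A small sketch of the sentence-matching step on its own, assuming the text comes from `doc.text` or a text node's `text`. HTML text often contains newlines and runs of spaces, so "small\nbusiness" would not match the literal "small business" in the regex; collapsing whitespace first fixes that. `small_business_sentences` is a hypothetical helper name. The `[^.]*` pattern still cuts sentences at any period, so abbreviations such as "e.g." will split a sentence.

```ruby
# Hypothetical helper: return the distinct sentences that mention "small business".
SENTENCE_RE = /[^.]*small business[^.]*\./i

def small_business_sentences(text)
  text.gsub(/\s+/, " ")     # collapse newlines and repeated spaces from the markup
      .scan(SENTENCE_RE)    # each match runs from the previous period to the next one
      .map(&:strip)         # drop the leading space left by the previous sentence
      .uniq
end

sample = "Intro.\n  Every small\nbusiness needs a plan.  A Small Business grant helps. Other text."
p small_business_sentences(sample)
# => ["Every small business needs a plan.", "A Small Business grant helps."]
```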