I have the following code:
#!/usr/bin/env ruby
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'cora'
require 'eat'
#require 'timeout'
doc = Nokogiri::HTML(URI.open("http://mobile.bahn.de/bin/mobil/bhftafel.exe/dox?input=Richard-Strauss-Stra%DFe%2C+M%FCnchen%23625127&date=27.01.12&time=20%3A41&productsFilter=1111111111000000&REQTrain_name=&maxJourneys=10&start=Suchen&boardType=Abfahrt&ao=yes"))
doc.xpath('//div').each do |node|
  puts node.content
end
How do I remove the <p> tags and the whitespace?
Answer (score: 1)
Here is a guess at what you might want:
require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML(URI.open("http://mobile.bahn.de/bin/mobil/bhftafel.exe/dox?input=Richard-Strauss-Stra%DFe%2C+M%FCnchen%23625127&date=27.01.12&time=20%3A41&productsFilter=1111111111000000&REQTrain_name=&maxJourneys=10&start=Suchen&boardType=Abfahrt&ao=yes"))
# Drop every <p> element nested inside a <div>
doc.xpath('//div//p').remove
doc.xpath('//div').each do |node|
  # Collapse runs of blank lines, then strip leading/trailing whitespace on each line
  text = node.text.gsub(/\n([ \t]*\n)+/, "\n").gsub(/^\s+|\s+$/, '')
  puts text unless text.empty?
end
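This removes every <p> element nested inside a <div> from the document, then removes blank lines and the leading and trailing whitespace on each line of the remaining text. Finally, it skips printing the text if the result is an empty string.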
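To see what the two gsub calls do without fetching the page, here is a minimal sketch that applies them to a hard-coded string. The helper name clean_text and the sample text are hypothetical; only the two regular expressions come from the answer's loop. Note that in Ruby ^ and $ match at every line boundary, so the second gsub trims each line rather than only the ends of the whole string.

```ruby
# Hypothetical helper: the same two gsub calls as the answer's loop,
# applied to a plain string so the cleanup can be tried offline.
def clean_text(text)
  text
    .gsub(/\n([ \t]*\n)+/, "\n") # collapse runs of blank (or space/tab-only) lines
    .gsub(/^\s+|\s+$/, '')       # strip leading/trailing whitespace on each line
end

# Hypothetical sample resembling the text content of a <div>
sample = "  Abfahrt 20:41  \n\n   \n  Bus 54  \n"
cleaned = clean_text(sample)
puts cleaned unless cleaned.empty?
```

A div whose text is only whitespace, such as "  \n \n", cleans to an empty string and is therefore not printed.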
Edit: To make the date a variable, wrap the code above in a method and use string interpolation to build your URL. For example:
require 'nokogiri'
require 'open-uri'
def get_data(date)
  # Same format as date=27.01.12 in the question's URL
  date_string = date.strftime('%d.%m.%y')
  url = "http://mobile.bahn.de/…more…#{date_string}…more…"
  doc = Nokogiri::HTML(URI.open(url))
  # more code from above
end
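Building only the date and time parts of the query string can be sketched as follows, assuming the parameter names and formats from the question's URL (date=27.01.12&time=20%3A41). The helper name date_time_params is hypothetical; URI.encode_www_form from Ruby's standard uri library handles the percent-encoding of the colon in the time.

```ruby
require 'uri'

# Hypothetical helper: formats a Time into the "date" and "time" query
# parameters in the shape used by the question's URL. The rest of the
# URL is intentionally not built here.
def date_time_params(time)
  URI.encode_www_form(
    date: time.strftime('%d.%m.%y'), # e.g. "27.01.12"
    time: time.strftime('%H:%M')     # e.g. "20:41", encoded as 20%3A41
  )
end

puts date_time_params(Time.new(2012, 1, 27, 20, 41))
# prints "date=27.01.12&time=20%3A41"
```

The returned string could then be interpolated into the request URL instead of hand-writing the encoded date and time.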