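I am using Nokogiri to scrape data from a web page, and I was under the impression that the following would grab the data and return an array. Instead I get one big string, which is causing some problems.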
home_team = doc.css(".team-home.teams")
If I were to use
home_team = doc.css(".team-home.teams").text
I could understand the data being returned as a string. Am I misreading this?
I even tried
home_team = doc.css(".team-home.teams").map(&:text)
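but that still seems to return a string. If an array were being returned, the console would show it in array format, wouldn't it?
Could someone try this in their console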
require 'open-uri'
require 'nokogiri'
FIXTURE_URL = "http://www.bbc.co.uk/sport/football/premier-league/fixtures"
# URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs
doc = Nokogiri::HTML(URI.open(FIXTURE_URL))
home_team = doc.css(".team-home.teams").map(&:text)
#home_team = doc.css(".team-home.teams")
puts home_team
and confirm whether the output is a string in both cases, and what the difference between the two is? I'm a little lost there.
Thanks
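Answer 0 (score: 2)
You are getting an array. It's just that puts prints each element of an array on its own line (calling to_s on it), so the output looks like one big string. Take a look at this: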
require 'open-uri'
require 'nokogiri'
FIXTURE_URL = "http://www.bbc.co.uk/sport/football/premier-league/fixtures"
# URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs
doc = Nokogiri::HTML(URI.open(FIXTURE_URL))
home_team = doc.css(".team-home.teams").map(&:text)
# home_team = doc.css(".team-home.teams")
puts home_team.class
puts home_team.map(&:strip).inspect
#=> Array
#=> ["Everton", "Aston Villa", "Southampton", "Stoke", "Swansea", "Man Utd", "Sunderland", "Tottenham", "Chelsea", "Wigan", "Sunderland", "Arsenal", "Man City", "Swansea", "West Ham", "Wigan", "Everton", "Aston Villa", "Southampton", "Fulham", "Reading", "Chelsea", "Newcastle", "Norwich", "Stoke", "West Brom", "Liverpool", "Tottenham", "QPR", "Man Utd", "Newcastle", "Arsenal", "Aston Villa", "Everton", "Reading", "Southampton", "Stoke", "Chelsea", "Arsenal", "Fulham", "Norwich", "QPR", "Sunderland", "Swansea", "West Brom", "West Ham", "Tottenham", "Liverpool", "Man Utd", "Man City", "Aston Villa", "Chelsea", "Everton", "Southampton", "Stoke", "Wigan", "Newcastle", "Reading", "Arsenal", "Fulham", "Liverpool", "Man Utd", "Norwich", "QPR", "Sunderland", "Swansea", "Tottenham", "West Brom", "West Ham", "Arsenal", "Aston Villa", "Everton", "Fulham", "Man Utd", "Norwich", "QPR", "Reading", "Stoke", "Sunderland", "Chelsea", "Liverpool", "Man City", "Newcastle", "Southampton", "Swansea", "Tottenham", "West Brom", "West Ham", "Wigan"]
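The difference between puts and inspect can be seen without touching the network. The following is a minimal sketch using made-up sample strings in place of the scraped team names; the capture_puts helper is hypothetical and exists only to record what puts would print.

```ruby
require 'stringio'

# Hypothetical sample data standing in for the scraped team names,
# including the kind of surrounding whitespace page markup tends to produce.
home_team = ["\n  Everton\n", "\n  Aston Villa\n"]

# Hypothetical helper: records what `puts` would write for a given value.
def capture_puts(value)
  io = StringIO.new
  io.puts(value)
  io.string
end

# puts flattens the array and writes each element on its own line,
# so the result reads like one long string.
puts_output = capture_puts(home_team)

# inspect keeps the brackets, quotes and escapes, so the array is visible.
inspect_output = home_team.inspect

puts home_team.class
puts inspect_output
```

In a console, p home_team is a shorthand for puts home_team.inspect, which is usually the quickest way to check what kind of object came back.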
Answer 1 (score: 1)
There is a lot of whitespace in the data. I get an array when I do this:
home_team = doc.css(".team-home.teams").map {|team| team.text.strip}
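Note that strip only removes leading and trailing whitespace. As a rough sketch, assuming some names might also contain line breaks in the middle (the raw strings below are invented for illustration, not taken from the page), the internal whitespace can be collapsed as well:

```ruby
# Hypothetical raw strings, standing in for what team.text might return
# for each matched node.
raw_names = ["\n      Everton\n    ", "\n      Aston\n      Villa\n    "]

# strip removes leading and trailing whitespace only.
stripped = raw_names.map(&:strip)

# Also collapse internal runs of whitespace into a single space.
normalized = raw_names.map { |name| name.strip.gsub(/\s+/, " ") }

p stripped
p normalized
```

If the names on the page never span several lines, the plain strip version is enough.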