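I'm creating a small app that is conditional on the result of the last game played, or more precisely on the last row of the game data (win/loss and game number).
My problem is accessing the first column of the last row (the most recently played game). How is this done?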
require 'open-uri'
class BrooklynPizzaController < ApplicationController
  def index
    # URL for dynamic content
    url = "http://www.basketball-reference.com/teams/BRK/2015_games.html"
    # Open URL using nokogiri
    doc = Nokogiri::HTML(open(url))
    # Scrape result from Web site
    @result = doc.css("#teams_games").xpath("//table/tbody/tr/td[8]/text()")
    # IN PROGRESS - Get date of last game played
    @result_date = doc.xpath('//table/tbody/tr/td[2]/a/text()') do |link|
      @result_date[link.text.strip] = link['a']
    end
    ###############################################################
    # IN PROGRESS - Get number of last game played from 1st column
    # doc.xpath('//table/tbody/tr/td[1]/text()') do |game|
    #   last_game_number =
    # end
    ################################################################
    # @result_date = doc.css("#teams_games").xpath("//table/tbody/tr/td[2]/text()")
    # Set date to current
    @date = Date.today
    # Get date of last game played
    if (@result.last.next == nil)
      flag = doc.xpath("//table/tbody/tr[#{@result}]")
      @result_date = doc.xpath("//table/tbody/tr#{flag}/td[2]/a/text()")
    end
  end
end
Please let me know if the information I've given is lacking, because I feel like I'm missing something.
Answer 0 (score: 1)
To get the row, you can do this:
# every non-empty cell in the 8th (W/L) column
win_loss_tds = doc.css("#teams_games tbody tr td:nth-child(8):not(:empty)")
# the <tr> that contains the last of those cells
last_win_loss_row = win_loss_tds.last.parent
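There is no doubt a way to do this in a single XPath expression, but I'll leave that as an exercise for the reader, since I don't care for XPath.
To get the game number from the first column, you can do this: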
game_num_col = last_win_loss_row.at("td:first-child")
game_num = game_num_col.text.to_i
# => 82
To get the date from the second column:
date_col = last_win_loss_row.at("td:nth-child(2)") # XPath: td[2]
date = DateTime.parse(date_col.text)
# => 2015-04-15T00:00:00+00:00
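DateTime.parse guesses the format from the string. If the date cell always looks like the sample row in the other answer ("Sat, Nov 22, 2014"), a stricter alternative is Date.strptime with an explicit format, which raises an ArgumentError instead of guessing if the layout changes. A minimal sketch, assuming that cell format:

```ruby
require 'date'

# Assumed cell format, taken from the sample row output in this thread.
date_text = "Sat, Nov 22, 2014"

# Explicit format: %a weekday, %b month abbreviation, %d day, %Y year.
game_date = Date.strptime(date_text, '%a, %b %d, %Y')

puts game_date           # prints 2014-11-22
puts game_date.saturday? # prints true

# Comparable with Date.today, as the question's controller does.
days_since = (Date.today - game_date).to_i
puts "#{days_since} days since that game"
```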
If you want both the date and the time, you can do this:
time_col = last_win_loss_row.at("td:nth-child(3)")
date_time = DateTime.parse("#{date_col.text} #{time_col.text}")
# => 2015-04-15T08:00:00-03:00
Answer 1 (score: 1)
Well, here's how I'd do it:
require 'open-uri'
require 'nokogiri'
# URI.open: since Ruby 3.0, plain Kernel#open no longer opens URLs
doc = Nokogiri::HTML(URI.open("http://www.basketball-reference.com/teams/BRK/2015_games.html"))
latest_score_row = doc.search('//tr/td/a[contains(.,"Box Score")]/../..').last
latest_text = latest_score_row.search('td').map(&:text)
# => ["13",
# "Sat, Nov 22, 2014",
# "8:30p EST",
# "",
# "Box Score",
# "@",
# "San Antonio Spurs",
# "L",
# "",
# "87",
# "99",
# "5",
# "8",
# "L 1",
# ""]
YMMV, though.
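To make the question's "conditional on the last game" concrete, that text array can be mapped onto named fields. A minimal sketch: the column positions are an assumption read off the sample output above (index 0 game number, 7 W/L, 9 and 10 presumably the team's and the opponent's points), and GameRow and game_row_from are hypothetical names.

```ruby
# Hypothetical record for one schedule row; positions assumed from the sample.
GameRow = Struct.new(:number, :date, :time, :opponent, :result, :team_pts, :opp_pts)

def game_row_from(cells)
  GameRow.new(
    cells[0].to_i,  # game number
    cells[1],       # date text
    cells[2],       # time text
    cells[6],       # opponent
    cells[7],       # "W" or "L"
    cells[9].to_i,  # presumably the team's points
    cells[10].to_i  # presumably the opponent's points
  )
end

# The sample row printed above.
latest_text = ["13", "Sat, Nov 22, 2014", "8:30p EST", "", "Box Score", "@",
               "San Antonio Spurs", "L", "", "87", "99", "5", "8", "L 1", ""]

game = game_row_from(latest_text)
puts game.number                         # prints 13
puts game.result == "W" ? "won" : "lost" # prints lost
```

If the site reorders its columns, only game_row_from needs to change.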
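How does it work? Simple. It looks for the <a> nodes in the page that contain "Box Score", then, for each one found, backs up two levels to the enclosing <tr> node and returns those rows to Ruby as a Nokogiri NodeSet (an array-like collection). last grabs the last one.
Then it's just a matter of looking at that row's <td> nodes and grabbing their text.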
Getting the timestamp is a matter of pulling the date and time out of the array, massaging the "a/p" suffix into "am/pm", and letting Ruby build an object:
require 'time' # provides Time.strptime
latest_time = Time.strptime(
[
latest_text[1], # => "Sat, Nov 22, 2014"
latest_text[2].sub(/([ap])/, '\1m') # => "8:30pm EST"
].join(' '), # => "Sat, Nov 22, 2014 8:30pm EST"
'%a, %b %d, %Y %H:%M%P %Z' # => "%a, %b %d, %Y %H:%M%P %Z"
) # => 2014-11-22 18:30:00 -0700
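The same parsing can be wrapped in a small reusable method. A minimal sketch; parse_game_time is a hypothetical name, and the cell formats are assumed from the sample row. It expands only an a or p that directly follows a digit, so a stray a or p elsewhere in the string is left alone, and it uses %I, the 12-hour hour directive, alongside %P.

```ruby
require 'time'

# Hypothetical helper: date cell ("Sat, Nov 22, 2014") + time cell ("8:30p EST").
def parse_game_time(date_text, time_text)
  # "8:30p EST" -> "8:30pm EST": expand an a/p suffix that follows a digit
  normalized = time_text.sub(/(\d)([ap])\b/, '\1\2m')
  Time.strptime("#{date_text} #{normalized}", '%a, %b %d, %Y %I:%M%P %Z')
end

t = parse_game_time("Sat, Nov 22, 2014", "8:30p EST")
puts t.utc # prints 2014-11-23 01:30:00 UTC
```

Printing in UTC sidesteps the local-offset difference visible in the output above (-0700 there, since Time.strptime results are shown in the machine's zone).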