Question

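I wrote a small script to extract usernames from GitHub. I can get the details of the first user, but I don't understand how to iterate over the list of elements that share the same CSS selector class so that I can build a list of usernames: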

page = agent.get('https://github.com/angular/angular/stargazers')

html_results = Nokogiri::HTML(page.body)

first_username = html_results.at_css('.follow-list-name').text

first_username_location = html_results.at_css('.follow-list-info').text

Can you help me understand how to iterate over all the follow-list-... elements in page.body and store their values in an array?

Answer 1

Nokogiri's at_css returns only a single match (the first one). Use css instead to get all matches as a NodeSet, which behaves like an array:

require 'nokogiri'
require 'open-uri'
require 'pp'

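# URI.open comes from open-uri; on Ruby 3.0+ plain Kernel#open no longer accepts URLs
html = Nokogiri::HTML(URI.open('https://github.com/angular/angular/stargazers').read)

# css returns every matching node; map(&:text) collects the text of each one
usernames = html.css('.follow-list-name').map(&:text)
locations = html.css('.follow-list-info').map(&:text)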

pp usernames
pp locations

Output:

["Jeff Arese Vilar",
 "Yaroslav Dusaniuk",
 "Matthieu Le brazidec",
  ... ]

[" @Wallapop ",
 " Ukraine, Vinnytsia",
 " Joined on Jul 4, 2014",
 ... ]

Note that to parse the remaining stargazers you need to handle pagination, i.e. fetch the data from all the other pages using:

http://github.com/.../stargazers?page=NN

...where NN is the page number.
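As a rough sketch of that pagination loop: the helper names below (stargazers_page_url, collect_all_pages) and the fetch_names callable are hypothetical, and stopping at the first empty page is an assumption about how the site behaves, not something confirmed above. The fetching step is passed in as a lambda so the loop itself stays independent of how each page is downloaded and parsed.

```ruby
require 'uri'

BASE_URL = 'https://github.com/angular/angular/stargazers'

# Hypothetical helper: appends ?page=NN to the stargazers URL.
def stargazers_page_url(base_url, page)
  uri = URI.parse(base_url)
  uri.query = URI.encode_www_form(page: page)
  uri.to_s
end

# Hypothetical loop: calls fetch_names (any callable that takes a URL and
# returns an array of names) for pages 1..max_pages and concatenates the
# results. Assumption: an empty page means there is nothing left to read.
def collect_all_pages(base_url, max_pages:, fetch_names:)
  names = []
  (1..max_pages).each do |page|
    page_names = fetch_names.call(stargazers_page_url(base_url, page))
    break if page_names.empty?
    names.concat(page_names)
  end
  names
end

# Possible usage with the Nokogiri code from this answer (needs nokogiri
# and open-uri, and performs real HTTP requests):
#
#   fetch = ->(url) { Nokogiri::HTML(URI.open(url).read).css('.follow-list-name').map(&:text) }
#   usernames = collect_all_pages(BASE_URL, max_pages: 10, fetch_names: fetch)
```

The max_pages cap is there only to keep the loop bounded if the empty-page assumption turns out to be wrong.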

Using the GitHub API

A more robust approach is to use the GitHub stargazers list API: https://developer.github.com/v3/activity/starring/#list-stargazers
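A minimal sketch of the API route, using only Ruby's standard library. The method names (logins_from_json, fetch_stargazer_logins) are hypothetical; the sketch assumes the endpoint returns a JSON array of user objects that each carry a "login" field, and that the request is unauthenticated, so GitHub's rate limits apply.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: pulls the "login" of every user object out of the
# JSON array returned by the stargazers endpoint.
def logins_from_json(body)
  JSON.parse(body).map { |user| user['login'] }
end

# Hypothetical helper: fetches one page of stargazers for owner/repo.
# Performs a real, unauthenticated HTTP request when called.
def fetch_stargazer_logins(owner, repo, page: 1)
  uri = URI("https://api.github.com/repos/#{owner}/#{repo}/stargazers")
  uri.query = URI.encode_www_form(page: page, per_page: 100)
  response = Net::HTTP.get_response(uri)
  raise "GitHub API returned #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  logins_from_json(response.body)
end

# Possible usage (network access required):
#
#   puts fetch_stargazer_logins('angular', 'angular', page: 1)
```

Parsing is kept separate from fetching so the JSON handling can be checked against a saved response without touching the network.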
