Web scraping with Ruby

Date: 2011-07-11 22:05:50

Tags: ruby json github

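I'm new to programming, and I have a project where I need to write a Ruby script that retrieves information about a specified repository from GitHub, parses the JSON-formatted data, and prints it in a usable format on the command line.

I've looked at the Mechanize guide. Which documentation should I look at to accomplish this?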

2 answers:

Answer 0 (score: 3)

Use GitHub's Repositories API. Everything you want is available there, with no scraping or weird hacks required. Responses are formatted as JSON by default.
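As a minimal sketch of this approach using only Ruby's standard library (net/http and json) instead of a gem, the script below requests one repository from GitHub's current REST endpoint, `https://api.github.com/repos/OWNER/REPO`, and parses the JSON body into a Hash. That URL layout belongs to the v3 API, which replaced the `/api/v2/json` paths in use in 2011. The method names `repo_uri` and `fetch_repo` and the `User-Agent` string are hypothetical choices for illustration.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Builds the REST URL for one repository (v3 layout: /repos/:owner/:repo).
def repo_uri(owner, repo)
  URI("https://api.github.com/repos/#{owner}/#{repo}")
end

# Fetches the repository metadata and returns it as a Ruby Hash.
def fetch_repo(owner, repo)
  uri = repo_uri(owner, repo)
  request = Net::HTTP::Get.new(uri)
  request['Accept'] = 'application/vnd.github+json'
  # GitHub's API rejects requests without a User-Agent; this value is arbitrary.
  request['User-Agent'] = 'repo-info-script'
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
    http.request(request)
  end
  raise "GitHub returned HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  JSON.parse(response.body)
end

# Only hit the network when run directly with OWNER and REPO arguments.
if __FILE__ == $PROGRAM_NAME && ARGV.length == 2
  puts JSON.pretty_generate(fetch_repo(*ARGV))
end
```

Run it as, for example, `ruby repo_info.rb joncooper beanstalkd` (the file name is arbitrary). With no arguments it only defines the methods, so it can be loaded from another script.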

Answer 1 (score: 1)

Building on @Douglas's answer: what you want is the GitHub API together with the HTTParty gem:

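require 'httparty'

# Including HTTParty adds class-level helpers such as base_uri and get.
class Repository
  include HTTParty
  base_uri 'www.github.com'
end

# GitHub API v2 path (the API current in 2011; it has since been retired).
response = Repository.get('/api/v2/json/repos/show/joncooper/beanstalkd')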

require 'awesome_print'
>> ap response.parsed_response
{
    "repository" => {
                 "name" => "beanstalkd",
                 "size" => 128,
           "created_at" => "2011/04/29 09:43:43 -0700",
             "has_wiki" => true,
               "parent" => "kr/beanstalkd",
              "private" => false,
             "watchers" => 1,
                 "fork" => true,
             "language" => "C",
                  "url" => "https://github.com/joncooper/beanstalkd",
            "pushed_at" => "2011/07/05 22:10:53 -0700",
          "open_issues" => 0,
        "has_downloads" => true,
           "has_issues" => false,
             "homepage" => "http://kr.github.com/beanstalkd/",
                "forks" => 0,
          "description" => "Beanstalk is a simple, fast work queue.",
               "source" => "kr/beanstalkd",
                "owner" => "joncooper"
    }
}

For details, see http://httparty.rubyforge.org/
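The question also asks for the data to be printed "in a usable format on the command line". One possible sketch, assuming you already have the parsed Hash (for instance `response.parsed_response["repository"]` from the code above), is to print a chosen subset of fields as aligned `key : value` lines. The `format_repo` method and the default field list are hypothetical; swap in whichever keys you need.

```ruby
# Hypothetical default selection of fields to show; missing keys are skipped.
DEFAULT_FIELDS = %w[name description language watchers forks open_issues url].freeze

# Returns aligned "key : value" lines for the requested fields of a repo Hash.
def format_repo(repo, fields = DEFAULT_FIELDS)
  present = fields.select { |f| repo.key?(f) }
  width = present.map(&:length).max || 0
  present.map { |f| "#{f.ljust(width)} : #{repo[f]}" }.join("\n")
end

# Sample values copied from the response shown above.
sample = {
  "name" => "beanstalkd",
  "language" => "C",
  "forks" => 0,
  "url" => "https://github.com/joncooper/beanstalkd"
}
puts format_repo(sample)
```

Aligning on the longest key name keeps the output readable in a terminal while staying plain text, so it can still be piped into tools like `grep`.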