Heroku throws OpenURI::HTTPError (403 Forbidden)

Time: 2015-04-27 04:27:53

Tags: ruby-on-rails ruby heroku http-status-code-403 open-uri

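I have a Rails 4 application that serves JSON data to a BackboneJS front-end client. The back end scrapes some content from Craigslist and serves it to the front end as JSON. Locally, in development, it works as expected.

On Heroku, the application layout is served correctly and the assets appear to load fine. The failure comes when BackboneJS queries for the data to populate its views: at that point the application fails with an OpenURI error.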

More specifically, OpenURI keeps returning

OpenURI::HTTPError (403 Forbidden) 

when the following line is executed in the Rails controller:

open( "#{clist.url}" )

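I have spent hours trying the various 'solved' solutions I found on Stack Overflow and GitHub, and looking through the Heroku logs for other errors, but the error persists no matter which "solution" I try.

So far I have tried the following suggested solutions, along with a few other silly ones: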

  • Required open-uri in my application controller
  • Added a "User-Agent" key to my open method call
  • Changed clist.url from "https" to "http" to avoid redirects
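
As a debugging aid for the attempts listed above, here is a minimal sketch of how the open-uri call could rescue the error and read the body that came back with the 403. It assumes a recent Ruby, where `URI.open` replaces the bare `open` used in this question; `fetch_or_explain` and its `fetcher:` parameter are hypothetical names introduced only for this illustration.

```ruby
require 'open-uri'
require 'stringio'

# Hypothetical helper: returns the page body on success, or the status line
# and the body the server sent along with the error on failure.
# The fetcher: parameter is an assumption of this sketch; it exists only so
# the network call can be swapped out when experimenting.
def fetch_or_explain(url, fetcher: ->(u) { URI.open(u, 'User-Agent' => "Ruby/#{RUBY_VERSION}").read })
  { ok: true, body: fetcher.call(url) }
rescue OpenURI::HTTPError => e
  # e.message holds the status line (for example "403 Forbidden");
  # e.io holds whatever body the remote server returned with it.
  { ok: false, status: e.message, body: e.io.read }
end

# Possible usage inside the controller loop:
#   result = fetch_or_explain(search_item.url)
#   puts result[:body] unless result[:ok]
```

Printing `result[:body]` from the Heroku console or into the logs should show whatever text, if any, the remote server attached to the rejection.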

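Also, the application is straightforward and does not currently require authentication.

I could not find any suggested solutions beyond those on Stack Overflow and GitHub. Any help, whether other things to try or further tips for debugging, would be greatly appreciated. I am fairly new to Heroku, so I am still getting familiar with debugging problems on a remote production environment.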

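Here is the relevant code that fetches from Craigslist (don't judge me too harshly; all or most of this method is slated to be refactored into its own class/model, which is where it belongs):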

def index
    @listings = []

    # Retrieve job listings from Craigslist (see method sync_clist below ... )
    @raw_listings = sync_clist

    # filters applied at this point ...
    # and they transform @raw_listings to ...
    # the array @listings

    render json: @listings
end

def sync_clist
    #@search_items = SearchItem.all
    @search_items = SearchItem.all[0..1]
    site = @sites[0]

    # in the Craigslist HTML, the second element in the returned job listing
    # is the better one to use
    href_idx = 1

    ##### LINE 123 is the next one:
    @search_items.each_with_index do |search_item, idx|
      puts "#{search_item.url.upcase}"

      ##### LINE 125 ... FAILURE_HERE? ***************** 
      html = open( "#{search_item.url}", 'User-Agent' => "Ruby/#{RUBY_VERSION}")
      page = Nokogiri::HTML( html.read, nil, 'utf-8' )
      category_idx = idx % 4
      isNearby_listing = false     #also capture 'nearby' jobs on Craigslist

      page.css( site[:joblist_css] )[0..-2].each_with_index do |listing, i|
        # convert the relative url in the list to a full-url
        locale_idx = idx/@clist_locales.length
        listing_url = listing.css('a')[href_idx]['href']

        # Craigslist only lists the relative path of job urls - relative to the
        # current search location. The 'More Local' items, however, return the
        # full url.
        if !isNearby_listing
          posting_url = site[:protocol] + site[:locales][locale_idx] + "." + site[:host] + listing_url
        else
          posting_url = listing_url
        end

        # Once the appropriate heading is reached, the 'More Local' listings
        # items begin appearing
        if listing.next_sibling.node_name == 'h4'
          isNearby_listing = true
        end

        posting_date = listing.css('time')[0]['datetime']
        job_listing = { :source       => site[:sitename].upcase,
                        :title        => listing.css('a')[href_idx].text,
                        :url          => posting_url,
                        :listing_id   => listing["data-pid"],
                        :location     => @clist_locales[locale_idx],
                        :content      => "",
                        :telecommute  => "",
                        :contract     => "",
                        :pt_ft        => "",
                        :favorite     => false,
                        :posted_date  => posting_date,
                        :category     => @clist_categories[category_idx],
                        :apply_state  => "new"
        }

        @new_listings << job_listing
      end
    end
    @new_listings
end

Here is the output of my Heroku logs, in case it is useful:

```

2015-04-27T03:47:14.611278+00:00 app[web.1]: => Rails 4.0.8 application starting in production on http://0.0.0.0:43688
2015-04-27T03:47:14.611280+00:00 app[web.1]: => Run `rails server -h` for more startup options
2015-04-27T03:47:14.611310+00:00 app[web.1]: Started GET "/" for 24.5.106.52 at 2015-04-27 03:47:14 +0000
2015-04-27T03:47:14.611272+00:00 app[web.1]: => Booting WEBrick
2015-04-27T03:47:14.611281+00:00 app[web.1]: => Ctrl-C to shutdown server
2015-04-27T03:47:14.664777+00:00 app[web.1]:   Rendered app/root.html.erb within layouts/application (0.5ms)
2015-04-27T03:47:14.661735+00:00 app[web.1]: Processing by AppController#root as HTML
2015-04-27T03:47:14.664783+00:00 app[web.1]:   Rendered app/root.html.erb within layouts/application (0.5ms)
2015-04-27T03:47:14.674604+00:00 app[web.1]: Completed 200 OK in 13ms (Views: 12.3ms | ActiveRecord: 0.0ms)
2015-04-27T03:47:14.674612+00:00 app[web.1]: Completed 200 OK in 13ms (Views: 12.3ms | ActiveRecord: 0.0ms)
2015-04-27T03:47:14.611302+00:00 app[web.1]: Started GET "/" for 24.5.106.52 at 2015-04-27 03:47:14 +0000
2015-04-27T03:47:14.661749+00:00 app[web.1]: Processing by AppController#root as HTML
2015-04-27T03:47:15.943141+00:00 heroku[router]: at=info method=GET path="/assets/application-45c34fbd86efe641e061caa3b34737d7.css" host=APPNAME.herokuapp.com request_id=dbf8473e-2880-42f6-8ef4-11e0da7141b4 fwd="24.5.106.52" dyno=web.1 connect=2ms service=73ms status=200 bytes=569692
2015-04-27T03:47:15.944069+00:00 heroku[router]: at=info method=GET path="/assets/application-537f60efd0378faaddaea08875f25055.js" host=APPNAME.herokuapp.com request_id=5c7cf6db-393f-434e-a494-d3664d31a20f fwd="24.5.106.52" dyno=web.1 connect=2ms service=70ms status=200 bytes=965489
2015-04-27T03:47:17.631614+00:00 app[web.1]: Started GET "/posts" for 24.5.106.52 at 2015-04-27 03:47:17 +0000
2015-04-27T03:47:17.631624+00:00 app[web.1]: Started GET "/posts" for 24.5.106.52 at 2015-04-27 03:47:17 +0000
2015-04-27T03:47:17.636842+00:00 app[web.1]: Processing by PostsController#index as JSON
2015-04-27T03:47:17.636831+00:00 app[web.1]: Processing by PostsController#index as JSON
2015-04-27T03:47:17.665568+00:00 app[web.1]: HTTP://SFBAY.CRAIGSLIST.ORG/SEARCH/SOF?QUERY=RAILS
2015-04-27T03:47:18.048174+00:00 heroku[router]: at=info method=GET path="/posts" host=APPNAME.herokuapp.com request_id=a2724fb9-6872-4435-8dfc-0df1b73fb761 fwd="24.5.106.52" dyno=web.1 connect=2ms service=417ms status=500 bytes=330
2015-04-27T03:47:18.042116+00:00 app[web.1]: Completed 500 Internal Server Error in 405ms
2015-04-27T03:47:18.043586+00:00 app[web.1]: OpenURI::HTTPError (403 Forbidden):
2015-04-27T03:47:18.042129+00:00 app[web.1]: Completed 500 Internal Server Error in 405ms
2015-04-27T03:47:18.043590+00:00 app[web.1]:   app/controllers/posts_controller.rb:123:in `each'
2015-04-27T03:47:18.043588+00:00 app[web.1]:   app/controllers/posts_controller.rb:125:in `block in sync_clist'
2015-04-27T03:47:18.043593+00:00 app[web.1]:   app/controllers/posts_controller.rb:123:in `sync_clist'
2015-04-27T03:47:18.043584+00:00 app[web.1]:
2015-04-27T03:47:18.043591+00:00 app[web.1]:   app/controllers/posts_controller.rb:123:in `each_with_index'
2015-04-27T03:47:18.043596+00:00 app[web.1]:
2015-04-27T03:47:18.043594+00:00 app[web.1]:   app/controllers/posts_controller.rb:53:in `index'
2015-04-27T03:47:18.043597+00:00 app[web.1]:
2015-04-27T03:47:18.043602+00:00 app[web.1]:
2015-04-27T03:47:18.043604+00:00 app[web.1]:   app/controllers/posts_controller.rb:125:in `block in sync_clist'
2015-04-27T03:47:18.043603+00:00 app[web.1]: OpenURI::HTTPError (403 Forbidden):
2015-04-27T03:47:18.043606+00:00 app[web.1]:   app/controllers/posts_controller.rb:123:in `each'
2015-04-27T03:47:18.043607+00:00 app[web.1]:   app/controllers/posts_controller.rb:123:in `each_with_index'
2015-04-27T03:47:18.043610+00:00 app[web.1]:   app/controllers/posts_controller.rb:53:in `index'
2015-04-27T03:47:18.043609+00:00 app[web.1]:   app/controllers/posts_controller.rb:123:in `sync_clist'
2015-04-27T03:47:18.043613+00:00 app[web.1]:
2015-04-27T03:47:18.043611+00:00 app[web.1]:

```

2 Answers:

Answer 0: (Score: 1)

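Following @TarynEast's suggestion and running the application code in the Heroku console, I switched to the 'net/http' library instead of 'open-uri' to retrieve the Craigslist web page. With 'net/http', Craigslist returned the following message: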

"This IP has been automatically blocked.\nIf you have questions, please email: blocks-b1402369961264436@craigslist.org\n"
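
A check along these lines could look like the sketch below. It is not the exact code that was run; `summarize_response` and `fetch_summary` are hypothetical helper names. The point it illustrates is that `Net::HTTP.get_response` hands back the response object even for a 4xx status instead of raising, so the body stays readable.

```ruby
require 'net/http'
require 'uri'

# Reduce a response to a small hash. Anything that responds to #code and
# #body is accepted, so a Net::HTTPResponse works and so does a stand-in.
def summarize_response(response)
  code = response.code.to_i
  { code: code, success: (200..299).cover?(code), body: response.body.to_s }
end

# Hypothetical helper: fetch a URL and summarize whatever comes back,
# including 403 responses, without raising on the status code.
def fetch_summary(url)
  summarize_response(Net::HTTP.get_response(URI(url)))
end

# Possible usage from `heroku run rails console`:
#   summary = fetch_summary('http://sfbay.craigslist.org/search/sof?query=rails')
#   puts "#{summary[:code]}: #{summary[:body]}"
```

Running the same call locally and on Heroku makes it easy to compare the status code and body each environment receives.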

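So, apparently, either all Heroku IPs are blocked or, more likely, just my application specifically is blocked, even though my app pings Craigslist at most 8 times per site load. Maybe that is enough to get it blocked, since Craigslist is a very popular target for scraping applications. In any case, the mystery of the 403 error is solved. At least the application still works locally.

[Update:] According to a quick Google search, Craigslist blocks entire ranges of AWS and Heroku IPs, among others. See the question here: Craigslist blocking Heroku/AWS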

Answer 1: (Score: 0)

I ran into a similar problem when hosting a Rails application on Heroku. I got the same error because the URL was wrong.