Getting the URL requested by an ajax call on a web page

Time: 2013-12-05 11:48:33

Tags: ruby ajax parsing nokogiri

I'm using Nokogiri, and at this point my variable holds the source of a page: doc = Nokogiri::HTML(open(page)). The page contains a script with ajax calls:

<script type="text/javascript" charset="utf-8">         
      $(document).ready(function(){
        $("#menu").kendoMenu();    
        $('.menu_item').on('click', function (e){
          $.ajax({
            url: '/movie/101299-the-hunger-games-catching-fire/images?kind=backdrop&language=' + $(this).attr('alt') + '&translate=false',
            cache: false
          }).done(function(response) {
            $('#image_panel').html(response);
          });
        });

        $.ajax({
          url: '/movie/101299-the-hunger-games-catching-fire/images?kind=backdrop&language=&translate=false', //goal
          cache: false
        }).done(function(response) {
          $('#image_panel').html(response);
        });   
      });        
</script>

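Is there a way to get the URL of the second request and store it in a variable? I need to go to that URL. Unfortunately I haven't found anything about this. Could PhantomJS help me?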

1 answer:

Answer 0 (score: 1)

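I think you will have to parse the script element manually. Use Nokogiri to get the text of the script element, then use a regular expression to find the last URL.

Assuming the script is the first script element on the page, you can do: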

url = doc.at_css('script').text.scan(/url: '(.*)'/).last.first
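Two details of this one-liner are worth knowing. `scan` with a capture group returns one array per match, which is why `.last.first` is needed. Also, the greedy `(.*)` runs to the last quote on its line, so on the first `url:` line it captures the whole string-concatenation expression; that does not change the last match here, but a non-greedy `[^']*` is safer. The following is a minimal pure-Ruby sketch, assuming the script texts have already been collected (for example with `doc.css('script').map(&:text)`). The helper name `last_ajax_url`, the `other_script` sample, and the rule "pick the first script containing `$.ajax`" are illustrative assumptions, not part of the original answer.

```ruby
# Hypothetical helper: pick the script that makes the ajax calls instead of
# assuming it is the first <script> on the page, then return its last url.
def last_ajax_url(script_texts)
  script = script_texts.find { |text| text.include?('$.ajax') }
  return nil unless script

  # [^']* stops at the first closing quote, so a url built by string
  # concatenation yields only its literal prefix, not the whole expression.
  matches = script.scan(/url: '([^']*)'/)
  # scan returns one array per match (one element per capture group).
  matches.last&.first
end

# Sample input: an unrelated script followed by the ajax script from the question.
other_script = 'var tracking = true;'
ajax_script = <<~'JS'
  $.ajax({
    url: '/movie/101299-the-hunger-games-catching-fire/images?kind=backdrop&language=' + $(this).attr('alt') + '&translate=false',
    cache: false
  });
  $.ajax({
    url: '/movie/101299-the-hunger-games-catching-fire/images?kind=backdrop&language=&translate=false', //goal
    cache: false
  });
JS

puts last_ajax_url([other_script, ajax_script])
# prints /movie/101299-the-hunger-games-catching-fire/images?kind=backdrop&language=&translate=false
```

Selecting by content rather than position means the code keeps working if another script element is added before this one.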

Here is the same one-liner broken into individual steps, with an explanation of each:

# Get the text of the script element
# Note that this assumes it is the first script element (you may need to be more specific)
script = doc.at_css('script').text

# Find all urls in the script
urls = script.scan(/url: '(.*)'/)

# Of the urls found, take the last one
url = urls.last

# url is actually an array of length 1, since we used a matching group in the regex
# Take the first element of the array to get the url as a string
url = url.first
#=> "/movie/101299-the-hunger-games-catching-fire/images?kind=backdrop&language=&translate=false"
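The answer stops at the extracted path, while the question also asks how to go to that URL. Because the path starts with `/`, it must be resolved against the address of the page before it can be requested. Below is a minimal sketch using only the standard library. `PAGE_URL` is a placeholder (not the real site), and the helper names `absolute_ajax_url` and `fetch_fragment` are hypothetical. Since Ruby 3.0, `open-uri` no longer overrides `Kernel#open`, so `URI.open` is used instead of the plain `open(page)` from the question.

```ruby
require 'uri'
require 'open-uri'

# Placeholder page address: replace with the URL you actually parsed
# with Nokogiri::HTML(open(page)).
PAGE_URL = 'https://www.example.com/movie/101299-the-hunger-games-catching-fire'

# The path from the script is relative to the site root ("/movie/..."),
# so resolve it against the page URL: scheme and host are kept,
# path and query come from the extracted string.
def absolute_ajax_url(page_url, path)
  URI.join(page_url, path).to_s
end

# Request the HTML fragment the page's JavaScript would insert into #image_panel.
# The X-Requested-With header mimics what jQuery sends for same-origin ajax calls;
# whether this server requires it is an assumption. This makes a real HTTP request.
def fetch_fragment(url)
  URI.open(url, 'X-Requested-With' => 'XMLHttpRequest', &:read)
end

path = '/movie/101299-the-hunger-games-catching-fire/images?kind=backdrop&language=&translate=false'
puts absolute_ajax_url(PAGE_URL, path)
# prints https://www.example.com/movie/101299-the-hunger-games-catching-fire/images?kind=backdrop&language=&translate=false
```

With a real page, `Nokogiri::HTML(fetch_fragment(absolute_ajax_url(page, url)))` would then give a document for the image panel. Because this ajax call is a plain GET to a path that is fixed in the script, a headless browser such as PhantomJS should not be needed just to retrieve it.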