Question

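I would like to know how to parse a website that uses AngularJS as its front-end framework.

The following code parses http://www.pluralsight.com/courses/using-stackoverflow-stackexchange-sites to get the course title.

Instead of the actual course title, I get {{course.title}}. Can anyone give me some advice?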

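require 'nokogiri'
require 'open-uri'

# URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs.
# This fetches the HTML as served, before any JavaScript has run.
doc = Nokogiri::HTML(URI.open("http://www.pluralsight.com/courses/using-stackoverflow-stackexchange-sites"))
title = doc.css("h1").first.text
puts title       # => {{course.title}}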

Answer 1

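Google has good documentation on how to set up SEO for Ajax-driven websites, and this site follows those guidelines.

Using the page's <base> tag as the reference for the path, you can access the rendered HTML at the following URL: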

http://www.pluralsight.com/courses?_escaped_fragment_=/using-stackoverflow-stackexchange-sites

Reference: Google Ajax Crawling Spec
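As a rough sketch of how such a URL could be assembled, the snippet below appends the `_escaped_fragment_` query parameter (the name used in Google's Ajax crawling scheme) to a base path. The helper name `escaped_fragment_url` is hypothetical, and the escaping rule covers only the ASCII characters the scheme singles out, so treat it as an illustration rather than a complete implementation. Whether a given site actually serves pre-rendered HTML at this URL depends on the site.

```ruby
# Hypothetical helper (not part of any library): builds the crawler-facing
# URL for a client-side route under the Ajax crawling scheme.
def escaped_fragment_url(base, route)
  # Percent-escape control characters, space, '#', '%', '&', '+' and DEL.
  # Other characters, including '/', are left untouched.
  escaped = route.gsub(/[\x00-\x20#%&+\x7F]/) { |c| format('%%%02X', c.ord) }

  # Append to an existing query string if the base already has one.
  separator = base.include?('?') ? '&' : '?'
  "#{base}#{separator}_escaped_fragment_=#{escaped}"
end

puts escaped_fragment_url(
  'http://www.pluralsight.com/courses',
  '/using-stackoverflow-stackexchange-sites'
)
```

The resulting string can be passed to the same Nokogiri call used in the question in place of the original page URL.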

Alternatively, you can render the page with a headless browser and use the rendered output as your source.

Answer 2

You can use:

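require 'nokogiri'
require 'phantomjs'
require 'watir'

# The page from the question; substitute the URL you want to parse.
url = "http://www.pluralsight.com/courses/using-stackoverflow-stackexchange-sites"

# Drive a headless PhantomJS browser so the page's JavaScript actually runs.
b = Watir::Browser.new(:phantomjs)
b.goto url

# Parse the rendered HTML instead of the raw template.
doc = Nokogiri::HTML(b.html)
b.close

@title = doc.css('h1').first.text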

Download PhantomJS from http://phantomjs.org/download.html and move the binary to /usr/bin.
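Whichever approach is used, it may be worth checking that the extracted text is no longer a raw template placeholder, since a page read before AngularJS finishes rendering can still contain {{...}}. The helper below is a minimal sketch of such a check; the name `unrendered_template?` is hypothetical and the regex only looks for the double-curly-brace syntax seen in the question.

```ruby
# Hypothetical helper: returns true if the text still contains an
# AngularJS-style {{ ... }} placeholder, i.e. the template was not rendered.
def unrendered_template?(text)
  text.match?(/\{\{.*?\}\}/)
end

puts unrendered_template?('{{course.title}}')   # => true
puts unrendered_template?('Some Course Title')  # => false
```

If the check returns true for the scraped title, the HTML was most likely captured before the JavaScript ran.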
