Question

I am trying to scrape data from http://www.futbol24.com/Live/?__igp=1&LiveDate=20141104 and get the time, home team, and away team for every match on that page.

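I tried using jSoup, but I now realize that the page seems to load its content with JavaScript after the initial page load... Is there any way I can still get this data?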

Cheers, Rob

Answer 1

You can't do this with Jsoup alone: it parses the HTML it is given but does not execute JavaScript.

You could try Selenium and/or:

PhantomJS:

http://phantomjs.org/

and Pjscrape:

http://nrabinowitz.github.io/pjscrape/

For example, with PhantomJS you could use:

var page = require('webpage').create();
var fs = require('fs'); // file system module
var system = require('system'); // required before system.args can be used
var args = system.args; // command-line arguments (not used below)
var output = './temp_htmls/test1.html'; // path of the local file to save
page.open('http://www.futbol24.com/Live/?__igp=1&LiveDate=20141104;rpp=50;po=0;dct=PS;D=OSHA-2013-0020', function() { // open the page
  fs.write(output, page.content, 'w'); // write the loaded page to the local file using page.content
  phantom.exit(); // exit PhantomJS
});

Here we use PhantomJS to open the page and then save it locally. After that, you can scrape the saved file with Jsoup or Beautiful Soup.
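
As a rough sketch of that second step, the saved file could be parsed along these lines. To stay dependency-free, this uses Python's standard `html.parser` rather than Beautiful Soup. The class names `time`, `home`, and `guest`, and the assumption that each match is one table row, are hypothetical; inspect the saved HTML and adjust them to the real markup.

```python
from html.parser import HTMLParser

# Hypothetical class names: check the saved HTML and replace them as needed.
FIELD_CLASSES = {"time": "time", "home": "home", "guest": "guest"}


class MatchParser(HTMLParser):
    """Collects one dict per <tr> that has time, home and guest cells."""

    def __init__(self):
        super().__init__()
        self.matches = []    # completed rows
        self._row = None     # row being built, or None when outside a <tr>
        self._field = None   # field of the <td> currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = {}
        elif tag == "td" and self._row is not None:
            classes = (dict(attrs).get("class") or "").split()
            for field, css_class in FIELD_CLASSES.items():
                if css_class in classes:
                    self._field = field
                    self._row[field] = ""
                    break

    def handle_data(self, data):
        # Text inside nested tags (e.g. <a>) is still part of the current cell.
        if self._field is not None:
            self._row[self._field] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr" and self._row is not None:
            row = {key: value.strip() for key, value in self._row.items()}
            # Keep only rows where all three fields were found.
            if set(row) == set(FIELD_CLASSES):
                self.matches.append(row)
            self._row = None
            self._field = None


def extract_matches(html):
    """Return a list of {'time', 'home', 'guest'} dicts found in the HTML text."""
    parser = MatchParser()
    parser.feed(html)
    return parser.matches
```

To use it on the file written by the PhantomJS script, read `./temp_htmls/test1.html` as text and pass the contents to `extract_matches`.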

Good luck!

Getting data from a web page that appears to be JavaScript-driven into Java?
