Question

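I am trying to scrape a website (I believe it is built with JavaScript) using a simple PHP script. I am a beginner, so any help would be appreciated. The URL of the page is: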

http://www.indiainfoline.com/Markets/Company/Fundamentals/Balance-Sheet/Yes-Bank-Ltd/532648

So here, for example, I want to pass the company name (Yes-Bank-Ltd) and the code (532648) to file_get_contents. I am not sure how to do it, so could someone please help?

Thanks, Nidhi

Answer 1

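Why not append the company and code strings to the URL? Here is an idea: fill one array with companies and another with codes (they need to be the same size), then loop over them to scrape the data you want.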

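// $listOfCie and $listOfCode are parallel arrays of the same length.
for ($i = 0; $i < count($listOfCie); $i++)
{
    $cie = $listOfCie[$i];
    $code = $listOfCode[$i];
    // Append the company name and code to the base URL.
    $urlToScrape = "http://www.indiainfoline.com/Markets/Company/Fundamentals/Balance-Sheet/" . $cie . "/" . $code;
    //... = file_get_contents($urlToScrape....
}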
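
The loop above could be fleshed out roughly as follows. This is only a sketch: the function names buildBalanceSheetUrl and fetchBalanceSheets are made up for illustration, the company list in the usage comment is just the example from the question, and it assumes allow_url_fopen is enabled so that file_get_contents can open URLs.

```php
<?php
// Hypothetical helper: builds the balance-sheet URL for one company.
function buildBalanceSheetUrl(string $company, string $code): string
{
    $base = "http://www.indiainfoline.com/Markets/Company/Fundamentals/Balance-Sheet/";
    // rawurlencode() guards against spaces or other unsafe characters in either part.
    return $base . rawurlencode($company) . "/" . rawurlencode($code);
}

// Hypothetical helper: takes an array of company name => code and
// returns an array of company name => HTML string (or false on failure).
function fetchBalanceSheets(array $companies): array
{
    $pages = [];
    foreach ($companies as $company => $code) {
        $url = buildBalanceSheetUrl($company, (string) $code);
        // file_get_contents() returns false on failure; @ suppresses the warning.
        $pages[$company] = @file_get_contents($url);
    }
    return $pages;
}

// Usage (performs real HTTP requests):
// $pages = fetchBalanceSheets(["Yes-Bank-Ltd" => "532648"]);
```

Keying one array by company name, instead of keeping two parallel arrays, means the names and codes cannot drift out of step. Either layout works with the same URL pattern.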

Answer 2

Use the data.html table in YQL! http://developer.yahoo.com/yql/console

Answer 3

The easiest way to scrape a website in PHP is to use cURL (http://php.net/manual/en/book.curl.php).

For some examples, check out http://php.net/manual/en/curl.examples-basic.php or Google :)
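
As a rough illustration of the cURL approach, a minimal fetch might look like the sketch below. The function name fetchWithCurl and the 10-second default timeout are assumptions made for this example, and it requires the PHP cURL extension to be installed.

```php
<?php
// Hypothetical helper: fetches a URL with cURL.
// Returns the response body as a string, or null if the request failed.
function fetchWithCurl(string $url, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    // Return the body as a string instead of printing it directly.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Follow HTTP redirects, if any.
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // Give up if the whole transfer takes longer than this many seconds.
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);

    $body = curl_exec($ch); // false on failure
    // The handle is released when $ch goes out of scope.
    return $body === false ? null : $body;
}

// Usage (performs a real HTTP request):
// $html = fetchWithCurl("http://www.indiainfoline.com/Markets/Company/Fundamentals/Balance-Sheet/Yes-Bank-Ltd/532648");
```

This only returns whatever HTML the server sends; it does not run any JavaScript on the page.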

If the website relies on JavaScript, though, it will be hard to get the data you want. You might want to look at a "headless browser" such as http://phantomjs.org/

Scraping a website (a JavaScript site) with PHP
