PhantomJS hangs when called from the CLI or the web

Date: 2013-05-20 20:24:44

Tags: javascript web-scraping phantomjs

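I'm trying to use PhantomJS to capture a screenshot of a URL, but when I call PhantomJS (from either the command line or a web app) it hangs and never seems to execute the "exit()" call. I can't find any error messages; it just keeps running until I kill it. Here is the JS file passed to the phantomjs command: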

var page = require('webpage').create();
var system = require('system');
var script_address = '';
var page_to_load = '';
var members_id = '';
var activities_id = '';
var folder_path = '';

if (system.args.length < 5) 
{
    console.log('Usage: phantom_activity_fax.js script_address page_to_load members_id activities_id folder_path');
    console.log('#Args: '+system.args.length);
    phantom.exit();
}//END IF SYSTEM.ARGS.LENGTH < 5

//ASSIGN OUR ARGUMENTS RECEIVED
script_address = system.args[0];
page_to_load = system.args[1];
members_id = system.args[2];
activities_id = system.args[3];
folder_path = system.args[4];

console.log(system.args[0]);
console.log(system.args[1]);
console.log(system.args[2]);
console.log(system.args[3]);
console.log(system.args[4]);

//OPEN OUR PAGE WITH THE VALUES PROVIDED
page.open(page_to_load, function () {
    console.log("Entering Anonymous Function, Beginning RENDER:\n");
    page.render(folder_path+members_id+'_'+activities_id+'.png');
    phantom.exit();
});

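I can see the values being written to the console, but after that it just hangs :( I've tried the web inspector, but I couldn't work out where to execute the __run() call, and I see no change when I add --remote-debugger-autorun=yes to the call :(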

Here is the output I get from the command line when it hangs (as the root user):

[root@wv-wellvibe2 faxes]# phantomjs /var/www/wv-wellvibe2-test/javascripts/phantom_activity_fax.js https://wv-wellvibe2-test/manual_scripts/phantom_js_test_page.php 397 0 /var/www/wv-wellvibe2-test/uploads/images/faxes/
/var/www/wv-wellvibe2-test/javascripts/phantom_activity_fax.js
https://wv-wellvibe2-test/manual_scripts/phantom_js_test_page.php
397
0
/var/www/wv-wellvibe2-test/uploads/images/faxes/

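Here is the output when I run it as my own user. This time the command finishes, but I don't see the image file (the fax) in the specified folder: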

[user@wv-wellvibe2 ~]$ phantomjs /var/www/wv-wellvibe2-test/javascripts/phantom_activity_fax.js https://wv-wellvibe2-test/manual_scripts/phantom_js_test_page.php 397 0 /var/www/wv-wellvibe2-test/uploads/images/faxes/
/var/www/wv-wellvibe2-test/javascripts/phantom_activity_fax.js
https://wv-wellvibe2-test/manual_scripts/phantom_js_test_page.php
397
0
/var/www/wv-wellvibe2-test/uploads/images/faxes/
Entering Anonymous Function, Beginning RENDER:
[user@wv-wellvibe2 ~]$ 

Unfortunately, as I said, the command finishes but no .png is saved in the faxes folder. Here are the permissions on that folder:

[root@wv-wellvibe2 faxes]# ls -la
total 12
drwxr-xr-x 3 root   apache 4096 May 16 15:31 .
drwxr-xr-x 5 apache apache 4096 May 16 14:14 ..
drwxr-xr-x 6 apache apache 4096 May 20 15:05 .svn

If there's any other information I can provide, please let me know! Thanks!

(As requested, here is the PHP script that invokes the PhantomJS process:)

header("Date: " . date('Y-m-d H:i:s'));
//GET THE SMARTY CONFIG
include_once $_SERVER['DOCUMENT_ROOT'] . "/smarty/configs/config.php";

//VARS USED LATER
$process_script = $_SERVER['DOCUMENT_ROOT'] . '/javascripts/phantom_activity_fax.js';
$page_to_load = 'https://' . $_SERVER['HTTP_HOST'] . '/manual_scripts/phantom_js_test_page.php';
$members_id = $_SESSION['members_id'];
$activities_id = 0;
$folder_path = $_SERVER['DOCUMENT_ROOT'] . '/uploads/images/faxes/';
$system_response = '';


$call = "phantomjs --remote-debugger-port=65534 --remote-debugger-autorun=yes " .  $process_script . " " . $page_to_load . " " . $members_id . " " . $activities_id . " " . $folder_path;

echo 'CallingSystemWith: ' . $call . '<br />';

try 
{
    $system_response = system($call);

    echo '<br />SystemResponse: ' . $system_response . '<hr />';
} catch (Exception $exc) {
    echo $exc->getTraceAsString();
}//END TRY / CATCH

(The page that PhantomJS is told to "scrape" is a simple PHP script that outputs print_r() of $_SESSION and $_REQUEST.)

2 Answers:

Answer 0 (score: 19)

If something fails in your script (for example in page.render), phantom.exit() is never called. That's why PhantomJS appears to hang.

There may be a problem with page.render, but I don't think so. The most common cause of a hang is an unhandled exception.
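The point about unhandled exceptions can be made concrete with a phantom.onError handler that logs the error and then calls phantom.exit(1), so a failure ends the process with a non-zero code instead of leaving it running. This is a minimal sketch: the trace format follows the example in the PhantomJS onError documentation, while installExitOnError and formatPhantomError are hypothetical helper names. The phantom object is passed in as a parameter so the helpers can also be exercised outside PhantomJS.

```javascript
// Sketch: turn any uncaught error into a logged message plus a non-zero exit,
// instead of a PhantomJS process that never reaches phantom.exit().
// ES5 only, because PhantomJS's JavaScript engine predates ES2015.

// Hypothetical helper: format the message and the trace array PhantomJS passes
// to onError (each entry has file/sourceURL, line and function fields).
function formatPhantomError(msg, trace) {
    var lines = ['ERROR: ' + msg];
    (trace || []).forEach(function (t) {
        lines.push('  -> ' + (t.file || t.sourceURL) + ': ' + t.line +
            (t['function'] ? ' (in function ' + t['function'] + ')' : ''));
    });
    return lines.join('\n');
}

// Hypothetical helper: install the handler on the given phantom object.
function installExitOnError(phantomObj, log) {
    phantomObj.onError = function (msg, trace) {
        log(formatPhantomError(msg, trace));
        phantomObj.exit(1); // non-zero so the shell or PHP caller can see the failure
    };
}

// Only install it when actually running inside PhantomJS.
if (typeof phantom !== 'undefined') {
    installExitOnError(phantom, function (s) { console.error(s); });
}
```

Placed at the top of phantom_activity_fax.js, this should turn a silent hang caused by an exception into an error message on the console.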

To track this down, I would suggest:

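  • Add a handler for phantom.onError and/or page.onError.
  • Wrap your code in try/catch blocks (for example around page.render).
  • After the page has loaded, your callback never tests the status. It's better to check it.
  • It seems to freeze when calling page.render. Have you tried a simpler filename in the current directory? Maybe the freeze is caused by security restrictions (permissions) or an invalid filename (invalid characters?).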
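The remaining suggestions (check the load status, guard page.render with try/catch, keep the file name simple, rule out permissions) could be combined roughly as follows. This is a sketch, not the poster's script: buildOutputPath and renderAndExit are hypothetical names, the fs.isWritable pre-check is an added assumption motivated by the root:apache drwxr-xr-x permissions shown in the question, and using exit code 1 for every failure is an arbitrary choice. The usage branch uses else, so nothing after the usage message runs even if phantom.exit() does not stop the script immediately.

```javascript
// Sketch of the checks suggested above, in PhantomJS-compatible ES5.

// Hypothetical helper: keep only letters, digits, '_' and '-' in the IDs and
// make sure the folder ends with '/', so the PNG name has no odd characters.
function buildOutputPath(folder, membersId, activitiesId) {
    function safe(s) { return String(s).replace(/[^A-Za-z0-9_-]/g, ''); }
    if (folder.charAt(folder.length - 1) !== '/') {
        folder += '/';
    }
    return folder + safe(membersId) + '_' + safe(activitiesId) + '.png';
}

// Hypothetical helper: open `url`, check the load status, render inside
// try/catch, verify the file exists, and call exit() on every path
// (0 = PNG written, 1 = anything else).
function renderAndExit(page, phantomObj, fsObj, url, outPath, log) {
    page.open(url, function (status) {
        var code = 1;
        if (status !== 'success') {
            log('Could not load ' + url + ' (status: ' + status + ')');
        } else {
            try {
                page.render(outPath);
                if (fsObj.exists(outPath)) {
                    code = 0;
                    log('Saved ' + outPath);
                } else {
                    log('render() returned but ' + outPath + ' does not exist');
                }
            } catch (e) {
                log('page.render threw: ' + e);
            }
        }
        phantomObj.exit(code);
    });
}

// Entry point: only runs inside PhantomJS, where the `phantom` global exists.
if (typeof phantom !== 'undefined') {
    var system = require('system');
    var fs = require('fs');
    if (system.args.length < 5) {
        console.log('Usage: phantom_activity_fax.js page_to_load members_id activities_id folder_path');
        phantom.exit(1);
    } else if (!fs.isWritable(system.args[4])) {
        // e.g. a root:apache drwxr-xr-x folder is not writable by other users
        console.log('Folder is not writable by this user: ' + system.args[4]);
        phantom.exit(1);
    } else {
        renderAndExit(require('webpage').create(), phantom, fs, system.args[1],
            buildOutputPath(system.args[4], system.args[2], system.args[3]),
            function (s) { console.log(s); });
    }
}
```

The fs.exists check after page.render is there because, in the question, the non-root run finished without any error yet produced no PNG. With a real exit code, the PHP caller can read the result through the second argument of system().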

Hope this helps.

Answer 1 (score: 11)

Use:

$ phantomjs --debug=true rasterize.js http://... test.pdf

and add a resource timeout in rasterize.js. This was my problem:

page.settings.resourceTimeout = 10000; // milliseconds; avoids the freeze!!!
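A slightly fuller sketch of this idea, assuming a PhantomJS version that supports resourceTimeout (it appeared in the 1.9 series, as far as I know): the setting aborts any single slow request, page.onResourceTimeout logs which URL was dropped, and an overall watchdog timer exits even if nothing else does. The watchdog, the exit code 2, and the 10 s / 60 s values are illustrative assumptions, not PhantomJS defaults; applyTimeouts is a hypothetical helper name, and the timer function is passed in so the logic can be tested outside PhantomJS.

```javascript
// Sketch: bound how long a PhantomJS script can run (ES5 for PhantomJS).
// Hypothetical helper; set these before calling page.open().
function applyTimeouts(page, phantomObj, resourceMs, totalMs, log, schedule) {
    // Abort any single request (image, script, XHR...) slower than resourceMs.
    page.settings.resourceTimeout = resourceMs;
    page.onResourceTimeout = function (request) {
        log('Resource timed out: ' + request.url + ' (' + request.errorString + ')');
    };
    // Last resort: if nothing has called exit() after totalMs, exit with code 2.
    return schedule(function () {
        log('Script ran longer than ' + totalMs + ' ms; exiting');
        phantomObj.exit(2);
    }, totalMs);
}

// Only runs inside PhantomJS, where the `phantom` global exists.
if (typeof phantom !== 'undefined') {
    var timeoutPage = require('webpage').create();
    applyTimeouts(timeoutPage, phantom, 10000, 60000,
        function (s) { console.log(s); }, setTimeout);
    // ...then page.open / page.render / phantom.exit as in rasterize.js
}
```

Combined with --debug=true, the "Resource timed out" lines should show which request was blocking the page.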