Browsing a website through a proxy with PHP cURL

Asked: 2012-11-23 19:07:16

Tags: php curl proxy

I want to set up a web page that automatically uses a proxy. Here is my script:

<?php

$url = 'http://www.sciencedirect.com/science/jrnlallbooks/a/fulltext';
$proxy = '200.93.148.72:3128';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_PROXY, $proxy);                  // send the request through the proxy
curl_setopt($ch, CURLOPT_TIMEOUT, 0);                     // 0 means no timeout
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);              // follow HTTP redirects
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);              // return the response instead of printing it
curl_setopt($ch, CURLOPT_HEADER, 1);                      // include response headers in the output
curl_setopt($ch, CURLOPT_COOKIEJAR, "my_cookies.txt");    // write cookies here when the handle closes
curl_setopt($ch, CURLOPT_COOKIEFILE, "my_cookies.txt");   // read cookies from here
$curl_scraped_page = curl_exec($ch);


echo $curl_scraped_page;

?>

The page opens, and the response headers show:

HTTP/1.0 200 OK
Date: Fri, 23 Nov 2012 18:46:41 GMT
Last-Modified: Fri, 23 Nov 2012 18:46:41 GMT
Set-Cookie: MIAMISESSION=86e5ecb0-359b-11e2-b116-00000aab0f6c:3531149201; path=/; domain=.sciencedirect.com;
Set-Cookie: USER_STATE_COOKIE=; expires=Thu, 01 Jan 1970 23:59:59 GMT; path=/; domain=.sciencedirect.com;
Set-Cookie: SD_REMOTEACCESS=; expires=Thu, 01 Jan 1970 23:59:59 GMT; path=/; domain=.sciencedirect.com;
Set-Cookie: MIAMIAUTH=6c1869e3dd5d3ca5644fdd359099d551fee57ff4f19a0d9e30c92bcc3f4cdcb755a638c57790c6f118d4a8601a914733e770454895a95214fd2a92b748418c15aabbe7e39dbfe22d18a337761caf3eebb621aa3f17803d29fa1a241d10f4aad71e83423e9562a1ec67194a18c7a016cd36828cdb6ccdaef46d038a2ee15429cd0ee88a636ec51602cee8d34e3397f0c720230f6ab68fbc74c285431372f89886ba1bbbb03f6873e2804f1577f52679f16123dd0c07d70ab0b92145c1c383e4155512e57b8da9452ad570394af0c66b0859739b1e77c2d98372d5a1b978828531f3a042a816bf4a9edbe45d4f9197a685aa1506ae57ec1593efd428842244a96f9d2033b43ccf50a14843907943eb57b7c9dd1bef11603f9e686aad6847870ac6fec520209a31df9efb3d0ee4e24341c4c5dd6c12060a6a624c3ff60ec16286f7cb6c3839f8f375c00c836958eada8d4900baa294fa3645c02f1b3ac78c7bc78bc2d79f5f4e038b6ae465d63f0100a53731ec826eba3c6f8f648bf03d6ac7d450788f0362055ca413073d9333348cacdc6e4d6222a420a78620a968b185954fcc76b3a9a63f2e62f9; path=/; domain=.sciencedirect.com;
Set-Cookie: TARGET_URL=fcf74dd786744d87fbaaaf8652a764ab4a79b0d3ed681139e9106923760631052596d348948479933da48b3723069bbf09065290c950dc02c1f0d1436659ad5a; path=/; domain=.sciencedirect.com;
Set-Cookie: MIAMIAUTH=6c1869e3dd5d3ca5644fdd359099d551fee57ff4f19a0d9e30c92bcc3f4cdcb755a638c57790c6f118d4a8601a914733e770454895a95214fd2a92b748418c15aabbe7e39dbfe22d18a337761caf3eebb621aa3f17803d29fa1a241d10f4aad71e83423e9562a1ec67194a18c7a016cd36828cdb6ccdaef46d038a2ee15429cd0ee88a636ec51602cee8d34e3397f0c720230f6ab68fbc74c285431372f89886ba1bbbb03f6873e2804f1577f52679f16123dd0c07d70ab0b92145c1c383e4155512e57b8da9452ad570394af0c66b0859739b1e77c2d98372d5a1b978828531f3a042a816bf4a9edbe45d4f9197a685aa1506ae57ec1593efd428842244a96f9d2033b43ccf50a14843907943eb57b7c9dd1bef11603f9e686aad6847870ac6fec520209a31df9efb3d0ee4e24341c4c5dd6c12060a6a624c3ff60ec16286f7cb6c3839f8f375c00c836958eada8d4900baa294fa3645c02f1b3ac78c7bc78bc2d79f5f4e038b6ae465d63f0100a53731ec826eba3c6f8fe5378db869312c80a0addc0d8946f7f6552daa333f2e38da51e23c1d44ae41176c1b2e70f7b144f63a44c25741cd0126; path=/; domain=.sciencedirect.com;
Content-Type: text/html
Expires: Tue, 01 Jan 1980 05:00:00 GMT
X-RE-Ref: 0 19194695
Server: www.sciencedirect.com
P3P: CP="IDC DSP LAW ADM DEV TAI PSA PSD IVA IVD CON HIS TEL OUR DEL SAM OTR IND OTC"
Vary: Accept-Encoding, User-Agent
X-Cache: MISS from alejandria.ufps.edu.co
X-Cache-Lookup: HIT from alejandria.ufps.edu.co:3128
Via: 1.0 alejandria.ufps.edu.co (squid/3.0.STABLE15)
Proxy-Connection: close

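However, the proxy is only used for this one page. When I click any other link on the page, it is not loaded through the proxy. Please help me solve this. How can I improve my script? I want the whole site (all links) to go through the proxy. How do I set that up?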

1 answer:

Answer 0 (score: 1)

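It looks like you need to use a regular expression to rewrite every link in the HTML (for example <a href="...">) so that it points to your own script. Your script then takes the target URL as a parameter and passes it to cURL so that the right page is fetched. A rewritten link would look like http://YourSite.com/proxy.php?site=http://example.com/smth/foo.php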
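A minimal sketch of that idea, under a few assumptions: the script is saved as proxy.php (the name from the example URL), and the helper names absolutize, rewrite_links, and fetch_via_proxy are hypothetical. The target URL is URL-encoded in the site parameter, which differs slightly from the unencoded example link but avoids ambiguity when the target has its own query string. Only <a href> links are rewritten; forms, images, scripts, and stylesheets would still bypass the proxy.

```php
<?php
// proxy.php (assumed file name). Upstream proxy taken from the question.
const UPSTREAM_PROXY = '200.93.148.72:3128';

// Resolve $href against $base. Simplified: handles absolute, protocol-relative,
// root-relative and plain relative paths, but does not normalize "../" segments.
function absolutize(string $href, string $base): string
{
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
        return $href; // already has a scheme (http:, mailto:, ...)
    }
    $p = parse_url($base);
    $scheme = $p['scheme'] ?? 'http';
    $root = $scheme . '://' . ($p['host'] ?? '') . (isset($p['port']) ? ':' . $p['port'] : '');
    if (strpos($href, '//') === 0) {
        return $scheme . ':' . $href;
    }
    if (strpos($href, '/') === 0) {
        return $root . $href;
    }
    $path = $p['path'] ?? '/';
    return $root . substr($path, 0, strrpos($path, '/') + 1) . $href;
}

// Point every quoted <a href="..."> at this script, passing the target in ?site=.
function rewrite_links(string $html, string $base, string $self = 'proxy.php'): string
{
    return preg_replace_callback(
        '#(<a\s+(?:[^>]*?\s)?href\s*=\s*)(["\'])(.*?)\2#is',
        function (array $m) use ($base, $self): string {
            $href = html_entity_decode($m[3]);
            if ($href === '' || $href[0] === '#') {
                return $m[0]; // leave in-page anchors alone
            }
            $abs = absolutize($href, $base);
            if (!preg_match('#^https?://#i', $abs)) {
                return $m[0]; // leave mailto:, javascript:, etc. alone
            }
            return $m[1] . $m[2] . htmlspecialchars($self . '?site=' . rawurlencode($abs)) . $m[2];
        },
        $html
    );
}

// Fetch $url through the upstream proxy; returns the body, or false on failure.
function fetch_via_proxy(string $url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_PROXY, UPSTREAM_PROXY);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    curl_setopt($ch, CURLOPT_COOKIEJAR, 'my_cookies.txt');
    curl_setopt($ch, CURLOPT_COOKIEFILE, 'my_cookies.txt');
    return curl_exec($ch);
}

// Entry point: runs only when the script is requested with ?site=...
if (isset($_GET['site'])) {
    $site = $_GET['site'];
    if (!preg_match('#^https?://#i', $site)) {
        http_response_code(400);
        exit('Bad URL');
    }
    $html = fetch_via_proxy($site);
    if ($html === false) {
        http_response_code(502);
        exit('Fetch failed');
    }
    echo rewrite_links($html, $site);
}
```

Requesting proxy.php?site=http%3A%2F%2Fexample.com%2Fsmth%2Ffoo.php would then fetch that page through the proxy and return it with its links pointing back at proxy.php. Regular expressions are fragile on real-world HTML, so an HTML parser such as DOMDocument is likely a more robust choice. Relative links are resolved against the requested URL, which may be wrong if the request was redirected. Exposing a script like this publicly turns it into an open relay, so some form of access restriction is advisable.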