Parsing phone numbers from a website

Time: 2012-04-24 21:04:57

Tags: php html parsing curl

I am trying to parse some phone numbers from a website.

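When I fetch the page source through cURL, I get back only about half of the code, and the missing part is exactly the part I need. This is driving me crazy.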

My code so far:

$ch = curl_init("http://www.baroul-bucuresti.ro/index.php?w=definitivi&l=C&p=2");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
$content = curl_exec($ch);
curl_close($ch);
print_r ($content);

1 answer:

Answer 0 (score: 2):

I think the problem is that the URL in question responds with a 302, which redirects to another location:

$ telnet www.baroul-bucuresti.ro 80
Trying 91.208.179.20...
Connected to www.baroul-bucuresti.ro.
Escape character is '^]'.
GET /index.php?w=definitivi&l=C&p=2 HTTP/1.1
host: www.baroul-bucuresti.ro

HTTP/1.1 302 Found
Date: Fri, 27 Apr 2012 20:24:54 GMT
Server: Apache/2.2.15 (CentOS)
X-Powered-By: PHP/5.3.3
Set-Cookie: PHPSESSID=qjbqvveqtmarv7o0f820bbeq71; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Set-Cookie: for_tablou=1
Set-Cookie: bvbsessionhash=b9c609e162dab90fc86c1fdb52e07fdd; expires=Sun, 27-May-2012 20:24:57 GMT; path=/
Set-Cookie: bvblastvisit=1335558297; expires=Sun, 27-May-2012 20:24:57 GMT; path=/
Set-Cookie: bvblastactivity=1335558297; expires=Sun, 27-May-2012 20:24:57 GMT; path=/
Set-Cookie: bvbuserid=deleted; expires=Thu, 28-Apr-2011 20:24:56 GMT; path=/
Set-Cookie: for_tablou=1
Location: /tablou

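The same check can probably be done from PHP instead of telnet. The following is only a sketch: `fetch_raw_headers` and `find_redirect` are hypothetical helper names, not part of cURL, and the parsing assumes the first response's headers look like the dump above.

```php
<?php
// Hypothetical helper: fetch only the headers of the first response,
// without following any redirect.
function fetch_raw_headers(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true);          // prepend response headers to the output
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // stay on the first response
    $response = curl_exec($ch);
    if ($response === false) {
        return '';
    }
    // Cut the header block off the front of the response.
    $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
    return substr($response, 0, $headerSize);
}

// Hypothetical helper: report the status code and Location target
// if the header block describes a redirect, or null otherwise.
function find_redirect(string $rawHeaders): ?array
{
    // The status line looks like "HTTP/1.1 302 Found".
    if (!preg_match('#^HTTP/\d(?:\.\d)?\s+(\d{3})#m', $rawHeaders, $status)) {
        return null;
    }
    $code = (int) $status[1];
    if ($code < 300 || $code >= 400) {
        return null; // not a redirect
    }
    // Header names are case-insensitive.
    if (!preg_match('/^Location:\s*(.+?)\s*$/mi', $rawHeaders, $location)) {
        return null;
    }
    return ['status' => $code, 'location' => $location[1]];
}
```

Calling `find_redirect(fetch_raw_headers($url))` on the URL from the question should return a non-null result for as long as the server keeps answering with the 302 shown above.
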
I changed the code by adding this option to curl:

curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

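It now seems to get the whole content. I don't know whether it is the content you want, but it does fetch the full content of the real location. Can you give it a try?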
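
Put together, the request might look like the sketch below. `fetch_following_redirects` is a hypothetical wrapper name. The redirect cap and the in-memory cookie engine are assumptions on my part: the 302 response sets several cookies, so the target page may expect them to be sent back, but I have not verified that it does.

```php
<?php
// Sketch: fetch a page, follow redirects, and keep cookies between hops.
function fetch_following_redirects(string $url, int $maxRedirects = 5): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);     // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);     // follow Location: headers
    curl_setopt($ch, CURLOPT_MAXREDIRS, $maxRedirects); // assumption: small cap to avoid redirect loops
    curl_setopt($ch, CURLOPT_COOKIEFILE, '');           // assumption: empty name enables in-memory cookie handling

    $content = curl_exec($ch);
    if ($content === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    // The handle is released when $ch goes out of scope.
    return $content;
}
```

To try it, pass the URL from the question to `fetch_following_redirects` and `print_r` the result.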