I want to extract the HTML from a URL, but some websites return unknown characters. I suspect that URLs built with AJAX have this problem. Here is my code, which gives me a string of unreadable characters:
$url='http://www.varzesh3.com';
$ch=curl_init();
$timeout=5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
// Get URL content
$lines_string=curl_exec($ch);
// close handle to release resources
curl_close($ch);
//output, you can also save it locally on the server
echo $lines_string;
How can I fix this? I want the content inside the HTML tags.
Answer 0: (score: 3)
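The site you linked in your example contains Arabic-script text, which needs UTF-8. It also returns gzip-compressed data, which is why the raw response looks like unknown characters.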
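To check whether a response body is still gzip-compressed, one option is to look at its first two bytes. The following is only a minimal sketch: `maybe_gunzip` is a hypothetical helper name, not something from the question, and it assumes PHP's zlib extension is available.

<?php
// Hypothetical helper: return the decompressed body if $body looks gzip-compressed,
// otherwise return it unchanged.
function maybe_gunzip(string $body): string
{
    // gzip streams start with the magic bytes 0x1f 0x8b
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = gzdecode($body);
        if ($decoded !== false) {
            return $decoded;
        }
    }
    return $body;
}

// Usage with locally generated data (no network needed)
$html = '<p>example</p>';
$compressed = gzencode($html);
var_dump(maybe_gunzip($compressed) === $html);

This is only needed when the body arrives still compressed; if cURL is asked to handle the encoding itself, it decompresses the response before returning it.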
You can send a header so that your own page is served as UTF-8:
header('Content-type: text/html; charset=UTF-8');
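and set CURLOPT_ENCODING to gzip in your cURL request, so that cURL sends an Accept-Encoding header and decompresses the response for you.
The final code should be: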
<?php
header('Content-type: text/html; charset=utf-8');
$url='http://www.varzesh3.com';
$ch=curl_init();
$timeout=5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
// Hint that we prefer a UTF-8 response (a Content-Type request header would only describe a request body, which this GET does not have)
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Accept-Charset: utf-8"));
// We will support gzip encoded data
curl_setopt($ch, CURLOPT_ENCODING , "gzip");
// Get URL content
$lines_string=curl_exec($ch);
// close handle to release resources
curl_close($ch);
//output, you can also save it locally on the server
echo $lines_string;
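Since the question asks for the content inside the HTML tags, the fetched string can then be parsed rather than echoed. Below is a rough sketch using PHP's DOM extension; `extract_tag_text` and the sample markup are made up for illustration, and the XML-declaration prefix is a common workaround (not an official option) to make `loadHTML()` read the input as UTF-8.

<?php
// Hypothetical helper: collect the text inside every <$tag> element of $html.
function extract_tag_text(string $html, string $tag): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect parser warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    // Prefixing an XML declaration makes loadHTML() treat the input as UTF-8
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($doc->getElementsByTagName($tag) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Usage with a made-up sample; in the code above you would pass $lines_string instead
$sample = '<html><body><h1>Title</h1><p>First</p><p>Second</p></body></html>';
print_r(extract_tag_text($sample, 'p'));

Note that this only sees markup present in the fetched HTML; content that the page loads later through AJAX will not be in the string cURL returns.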
Answer 1: (score: -1)
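cURL will fetch the data if it is in a specific format such as JSON or XML. But you have only given a plain URL, not a specific URL that sends its data as JSON, an array, or XML.
If you want to download the entire website, try cURL from the command line.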