I want to fetch all of a website's HTML content and add a custom GET parameter to every URL in it.
My code:
function getUrls($string) {
    $regex = '/https?\:\/\/[^\" ]+/i';
    preg_match_all($regex, $string, $matches);
    //return (array_reverse($matches[0]));
    return ($matches[0]);
}

function addToUrl($url, $key, $value = null) {
    $query = parse_url($url, PHP_URL_QUERY);
    if ($query) {
        parse_str($query, $queryParams);
        $queryParams[$key] = $value;
        $url = str_replace("?$query", '?' . http_build_query($queryParams), $url);
    } else {
        $url .= '?' . urlencode($key) . '=' . urlencode($value);
    }
    return $url;
}

$s = file_get_contents('http://youtube.com');
$urls = getUrls($s);
foreach ($urls as $url) {
    $withParam = addToUrl($url, 'wid', '${wid}');
    $s = str_replace($url, $withParam, $s);
}
echo $s;
For example:
Before:
$content = '... <a href="http://foo.bar/register.php">register </a> ... <a href="http://foo.bar/login.php?t=1&">login</a> ...';
After:
$content = '... <a href="http://foo.bar/register.php?wid=${wid}">register </a> ... <a href="http://foo.bar/login.php?t=1&wid=${wid}">login</a> ...';
Can you help me?
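
One possible direction, offered as a sketch rather than a definitive fix. The code above has two likely trouble spots. First, calling str_replace once per found URL can rewrite the same URL several times, whenever a URL repeats or is a prefix of another one. Second, urlencode and http_build_query turn the placeholder ${wid} into %24%7Bwid%7D. The sketch below rewrites each match exactly once with preg_replace_callback and inserts the value verbatim. The function names addParamToUrl and addParamToAllUrls are hypothetical, and the sketch assumes the key is a plain word and that URLs end at whitespace, a quote, or an angle bracket.

```php
<?php
// Append (or overwrite) one query parameter on a single URL.
// $rawValue is inserted verbatim so a placeholder such as ${wid} is not percent-encoded.
function addParamToUrl(string $url, string $key, string $rawValue): string
{
    // Split off a fragment (#...) so the parameter lands before it.
    $fragment = '';
    $hashPos = strpos($url, '#');
    if ($hashPos !== false) {
        $fragment = substr($url, $hashPos);
        $url = substr($url, 0, $hashPos);
    }

    // Split off the existing query string, if any.
    $query = '';
    $qPos = strpos($url, '?');
    if ($qPos !== false) {
        $query = substr($url, $qPos + 1);
        $url = substr($url, 0, $qPos);
    }

    // Keep existing pairs as they are; drop empty ones (e.g. a trailing "&")
    // and any earlier occurrence of the same key.
    $pairs = [];
    foreach (explode('&', $query) as $pair) {
        if ($pair === '' || $pair === $key || strpos($pair, $key . '=') === 0) {
            continue;
        }
        $pairs[] = $pair;
    }
    $pairs[] = rawurlencode($key) . '=' . $rawValue;

    return $url . '?' . implode('&', $pairs) . $fragment;
}

// Rewrite every http(s) URL found in $html in a single pass.
function addParamToAllUrls(string $html, string $key, string $rawValue): string
{
    $result = preg_replace_callback(
        '~https?://[^\s"\'<>]+~i',
        function (array $m) use ($key, $rawValue): string {
            return addParamToUrl($m[0], $key, $rawValue);
        },
        $html
    );

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $html;
}

$content = '... <a href="http://foo.bar/register.php">register </a> ... <a href="http://foo.bar/login.php?t=1&">login</a> ...';
echo addParamToAllUrls($content, 'wid', '${wid}'), "\n";
```

Note that the regex matches URLs anywhere in the text, including inside scripts and plain text, not only in href attributes. If only links should change, parsing the HTML with DOMDocument and updating the href attributes is probably more robust.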