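Remove the trailing part of a URL and keep only the domain

Time: 2019-11-02 09:16:44

Tags: php arrays regex

I have a script that receives a list of links. How can I strip the trailing part of each URL and keep only the domain name? For example, keep only google.com instead of google.com/adwords.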

<form method='post'>
<textarea name="url1" cols="40" rows="5"></textarea><br>
<input name="Submit" type='submit' value='Send'>
</form>

<?php
$array = explode("\r\n", $_POST['url1']);
$word_count = (array_count_values($array));
arsort($word_count);
foreach ($word_count as $key=>$val) {
echo '<a href="' . $key . '">' . $key . '</a> - ' . $val . '<br/>'; 
}
?>

I tried something like this:

$string = array('https://google.com/ytrewq', 'https://google.com/qwerty'); 
$pattern = '/[^/]+$/';
$replacement = "replacement";
print_r (preg_replace($pattern, $replacement, $string));
print_r (preg_grep($pattern, $string));
print_r (preg_filter($pattern, $replacement, $string)); 
print_r (preg_match($pattern,$string,$found));

But it doesn't work.

1 answer:

Answer 0: (score: 0)

// for a single URL:
function getBaseUrl( $url, $includeScheme = false )
{
    $host = parse_url( $url, PHP_URL_HOST );
    if ( !$includeScheme ) {
        return $host;
    }

    $scheme = parse_url( $url, PHP_URL_SCHEME );
    return sprintf( '%s://%s', $scheme, $host );
}

$url = 'https://google.com/adwords';
echo getBaseUrl( $url ); // prints 'google.com'
echo getBaseUrl( $url, true ); // prints 'https://google.com'

// for an array of URLs:
function getBaseUrls( $urls, $includeScheme = false )
{
    $baseUrls = [];
    foreach ( $urls as $url ) {
        $baseUrls[] = getBaseUrl( $url, $includeScheme );
    }

    return $baseUrls;
}

$urls = [
    'https://google.com/ytrewq', 
    'https://google.com/qwerty'
];
print_r( getBaseUrls( $urls, true ) );
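
Note that parse_url() only reports a host when the string has a scheme (or starts with "//"), so an input such as google.com/adwords from the question would give null for PHP_URL_HOST. Below is a minimal sketch of one way to handle that and to feed the result into the counting loop from the question. It assumes that prefixing http:// is acceptable for scheme-less input; getHostLenient and countHosts are hypothetical helper names, not part of the code above.

```php
<?php
// Hypothetical helper: returns the lower-cased host even when the input has no scheme.
function getHostLenient( string $url ): ?string
{
    $url = trim( $url );
    if ( $url === '' ) {
        return null;
    }

    // parse_url() only fills PHP_URL_HOST when a scheme or a leading "//" is present,
    // so add a default scheme (an assumption) when the input lacks one.
    if ( strpos( $url, '//' ) === 0 ) {
        $url = 'http:' . $url;
    } elseif ( !preg_match( '~^[a-z][a-z0-9+.\-]*://~i', $url ) ) {
        $url = 'http://' . $url;
    }

    $host = parse_url( $url, PHP_URL_HOST );
    return is_string( $host ) ? strtolower( $host ) : null;
}

// Hypothetical helper: counts how often each host occurs in a newline-separated list.
function countHosts( string $text ): array
{
    $hosts = [];
    // \R matches "\r\n", "\n" and "\r", so the textarea input splits cleanly.
    foreach ( preg_split( '/\R/', $text ) as $line ) {
        $host = getHostLenient( $line );
        if ( $host !== null ) {
            $hosts[] = $host;
        }
    }

    $counts = array_count_values( $hosts );
    arsort( $counts ); // most frequent host first, keys preserved
    return $counts;
}

print_r( countHosts( "https://google.com/ytrewq\r\ngoogle.com/adwords\r\nhttps://example.org/a" ) );
```

In the question's script, countHosts( $_POST['url1'] ) could stand in for the explode() and array_count_values() calls; when echoing the keys into HTML, wrapping them in htmlspecialchars() is advisable.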

For more information, see https://www.php.net/manual/en/function.parse-url.php
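
As for the attempt in the question: the pattern '/[^/]+$/' uses / as the delimiter and also, unescaped, inside the character class, so PHP ends the pattern at the second slash and reports an unknown modifier. Even with the delimiter fixed, that pattern matches the last path segment, not the host. If a regex is still preferred over parse_url(), the following is one possible sketch. It assumes the input has the shape "optional scheme, host, optional path" and will keep any port or user info as part of the host; stripToHost is a hypothetical helper name.

```php
<?php
// Hypothetical helper: keeps the optional scheme and the host, drops everything after it.
function stripToHost( string $url ): string
{
    $url = trim( $url );

    // "~" is the delimiter, so the slashes inside the pattern need no escaping.
    // Group 1 = optional scheme plus everything up to the first "/", "?" or "#".
    $result = preg_replace( '~^((?:[a-z][a-z0-9+.\-]*://)?[^/?#]+).*$~is', '$1', $url );

    // preg_replace() returns null on error; fall back to the trimmed input in that case.
    return $result ?? $url;
}

$urls = [
    'https://google.com/ytrewq',
    'https://google.com/qwerty',
    'google.com/adwords',
];
print_r( array_map( 'stripToHost', $urls ) );
```

parse_url() remains the more robust choice, since it understands ports, credentials, and other URL parts that this pattern deliberately ignores.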