Question

I have a string, for example:

$html = '<p>hello<a href="https://www.youtube.com/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';

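I want to find the first URL in the string whose domain is youtube.com or youtu.be and store it in the variable $first_found_youtube_url.

How can I do this efficiently?

I could search for the URL with preg_match or strpos, but I'm not sure which approach is more appropriate.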

Answer 1

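I wrote this function a while ago. It uses a regular expression and returns an array of unique URLs, so you can take the first item of the array. Note that the array is sorted by length, longest first, so $urls[0] is the longest URL rather than necessarily the first one in the string, and the pattern matches any http(s) URL, not only YouTube ones.

function getUrlsFromString($string) {
    // Match http(s) URLs; the final group ends the match before trailing punctuation such as a closing quote.
    $regex = '#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#i';
    preg_match_all($regex, $string, $matches);
    // Drop duplicate URLs.
    $matches = array_unique($matches[0]);
    // Sort longest first; usort() also re-indexes the array from 0.
    usort($matches, function($a, $b) {
        return strlen($b) - strlen($a);
    });
    return $matches;
}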

Example:

$html = '<p>hello<a href="https://www.youtube.com/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';
$urls = getUrlsFromString($html);
$first_found_youtube_url = $urls[0] ?? null; // null if no URL was found
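If you need the YouTube URL that appears first in the string, one option is to reuse the same generic pattern, skip the length sort, and filter by host with parse_url(). This is a minimal sketch, not the answerer's code: the function name firstYoutubeUrl and the list of accepted hosts are assumptions.

```php
<?php
// Hypothetical helper: returns the first YouTube URL in document order, or null.
function firstYoutubeUrl(string $string): ?string
{
    // Same generic URL pattern as getUrlsFromString(), but without sorting.
    $regex = '#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#i';
    preg_match_all($regex, $string, $matches);

    // Assumed set of YouTube hosts; extend it (e.g. m.youtube.com) if needed.
    $youtubeHosts = ['youtube.com', 'www.youtube.com', 'youtu.be'];

    foreach ($matches[0] as $url) {
        $host = parse_url($url, PHP_URL_HOST);
        // parse_url() returns null/false when there is no usable host.
        if (is_string($host) && in_array(strtolower($host), $youtubeHosts, true)) {
            return $url; // matches are in the order they appear in the string
        }
    }
    return null;
}

$html = '<p>hello<a href="https://www.youtube.com/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';
$first_found_youtube_url = firstYoutubeUrl($html);
echo $first_found_youtube_url, PHP_EOL;
```

Filtering by host after matching keeps the URL pattern generic, so adding another YouTube domain only means extending the array.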

Using a YouTube-specific regular expression:

function getYoutubeUrlsFromString($string) {
    // Group 1: the full URL; group 2: the video ID (letters, digits, "-" and "_").
    $regex = '#(https?:\/\/(?:www\.)?(?:youtube\.com\/watch\?v=|youtu\.be\/)([\w-]+))#i';
    preg_match_all($regex, $string, $matches);
    $matches = array_unique($matches[0]);
    // Longest URL first, as in getUrlsFromString().
    usort($matches, function($a, $b) {
        return strlen($b) - strlen($a);
    });
    return $matches;
}

Example:

$html = '<p>hello<a href="https://www.youtube.com/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';
$urls = getYoutubeUrlsFromString($html);
$first_found_youtube_url = $urls[0] ?? null; // null if no YouTube URL was found
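Because only the first match is needed, preg_match() can replace preg_match_all(): it stops at the first occurrence, so no array_unique()/usort() step is required, and the capture groups also give you the video ID. A sketch assuming IDs consist of letters, digits, "-" and "_"; the function name firstYoutubeMatch is hypothetical.

```php
<?php
// Hypothetical helper: returns ['url' => ..., 'id' => ...] for the first YouTube link, or null.
function firstYoutubeMatch(string $string): ?array
{
    // Group 1: the full URL; group 2: the video ID.
    $regex = '#(https?://(?:www\.)?(?:youtube\.com/watch\?v=|youtu\.be/)([\w-]+))#i';
    if (preg_match($regex, $string, $m) !== 1) {
        return null; // no match (0) or a regex error (false)
    }
    return ['url' => $m[1], 'id' => $m[2]];
}

$html = '<p>hello<a href="https://www.youtube.com/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';
$match = firstYoutubeMatch($html);
$first_found_youtube_url = $match['url'] ?? null;
```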

Answer 2

You can parse the HTML with DOMDocument and look for YouTube URLs with stripos, something like this:

$html = '<p>hello<a href="https://www.youtube.com/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';
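$DOMD = new DOMDocument();
// loadHTML() is an instance method; @ suppresses warnings about imperfect HTML.
@$DOMD->loadHTML($html);

$first_found_youtube_url = null;
foreach($DOMD->getElementsByTagName("a") as $url)
{
    $href = $url->getAttribute("href");
    // Only these exact prefixes match; e.g. https://youtube.com/... (no www) is skipped.
    if (0 === stripos($href, "https://www.youtube.com/") || 0 === stripos($href, "https://www.youtu.be"))
    {
        $first_found_youtube_url = $href;
        break; // stop at the first matching link
    }
}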

Personally, I would probably use

"youtube.com"===parse_url($url->getAttribute("href"),PHP_URL_HOST)

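That would match both http and https links, which may be what you want, although strictly speaking it isn't what the question currently asks for. Note that it compares the host exactly, so www.youtube.com and youtu.be links would not match.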
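To accept several host names rather than exactly one, the parse_url() check can be combined with the DOMDocument loop. A sketch under the assumption that youtube.com, www.youtube.com and youtu.be are the hosts you care about; it uses libxml_use_internal_errors() instead of @ to keep parser warnings quiet.

```php
<?php
$html = '<p>hello<a href="https://www.youtube.com/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';

// Assumed set of hosts treated as YouTube.
$youtubeHosts = ['youtube.com', 'www.youtube.com', 'youtu.be'];

$doc = new DOMDocument();
// Collect libxml warnings about imperfect HTML instead of printing them.
$previous = libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$first_found_youtube_url = null;
foreach ($doc->getElementsByTagName('a') as $link) {
    $href = $link->getAttribute('href');
    $host = parse_url($href, PHP_URL_HOST);
    // parse_url() returns null/false when there is no host, so check for a string.
    if (is_string($host) && in_array(strtolower($host), $youtubeHosts, true)) {
        $first_found_youtube_url = $href;
        break; // links are visited in document order
    }
}

echo $first_found_youtube_url, PHP_EOL;
```

Checking the parsed host instead of a string prefix also accepts both http and https without listing each scheme.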

Answer 3

I think this does what you want. I used preg_match_all only because I find it easier to debug the regular expression that way.

<?php

$html = '<p>hello<a href="https://www.youtu.be/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';

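// Matches youtu.be or youtube.com, with or without "www.", followed by a path such as watch?v=ID.
// Note: "\com" would be a PCRE control-character escape (\c), not ".com".
$pattern = '/https?:\/\/(?:www\.)?(?:youtu\.be|youtube\.com)\/[\w?=&-]*/i';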
preg_match_all($pattern, $html, $matches);

// print_r($matches);
$first_found_youtube_url = $matches[0][0] ?? null; // null if nothing matched
echo $first_found_youtube_url;

Demo: https://3v4l.org/lFjmK
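Since only the first occurrence is needed, preg_match() is enough here: it stops at the first match and puts the whole match in $m[0]. A minimal sketch; the path character class [\w?=&-] is an assumption that covers typical watch?v=ID and youtu.be/ID links.

```php
<?php
$html = '<p>hello<a href="https://www.youtu.be/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';

// youtu.be or youtube.com, optional "www.", then a path of word chars, ?, =, & or -.
$pattern = '/https?:\/\/(?:www\.)?(?:youtu\.be|youtube\.com)\/[\w?=&-]*/i';

// preg_match() returns 1 on a match, 0 on none, false on error.
$first_found_youtube_url = preg_match($pattern, $html, $m) === 1 ? $m[0] : null;
echo $first_found_youtube_url, PHP_EOL;
```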

Grab a URL from a string containing HTML code
