Question

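I am trying to extract the shortcode from an Instagram URL.

Here is what I have tried so far, but I don't know how to extract it when there is a username in the middle. Thank you very much for your answers.

Instagram pattern: /p/shortcode/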

https://www.instagram.com/p/BxKRx5CHn5i/
https://www.instagram.com/p/BxKRx5CHn5i/?utm_source=ig_share_sheet&igshid=znsinsart176
https://www.instagram.com/p/BxKRx5CHn5i/
https://www.instagram.com/username/p/BxKRx5CHn5i/

Expected: BxKRx5CHn5i

Answer 1

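You can add an optional non-capturing group (?:\/\w+)? to account for the username.

Note that \w also matches _ and \d, so the capture group can be updated to ([\w-]+), and the forward slash in the trailing non-capturing group (?:\/)? can be written as just \/?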

^(?:https?:\/\/)?(?:www\.)?(?:instagram\.com(?:\/\w+)?\/p\/)([\w-]+)(?:\/)?(\?.*)?$

Regex demo

If you use a delimiter other than /, you don't have to escape the forward slashes. Your pattern might then look like:

^(?:https?://)?(?:www\.)?(?:instagram\.com(?:/\w+)?/p/)([\w-]+)/?(\?.*)?$
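
As a rough sketch of how the pattern above might be used from PHP, the snippet below wraps it in ~ delimiters and returns capture group 1. The helper name extractShortcode and the choice to return null when nothing matches are assumptions made for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper; the name and the null return for "no match" are assumptions.
function extractShortcode(string $url): ?string
{
    // Same pattern as above, with ~ as the delimiter so slashes need no escaping.
    $pattern = '~^(?:https?://)?(?:www\.)?(?:instagram\.com(?:/\w+)?/p/)([\w-]+)/?(\?.*)?$~';

    // Capture group 1 holds the shortcode when the whole URL matches.
    return preg_match($pattern, $url, $m) === 1 ? $m[1] : null;
}

$urls = [
    'https://www.instagram.com/p/BxKRx5CHn5i/',
    'https://www.instagram.com/p/BxKRx5CHn5i/?utm_source=ig_share_sheet&igshid=znsinsart176',
    'https://www.instagram.com/username/p/BxKRx5CHn5i/',
];

foreach ($urls as $url) {
    echo extractShortcode($url) ?? '(no match)', "\n";
}
```

Because the optional username group is (?:/\w+)?, a username containing a period would not match; widening it to something like (?:/[\w.]+)? may be worth considering if such URLs occur.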

Answer 2

This expression might also work:

^https?:\/\/(?:www\.)?instagram\.com\/[^\/]+(?:\/[^\/]+)?\/([^\/]{11})\/.*$

Test

$re = '/^https?:\/\/(?:www\.)?instagram\.com\/[^\/]+(?:\/[^\/]+)?\/([^\/]{11})\/.*$/m';
$str = 'https://www.instagram.com/p/BxKRx5CHn5i/
https://www.instagram.com/p/BxKRx5CHn5i/?utm_source=ig_share_sheet&igshid=znsinsart176
https://www.instagram.com/p/BxKRx5CHn5i/
https://www.instagram.com/username/p/BxKRx5CHn5i/';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

foreach ($matches as $match) {
    var_export($match[1]);
}

If you wish to explore, simplify, or modify the expression, it is explained in the top-right panel of this demo.

Answer 3

I took your original pattern and added .* before \/p\/

This gives the pattern

^(?:https?:\/\/)?(?:www\.)?(?:instagram\.com.*\/p\/)([\d\w\-_]+)(?:\/)?(\?.*)?$

This could be simpler if the username always sits directly before /p/
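
To illustrate how the .* variant behaves, here is a minimal PHP sketch. The function name shortcodeViaWildcard is hypothetical, and the pattern is copied unchanged from this answer with / as the delimiter.

```php
<?php
// Hypothetical helper name; returns null when the URL does not match.
function shortcodeViaWildcard(string $url): ?string
{
    // The .* after instagram\.com absorbs an optional username segment before /p/.
    $pattern = '/^(?:https?:\/\/)?(?:www\.)?(?:instagram\.com.*\/p\/)([\d\w\-_]+)(?:\/)?(\?.*)?$/';

    return preg_match($pattern, $url, $m) === 1 ? $m[1] : null;
}

echo shortcodeViaWildcard('https://www.instagram.com/p/BxKRx5CHn5i/'), "\n";
echo shortcodeViaWildcard('https://www.instagram.com/username/p/BxKRx5CHn5i/'), "\n";
```

One thing to keep in mind: .* is greedy, so if a URL happens to contain a second /p/ later on (for example inside a query string), the match may latch onto that later occurrence instead of the first one.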

Answer 4

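Assuming you aren't simply trusting /p/ as the marker before the desired substring, you can use this pattern, which consumes any directories that precede the desired substring.

Note that \K restarts the full-string match, which effectively removes the need for a capture group. This means a smaller output array and a shorter pattern.

Choosing a pattern delimiter such as ~, which doesn't occur inside the pattern, removes the need to escape forward slashes. This again makes the pattern shorter and easier to read.

If you do want to rely on the /p/ substring, just add p/ before my \K.

Code: (Demo)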

$strings = [
    "https://www.instagram.com/p/BxKRx5CHn5i/",
    "https://www.instagram.com/p/BrODg5XHlE6/?utm_source=ig_share_sheet&igshid=znsinsart176",
    "https://www.instagram.com/p/BxKRx5CHn5i/",
    "https://www.instagram.com/username/p/BxE5PpZhoa9/",
    "https://www.instagram.com/username/p/BxE5PpZhoa9/#look=overhere"
];

foreach ($strings as $string) {
    echo preg_match('~(?:https?://)?(?:www\.)?instagram\.com(?:/[^/]+)*/\K\w+~', $string , $m) ? $m[0] : '';
    echo " (from $string)\n";
}

Output:

BxKRx5CHn5i (from https://www.instagram.com/p/BxKRx5CHn5i/)
BrODg5XHlE6 (from https://www.instagram.com/p/BrODg5XHlE6/?utm_source=ig_share_sheet&igshid=znsinsart176)
BxKRx5CHn5i (from https://www.instagram.com/p/BxKRx5CHn5i/)
BxE5PpZhoa9 (from https://www.instagram.com/username/p/BxE5PpZhoa9/)
BxE5PpZhoa9 (from https://www.instagram.com/username/p/BxE5PpZhoa9/#look=overhere)
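
For the variant that does rely on /p/, adding p/ before \K might look like the sketch below. The function name shortcodeAfterMarker and the empty-string fallback are assumptions chosen to mirror the loop above.

```php
<?php
// Hypothetical helper; returns '' when no shortcode is found, like the loop above.
function shortcodeAfterMarker(string $url): string
{
    // "p/" is placed before \K, so the match only restarts after a literal /p/ segment.
    $pattern = '~(?:https?://)?(?:www\.)?instagram\.com(?:/[^/]+)*/p/\K\w+~';

    // With \K there is no capture group; the shortcode is the full match in $m[0].
    return preg_match($pattern, $url, $m) ? $m[0] : '';
}

echo shortcodeAfterMarker('https://www.instagram.com/username/p/BxE5PpZhoa9/#look=overhere'), "\n";
```

Since \w+ does not match a hyphen, a shortcode containing - would be cut short at that character; swapping in [\w-]+ as in Answer 1 is one possible adjustment.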

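If you implicitly trust /p/ as the marker and you know you are dealing with Instagram links, you can avoid regex altogether and just cut out the 11-character substring that starts 3 characters after the beginning of the marker.

Code: (Demo)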

$strings = [
    "https://www.instagram.com/p/BxKRx5CHn5i/",
    "https://www.instagram.com/p/BrODg5XHlE6/?utm_source=ig_share_sheet&igshid=znsinsart176",
    "https://www.instagram.com/p/BxKRx5CHn5i/",
    "https://www.instagram.com/username/p/BxE5PpZhoa9/",
    "https://www.instagram.com/username/p/BxE5PpZhoa9/#look=overhere"
];

foreach ($strings as $string) {
    $pos = strpos($string, '/p/');
    if ($pos === false) {
        continue;
    }
    echo substr($string, $pos + 3, 11);
    echo " (from $string)\n";
}

(Same output as the previous technique)
