Checking iframe HTML code and the src attribute

Time: 2012-12-05 19:06:39

Tags: php regex iframe domdocument

I have a form element into which users can paste an iframe generated by the "Share" link of SoundCloud, YouTube or Vimeo.

To keep only the iframe (Vimeo's embed code contains additional markup) and validate it, I wrote the following script:

Possible valid strings

$media = '<iframe width="100%" height="166" scrolling="no" frameborder="no" src="https://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F673005?"></iframe>';
$media = '<iframe width="560" height="315" src="http://www.youtube.com/embed/msdFDCcdwaA" frameborder="0" allowfullscreen></iframe>';
$media = '<iframe src="http://player.vimeo.com/video/2285902?badge=0" width="500" height="409" frameborder="0" webkitAllowFullScreen mozallowfullscreen allowFullScreen></iframe> <p><a href="http://vimeo.com/2285902">Remind Me</a> from <a href="http://vimeo.com/royksopp">R&ouml;yksopp</a> on <a href="http://vimeo.com">Vimeo</a>.</p>';

Checking the iframe

try {
    // create DOMDocument and load content
    $dom = new \DOMDocument();
    $dom->loadHTML($media);

    // get all iframes
    $iframe = $dom->getElementsByTagName('iframe');

    // check whether at least one iframe exists, if so, choose it
    if (0 == $iframe->length) {
        throw new \Exception('Media Exception: Error 1');
    }
    $iframe = $iframe->item(0);

    // check whether the iframe has at least a height and a width (needed for re-calculating size based on inserted column)
    if ( !$iframe->hasAttribute('width') || !$iframe->hasAttribute('height') ) {
        throw new \Exception('Media Exception: Error 2');
    }

    // check the url - see edit
    // old version if ( !preg_match('/^(http|https):\/{2}(www.)?(youtube.com|vimeo.com|player.vimeo.com|soundcloud.com|w.soundcloud.com)/', $iframe->getAttribute('src'))) {
    if ( !preg_match('/^(http:|https:)?\/{2}(www.)?(youtube.com|vimeo.com|player.vimeo.com|soundcloud.com|w.soundcloud.com)/', $iframe->getAttribute('src'))) {
        throw new \Exception('Media Exception: Error 3');
    }

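    // remove all disallowed attributes
    // (names are lowercase because loadHTML() lowercases attribute names while parsing)
    $allowedAttributes  = array('src', 'width', 'height', 'scrolling', 'frameborder', 'webkitallowfullscreen', 'mozallowfullscreen', 'allowfullscreen');
    // iterate over a copy: removing from the live attribute map while looping can skip entries
    foreach (iterator_to_array($iframe->attributes) as $_item) {
        if (!in_array($_item->nodeName, $allowedAttributes)) {
            $iframe->removeAttribute($_item->nodeName);
        }
    }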

    // remove all content
    while ($iframe->hasChildNodes()) {
        $iframe->removeChild($iframe->firstChild);
    }

    $media = $dom->saveHTML($iframe);

} catch (\Exception $e) {
    // error handling
    $media  = null;
}

This seems to work fine, but it looks like a very long check just to keep out everything unwanted. So my questions are:

  • Is there a shorter way to do this?
  • Have I overlooked anything that would let something "bad" through?
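For comparison, here is one possible more compact shape for the same checks. It is only a sketch under stated assumptions: the function name `sanitizeEmbed`, the host allow-list and the lowercase attribute list are hypothetical and not part of the script above. Instead of the regular expression it compares the parsed host and scheme of `src` using `parse_url()`, which sidesteps the questions of escaping and anchoring.

```php
<?php
// Hypothetical helper (not from the original post): returns the cleaned
// <iframe> markup, or null when the input fails any check.
function sanitizeEmbed(string $html): ?string
{
    // Assumed allow-lists. Attribute names are lowercase because
    // DOMDocument::loadHTML() lowercases them while parsing.
    $allowedHosts = array('www.youtube.com', 'youtube.com', 'player.vimeo.com', 'w.soundcloud.com');
    $allowedAttributes = array('src', 'width', 'height', 'scrolling', 'frameborder',
        'webkitallowfullscreen', 'mozallowfullscreen', 'allowfullscreen');

    if (trim($html) === '') {
        return null; // loadHTML() rejects empty input
    }

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }

    // Take the first iframe and require width and height.
    $iframe = $dom->getElementsByTagName('iframe')->item(0);
    if ($iframe === null || !$iframe->hasAttribute('width') || !$iframe->hasAttribute('height')) {
        return null;
    }

    // Compare scheme and host exactly; a missing scheme means a
    // protocol-relative URL such as //www.youtube.com/embed/...
    $parts = parse_url($iframe->getAttribute('src'));
    if (!is_array($parts) || !isset($parts['host'])) {
        return null;
    }
    $scheme = strtolower(isset($parts['scheme']) ? $parts['scheme'] : 'https');
    if (!in_array($scheme, array('http', 'https'), true)
        || !in_array(strtolower($parts['host']), $allowedHosts, true)) {
        return null;
    }

    // Snapshot the attributes before removing any of them.
    foreach (iterator_to_array($iframe->attributes) as $attribute) {
        if (!in_array($attribute->nodeName, $allowedAttributes, true)) {
            $iframe->removeAttribute($attribute->nodeName);
        }
    }

    // Drop any fallback content inside the iframe.
    while ($iframe->hasChildNodes()) {
        $iframe->removeChild($iframe->firstChild);
    }

    return $dom->saveHTML($iframe);
}
```

Called as `$media = sanitizeEmbed($media);`, it yields `null` in the cases where the script above throws. The scheme check matters because a host comparison alone would accept a `src` such as `javascript://www.youtube.com/...`.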

Edit

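In case anyone comes across this and finds it useful, please note: about a month ago, YouTube changed the src attribute of its <iframe> from "http://www.youtube.com" to "//www.youtube.com". I have updated the source code in this question to handle that.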
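A quick way to see the difference between the two checks is to run both patterns from the script against both URL forms. The variable names below are only illustrative; the patterns are copied unchanged from the script.

```php
<?php
// Patterns copied from the question: the old check and the updated one.
$oldPattern = '/^(http|https):\/{2}(www.)?(youtube.com|vimeo.com|player.vimeo.com|soundcloud.com|w.soundcloud.com)/';
$newPattern = '/^(http:|https:)?\/{2}(www.)?(youtube.com|vimeo.com|player.vimeo.com|soundcloud.com|w.soundcloud.com)/';

// The same embed URL with and without an explicit scheme.
$withScheme       = 'http://www.youtube.com/embed/msdFDCcdwaA';
$protocolRelative = '//www.youtube.com/embed/msdFDCcdwaA';

$oldWithScheme = preg_match($oldPattern, $withScheme);       // 1
$oldRelative   = preg_match($oldPattern, $protocolRelative); // 0: the scheme is mandatory
$newWithScheme = preg_match($newPattern, $withScheme);       // 1
$newRelative   = preg_match($newPattern, $protocolRelative); // 1: the scheme is optional

var_dump($oldWithScheme, $oldRelative, $newWithScheme, $newRelative);
```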

1 Answer:

Answer 0 (score: 0)

You should escape the dots as well. I would use the following expression:

'/^https?:\/{2}(www\.)?(youtube|vimeo|player\.vimeo|soundcloud|w\.soundcloud)\.com/'
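The effect of escaping can be checked with a small comparison. In the sketch below, the test URLs are made up, and the third pattern (`$anchored`) is an assumed stricter variant that is not part of this answer: it also accepts protocol-relative URLs and requires the host to end right after `.com`.

```php
<?php
// Unescaped dots: "." matches any character.
$unescaped = '/^https?:\/{2}(www.)?(youtube|vimeo|player.vimeo|soundcloud|w.soundcloud).com/';
// Escaped dots: "\." matches only a literal dot.
$escaped   = '/^https?:\/{2}(www\.)?(youtube|vimeo|player\.vimeo|soundcloud|w\.soundcloud)\.com/';
// Assumed stricter variant (not from the answer): optional scheme, and
// the host must be followed by "/" or the end of the string.
$anchored  = '/^(https?:)?\/{2}(www\.)?(youtube|vimeo|player\.vimeo|soundcloud|w\.soundcloud)\.com(\/|$)/';

// Made-up hostile URLs for illustration.
$lookalike = 'http://youtubeXcom.evil.example/embed/1';  // "X" stands in for the dot
$suffixed  = 'http://youtube.com.evil.example/embed/1';  // allowed name used as a prefix
$genuine   = 'https://player.vimeo.com/video/2285902?badge=0';

$lookalikeUnescaped = preg_match($unescaped, $lookalike); // 1: accepted by mistake
$lookalikeEscaped   = preg_match($escaped, $lookalike);   // 0: rejected
$suffixedEscaped    = preg_match($escaped, $suffixed);    // 1: still accepted
$suffixedAnchored   = preg_match($anchored, $suffixed);   // 0: rejected
$genuineAnchored    = preg_match($anchored, $genuine);    // 1: accepted

var_dump($lookalikeUnescaped, $lookalikeEscaped, $suffixedEscaped, $suffixedAnchored, $genuineAnchored);
```

Escaping the dots closes the first gap, but neither version stops a host that merely starts with an allowed name, so some boundary after `.com` (or an exact host comparison) seems worth adding.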