preg_match: find and replace a string pattern

Date: 2013-08-13 12:33:06

Tags: php html-parsing preg-replace preg-match

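I have a WordPress database whose posts contain embedded SoundCloud iframes. I want to replace those iframes with a shortcode. I have already created the shortcode, and it works very well.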

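The problem is that my old database has roughly 2,000 posts with the embed code already in them. What I want to do is write code that replaces each iframe with the shortcode.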

Here is the code I am using to find the URL in the content, but it always returns blank.

$string = 'Think Kavinsky meets Futurecop! meets your favorite 80s TV show theme song and you might be pretty close to Swedish producer Johan Bengtsson\'s retro project, <a href="https://soundcloud.com/daataa"><strong>Mitch Murder</strong></a>. Title track, "The Touch," is genuinely lighthearted and fun, crossing over from 80s synth work into a bit of French Touch influence; also including a big time guitar solo straight out of your dad\'s record collection. B-side "Race Day" could very easily be the soundtrack to a video montage of all of your favorite beach scenes from every 80s movie you\'ve ever watched, or as the PR put it, "quite possibly a contender to be the title screen music to a Wave Race 64 sequel." Sounds awesome to me. Also included in this package out today on <a href="https://soundcloud.com/maddecent/">Mad Decent</a>\'s Jeffree\'s sub-label are two remixes of the A-side from Lifelike and Nite Sprite. Download below.
<iframe src="https://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Fplaylists%2F8087281&amp;color=000000&amp;auto_play=false&amp;show_artwork=true" frameborder="no" scrolling="no" width="100%" height="350"></iframe>';

preg_match("/url=(.*?)/", $string, $matches);

print_r($matches);

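The code above does not work. I am not very familiar with regular expressions, so it would be great if someone could figure out what is wrong here. It would be even better if someone could guide me through the right way to do this.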

5 answers:

Answer 0 (score: 4)

Since you are working with HTML here, I suggest using the DOM functions:

$doc = new DOMDocument;
$doc->loadHTML($string);

foreach ($doc->getElementsByTagName('iframe') as $iframe) {
    $url = $iframe->getAttribute('src');
    // parse the query string
    parse_str(parse_url($url, PHP_URL_QUERY), $args);
    // save the modified attribute
    $iframe->setAttribute('src', $args['url']);
}

echo $doc->saveHTML();

This outputs a complete document, so you need to trim it down to the contents of the body:

$body = $doc->getElementsByTagName('body')->item(0);
foreach ($body->childNodes as $node) {
    echo $doc->saveHTML($node);
}

Output:

<p>Think Kavinsky meets Futurecop! meets your favorite 80s TV show theme song and you might be pretty close to Swedish producer Johan Bengtsson's retro project, <a href="https://soundcloud.com/daataa"><strong>Mitch Murder</strong></a>. Title track, "The Touch," is genuinely lighthearted and fun, crossing over from 80s synth work into a bit of French Touch influence; also including a big time guitar solo straight out of your dad's record collection. B-side "Race Day" could very easily be the soundtrack to a video montage of all of your favorite beach scenes from every 80s movie you've ever watched, or as the PR put it, "quite possibly a contender to be the title screen music to a Wave Race 64 sequel." Sounds awesome to me. Also included in this package out today on <a href="https://soundcloud.com/maddecent/">Mad Decent</a>'s Jeffree's sub-label are two remixes of the A-side from Lifelike and Nite Sprite. Download below.
<iframe src="http://api.soundcloud.com/playlists/8087281" frameborder="no" scrolling="no" width="100%" height="350"></iframe></p>
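
The snippet above only rewrites the src attribute. Since the goal in the question is a shortcode, here is a rough sketch of how the same DOM approach might swap the whole iframe for a text node instead. The function name soundcloud_iframes_to_shortcodes and the wrapper div are my own assumptions, and the [soundcloud url="..."] format is taken from the other answers in this thread, so adjust it to whatever your shortcode expects.

```php
<?php
// Hypothetical helper (not part of the original answer): replaces SoundCloud
// iframes with a [soundcloud url="..."] shortcode using DOMDocument.
function soundcloud_iframes_to_shortcodes(string $html): string
{
    $doc = new DOMDocument();

    // Collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a div so its children can be serialized afterwards.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list first: replacing nodes while iterating over it
    // directly would skip elements.
    foreach (iterator_to_array($doc->getElementsByTagName('iframe')) as $iframe) {
        $query = parse_url($iframe->getAttribute('src'), PHP_URL_QUERY);
        parse_str((string) $query, $args); // parse_str also URL-decodes values

        if (empty($args['url']) || stripos($args['url'], 'soundcloud.com') === false) {
            continue; // leave unrelated iframes untouched
        }

        $shortcode = $doc->createTextNode('[soundcloud url="' . $args['url'] . '"]');
        $iframe->parentNode->replaceChild($shortcode, $iframe);
    }

    // Serialize only the children of the wrapper div, not the whole document.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $node) {
        $out .= $doc->saveHTML($node);
    }
    return $out;
}
```

Keep in mind that DOMDocument may normalize other markup in the post and, without extra handling, can mangle non-ASCII text, so it is worth comparing the before and after content on a few posts before saving anything.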

Answer 1 (score: 2)

This should work for what you specified:

$new_string = preg_replace('/(?:<iframe[^\>]+src="[^\"]*url=([^\"]*soundcloud\.com[^\"]*))"[^\/]*\/[^\>]*>/i', '[soundcloud url="$1"]', $string);

It matches only iframes whose src attribute contains a url=...soundcloud... part, and replaces the entire iframe markup with [soundcloud url="{part after url=}"].
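
Note that the captured part is still URL-encoded and includes the trailing player parameters. If you would rather end up with a clean, decoded URL, something along these lines might work. This is only a sketch: the function name is made up, and the pattern assumes the iframe is written as <iframe ...></iframe> with a double-quoted src, as in the sample post.

```php
<?php
// Hypothetical helper: replaces each SoundCloud iframe with a shortcode that
// carries only the decoded value of the url= parameter.
function soundcloud_iframe_to_shortcode(string $html): string
{
    // Capture everything after url= up to the next & or closing quote,
    // and require that it mentions soundcloud.com.
    $pattern = '/<iframe[^>]+src="[^"]*url=([^"&]*soundcloud\.com[^"&]*)[^"]*"[^>]*>\s*<\/iframe>/i';

    $result = preg_replace_callback(
        $pattern,
        function (array $m): string {
            // urldecode turns http%3A%2F%2F... into http://...
            return '[soundcloud url="' . urldecode($m[1]) . '"]';
        },
        $html
    );

    // preg_replace_callback returns null on a regex engine error;
    // fall back to the untouched input in that case.
    return $result ?? $html;
}
```

Calling soundcloud_iframe_to_shortcode($string) on the post from the question leaves the text alone and only swaps out the iframe.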

Answer 2 (score: 2)

For a one-off fix, you might consider a SQL solution. The SQL below makes a few assumptions:

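  • Each post has only one iframe that needs replacing (if some posts have several iframes, you can run the SQL multiple times).
  • All iframes to be replaced have the form: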

<iframe src="https://w.soundcloud.com/player/?url="..." other-stuff</iframe>

  • The part you care about is the content between the quotes of the url parameter
  • The final result is [soundcloud url="..."]

If all of that holds, the SQL below should do the trick. You can adjust it if you want a different shortcode, and so on.

Before running any bulk update, be sure to back up your wp_posts table.

CREATE TABLE wp_posts_backup SELECT * FROM wp_posts
;

Once the backup is done, the following SQL should fix all posts in one go:

UPDATE wp_posts p

   SET p.post_content = CONCAT( SUBSTRING_INDEX( p.post_content, '<iframe src="https://w.soundcloud.com/player/?url=', 1 )
                               ,'[soundcloud url="'
                               , REPLACE( REPLACE(
                                 SUBSTRING_INDEX( SUBSTR( p.post_content
                                                        , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                        )
                                                , '&amp;', 1
                                                )
                               , '%3A', ':' ), '%2F', '/' )
                               ,'?'
                               ,SUBSTRING_INDEX( SUBSTR( p.post_content
                                                       , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                       + LOCATE( '&amp;', SUBSTR( p.post_content
                                                                                , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                                                )
                                                               ) + 4
                                                       )
                                               , ' ', 1
                                               )
                               ,']'
                               ,SUBSTR( p.post_content, LOCATE( '</iframe>', p.post_content ) + 9 )
                              )

 WHERE p.post_content LIKE '%<iframe src="https://w.soundcloud.com/player/?url=%</iframe>%'
;
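
In case the + 50 and the nested string functions look like magic: 50 is simply the length of the fixed prefix up to and including url=. Here is a small PHP sketch that mirrors the same extraction steps on a shortened sample. The variable names and the sample content are made up for illustration.

```php
<?php
// Mirrors the SQL string arithmetic in PHP on a shortened, made-up sample.
$prefix = '<iframe src="https://w.soundcloud.com/player/?url=';
$offset = strlen($prefix); // 50, the constant added after LOCATE() in the SQL

$content = 'Download below. ' . $prefix
    . 'http%3A%2F%2Fapi.soundcloud.com%2Fplaylists%2F8087281'
    . '&amp;color=000000" width="100%"></iframe> More text.';

// LOCATE() is 1-based and strpos() is 0-based, but both land just past the prefix.
$start = strpos($content, $prefix) + $offset;
$rest  = substr($content, $start);

// SUBSTRING_INDEX(..., '&amp;', 1): everything before the first '&amp;'.
$encoded = substr($rest, 0, strpos($rest, '&amp;'));

// REPLACE(REPLACE(..., '%3A', ':'), '%2F', '/')
$url = str_replace(['%3A', '%2F'], [':', '/'], $encoded);

echo $url, "\n";
```

The two REPLACE calls only decode colons and slashes, which is enough for the URLs in the sample; urldecode() would be the general tool on the PHP side.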

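I suggest testing on a few posts before running this against all of them. An easy way to test is to add the following to the WHERE clause above (immediately before the ';'), changing each '?' to the ID of a post you want to test.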

AND p.ID IN (?,?,?)

If you need to restore the posts for any reason, you can run the following:

UPDATE wp_posts p
  JOIN wp_posts_backup b
    ON b.ID = p.ID
   SET p.post_content = b.post_content
;

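One more thing to consider. I was not sure whether you want to pass along the parameters that are currently part of the URL, so I included them. You can easily remove them by changing the following: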

                               ,'?'
                               ,SUBSTRING_INDEX( SUBSTR( p.post_content
                                                       , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                       + LOCATE( '&amp;', SUBSTR( p.post_content
                                                                                , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                                                )
                                                               ) + 4
                                                       )
                                               , ' ', 1
                                               )
                               ,']'

to:

                           ,'"]'

resulting in:

UPDATE wp_posts p

   SET p.post_content = CONCAT( SUBSTRING_INDEX( p.post_content, '<iframe src="https://w.soundcloud.com/player/?url=', 1 )
                               ,'[soundcloud url="'
                               , REPLACE( REPLACE(
                                 SUBSTRING_INDEX( SUBSTR( p.post_content
                                                        , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                        )
                                                , '&amp;', 1
                                                )
                               , '%3A', ':' ), '%2F', '/' )
                               ,'"]'
                               ,SUBSTR( p.post_content, LOCATE( '</iframe>', p.post_content ) + 9 )
                              )

 WHERE p.post_content LIKE '%<iframe src="https://w.soundcloud.com/player/?url=%</iframe>%'
;

Updated to allow for URLs that have no parameters:

UPDATE wp_posts p

   SET p.post_content = CONCAT( SUBSTRING_INDEX( p.post_content, '<iframe src="https://w.soundcloud.com/player/?url=', 1 )
                               ,'[soundcloud url="'
                               , REPLACE( REPLACE(
                                 SUBSTRING_INDEX(
                                     SUBSTRING_INDEX( SUBSTR( p.post_content
                                                            , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                            )
                                                    , '&amp;', 1
                                                    )
                                                , '"', 1
                                                )
                               , '%3A', ':' ), '%2F', '/' )
                               ,'"]'
                               ,SUBSTR( p.post_content, LOCATE( '</iframe>', p.post_content ) + 9 )
                              )

 WHERE p.post_content LIKE '%<iframe src="https://w.soundcloud.com/player/?url=%</iframe>%'
;
Good luck.

Answer 3 (score: 1)

<?php
    preg_match("/url\=([^\"]+)/i", $string, $matches);

So basically you want to match one or more characters after url=, stopping before the next double quote (").
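
To see why the pattern in the question comes back blank: a lazy (.*?) with nothing after it is satisfied by zero characters, so the group matches the empty string. A quick sketch comparing the two on a shortened src value (the sample string and variable names are made up for illustration):

```php
<?php
// Shortened sample of the src attribute from the question.
$src = 'src="https://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Fplaylists%2F8087281&amp;color=000000"';

// Lazy quantifier with nothing after it: matches as little as possible, i.e. nothing.
preg_match('/url=(.*?)/', $src, $lazy);

// Negated character class: one or more characters that are neither " nor &.
// Excluding & as well drops the trailing player parameters.
preg_match('/url=([^"&]+)/', $src, $strict);

// The captured value is still URL-encoded, so decode it.
$url = urldecode($strict[1]);

var_dump($lazy[1]); // empty string
var_dump($url);     // the decoded playlist URL
```

Stopping at & as well as " is my own tweak; with only [^"] as in the answer above, the capture also includes &amp;color=... and the rest of the query string.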

Answer 4 (score: 1)

I suggest looking into simplehtmldom. It is a DOM parser that uses selectors similar to those of jQuery and CSS.

http://simplehtmldom.sourceforge.net/

require_once 'simple_html_dom.php'; // adjust the path to wherever the library lives

$html = str_get_html($html_from_database); // parse the post content string
// Find all iframes
foreach ($html->find('iframe') as $element) {
    $source = $element->src; // extract the source from the iframe
    // This is where you do your magic, such as changing links.
    $element->src = $source; // This is where you replace the old source
}


// UPDATE $html back into the table.

Make sure you take a full backup of all tables before you update any of them after parsing :)

http://simplehtmldom.sourceforge.net/manual.htm