How do I use preg_replace to insert a text string between the 3rd and 4th paragraphs?

Asked: 2015-04-05 06:40:39

Tags: regex wordpress preg-replace

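I'm trying to figure out how to create a common newspaper device called a "pullquote" in a WordPress post. (This isn't strictly a WordPress question, though; it's more of a general regex question.) I have a tag wrapped around some text in the post. I want to copy the text between the tags (I know how to do that) and insert it between the 3rd and 4th instances of the p tag in the post.

The function below finds the text and strips the tags, but it just adds the matched text at the beginning. I need help targeting the 3rd/4th paragraph.

Or... maybe I'm thinking about this the wrong way. Maybe there is some way to target elements by position, the way jQuery's nth-child does?

Post: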

<p>If you wanna improve yer German, don't try to read Heine or some elevated crap... watch old episodes of [callout]Tatort or Bukow & Konig[/callout].</p>
<p>If I were teaching a music appreciation I wouldn't teach Beethoven. I'd teach Stamitz and average composers.</p>
<p>And here is a 3rd paragraph.</p>
<p>And here is a 4th paragraph.</p>

Desired result:

<p>If you wanna improve yer German, don't try to read Heine or some elevated crap... watch old episodes of Tatort or Bukow & Konig.</p>
<p>If I were teaching a music appreciation I wouldn't teach Beethoven. I'd teach Stamitz and average composers.</p>
<p>And here is a 3rd paragraph.</p>
<blockquote class="pullquote">Tatort or Bukow & Konig</blockquote>
<p>And here is a 4th paragraph.</p>

Here is the code I have so far:

function jchwebdev_pullquote( $content ) {
    $newcontent = $content;
    $replacement = '$1';
    $matches = array();
    $pattern = "~\[callout\](.*?)\[/callout\]~s";
    // strip out 'shortcode'
    $newcontent = preg_replace($pattern, $replacement, $content);
    if( preg_match($pattern, $content, $matches)) {
      // now have formatted pullquote 
      $pullquote = '<blockquote class="pullquote">' .$matches[1] . '</blockquote>';
      // now how do I target and insert $pullquote
      // between 3rd and 4th paragraph?
      preg_replace($3rd_4th_pattern, $3rd_4th_replacement,
      $newcontent);
      return $newcontent;
    }
    return $content;    
}
add_filter( 'the_content' , 'jchwebdev_pullquote');

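EDIT: I want to amend my question to make it more WordPress-specific. WordPress actually converts line breaks into <p> elements. Most WordPress posts don't even use explicit 'p' tags, because they aren't needed. The problem with the solutions so far is that they seem to strip the line breaks, so if the post (the source text) contains line breaks, the result looks odd.

A typical real-world WordPress post: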

If you wanna improve yer German, don't try to read Heine or some elevated crap... watch old episodes of [callout]Tatort or Bukow & Konig[/callout].

If I were teaching a music appreciation I wouldn't teach Beethoven. I'd teach Stamitz and average composers.

And here is a 3rd paragraph.


And here is a 5th paragraph.

WordPress renders it as:

<p>If you wanna improve yer German, don't try to read Heine or some elevated crap... watch old episodes of [callout]Tatort or Bukow & Konig[/callout].</p>
<p>If I were teaching a music appreciation I wouldn't teach Beethoven. I'd teach Stamitz and average composers.</p>
<p>And here is a 3rd paragraph.</p>
<p></p>
<p>And here is a 5th paragraph.</p>

So in a perfect world, I'd like preg_replace to turn the 'typical real-world post' into:

If you wanna improve yer German, don't try to read Heine or some elevated crap... watch old episodes of Tatort or Bukow & Konig.

If I were teaching a music appreciation I wouldn't teach Beethoven. I'd teach Stamitz and average composers.

And here is a 3rd paragraph.

<blockquote class="callout">Tatort or Bukow & Konig</blockquote>

And here is a 5th paragraph.

...which WordPress would then render as:

<p>If you wanna improve yer German, don't try to read Heine or some elevated crap... watch old episodes of Tatort or Bukow & Konig.</p>
<p>If I were teaching a music appreciation I wouldn't teach Beethoven. I'd teach Stamitz and average composers.</p>
<p>And here is a 3rd paragraph.</p>
<blockquote class="callout">Tatort or Bukow & Konig</blockquote>
<p>And here is a 5th paragraph.</p>

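Maybe this has drifted far enough that I should repost it on the WordPress forum, but I think what I need is a way to change the preg_replace so that it uses line breaks as the delimiter instead of <p>, and to work out how to keep those line breaks from being stripped out of the returned string.

Thanks for any help!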

3 Answers:

Answer 0: (score: 1)

You can do this in a single preg_replace call.

$re = "~^(?:(?!/p).)*<p>(?:(?!/p).)*\\[callout\\](.*?)\\[/callout\\].*?</p>(?:[^<>]*<p>.*?</p>){2}[^<]*\\K~s";
$str = "<p>If you wanna improve yer German, don't try to read Heine or some elevated crap... watch old episodes of [callout]Tatort or Bukow & Konig[/callout].</p>\n<p>If I were teaching a music appreciation I wouldn't teach Beethoven. I'd teach Stamitz and average composers.</p>\n<p>And here is a 3rd paragraph.</p>\n<p>And here is a 4th paragraph.</p>";
$subst = "<blockquote class=\"pullquote\">$1</blockquote>\n";
$result = preg_replace($re, $subst, $str);
echo $result;


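Answer 1: (score: 1)

You can achieve this simply by using (.*?</p>){3}\K together with the s modifier (the limit of 1 keeps a longer post from getting a second copy after its sixth paragraph):

preg_replace("@(.*?</p>){3}\K@s", $pullquote, $content, 1);

I made a few changes to your function so that it works: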

function jchwebdev_pullquote( $content )
{
    $pattern = "~\[callout\](.*?)\[/callout\]~s";
    if(preg_match($pattern, $content, $matches))
    {
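      // strip the [callout] markers, keeping the quoted text in place
      $content = preg_replace($pattern, '$1', $content);
      $pullquote = '<blockquote class="pullquote">' .$matches[1] . '</blockquote>';
      // \K drops the text matched so far, so only the position after the 3rd </p>
      // is replaced; the limit of 1 prevents a repeat after the 6th paragraph
      $content = preg_replace("@(.*?</p>){3}\K@s", $pullquote, $content, 1);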
      return $content;
    }
    return $content;    
}
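To see where the insertion lands, here is a small standalone sketch of the \K technique on a seven-paragraph string. The helper name insert_after_third_paragraph is hypothetical, and the sketch passes a limit of 1 as a precaution so that a post with six or more paragraphs receives only one copy.

```php
<?php
// Hypothetical helper (not part of the answer): inserts $insert right
// after the third closing </p> in $content.
function insert_after_third_paragraph(string $content, string $insert): string
{
    // (.*?</p>){3} consumes the first three paragraphs; \K then discards
    // that text from the match, so only the empty position after the
    // third </p> is replaced. The limit of 1 stops after one insertion.
    return preg_replace('@(.*?</p>){3}\K@s', $insert, $content, 1);
}

$post = "<p>one</p>\n<p>two</p>\n<p>three</p>\n<p>four</p>\n<p>five</p>\n<p>six</p>\n<p>seven</p>";
$result = insert_after_third_paragraph($post, '<blockquote class="pullquote">quote</blockquote>');
echo $result, "\n";
```

One caveat: the inserted string is treated as a replacement template, so a quote that contains sequences such as $1 or \1 would be read as backreferences.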


Update #1

Optimization: use a single preg_replace to avoid applying multiple patterns:

function jchwebdev_pullquote( $content )
{
    $pattern = "\[callout\](.*?)\[/callout\]";
    if(preg_match("@(?s)$pattern@", $content, $matches))
    {
      $content = preg_replace("@(?s)($pattern)((.*?</p>){3})@", '\2\3<blockquote class="pullquote">\2</blockquote>', $content);
      return $content;
    }
    return $content;
}
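One thing worth checking with the single-pattern version is that it counts the three closing </p> tags starting from the callout, not from the start of the post. The sketch below reproduces the same pattern under a hypothetical name (pullquote_single_pass) and runs it on two made-up posts to show the difference.

```php
<?php
// Same pattern and replacement as the update above, under a hypothetical
// name so this snippet runs on its own.
function pullquote_single_pass(string $content): string
{
    $pattern = '\[callout\](.*?)\[/callout\]';
    // Group 2 is the quoted text, group 3 the text up to the third </p>
    // that follows the callout.
    return preg_replace(
        "@(?s)($pattern)((.*?</p>){3})@",
        '\2\3<blockquote class="pullquote">\2</blockquote>',
        $content
    );
}

// Callout in the 1st paragraph: the pullquote lands after the 3rd paragraph.
$first = pullquote_single_pass('<p>a [callout]Q[/callout]</p><p>b</p><p>c</p><p>d</p><p>e</p>');
// Callout in the 2nd paragraph: the pullquote lands after the 4th paragraph.
$second = pullquote_single_pass('<p>a</p><p>b [callout]Q[/callout]</p><p>c</p><p>d</p><p>e</p>');

echo $first, "\n", $second, "\n";
```

So this variant gives the desired placement when the callout sits in the first paragraph, as in the example post; for callouts further down, the two-step version may be the safer choice.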


Answer 2: (score: 1)

If you want to use PHP HTML/XML parsing, see How do you parse and process HTML/XML in PHP?

If you prefer a regex solution, here is one:

FIND: (?s)((?:<p>.*?<\/p>\s*){3})

This regex captures the first 3 <p> elements, so that a new node can be added right after them.

REPLACE: $1<blockquote class="pullquote">Tatort or Bukow & Konig</blockquote>\n

Code:
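As a rough idea of what the parser route could look like, here is a minimal sketch using PHP's DOM extension, which is the closest thing to an nth-child style of targeting. The function name insert_pullquote_dom and its parameters are hypothetical, the sketch assumes the DOM extension is installed and that the paragraphs are top-level elements of the fragment, and for non-ASCII posts an encoding hint would be needed because loadHTML does not assume UTF-8.

```php
<?php
// Hypothetical helper: inserts a pullquote after the Nth top-level <p>.
function insert_pullquote_dom(string $html, string $quote, int $after = 3): string
{
    $previous = libxml_use_internal_errors(true); // fragments can trigger parser warnings
    $doc = new DOMDocument();
    // Wrap the fragment in a <div> so the parser sees a single root element.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $wrapper = $doc->documentElement;
    $paragraphs = [];
    foreach ($wrapper->childNodes as $node) {
        if ($node instanceof DOMElement && $node->tagName === 'p') {
            $paragraphs[] = $node;
        }
    }
    if (count($paragraphs) < $after) {
        return $html; // not enough paragraphs: leave the post untouched
    }

    $block = $doc->createElement('blockquote');
    $block->setAttribute('class', 'pullquote');
    $block->appendChild($doc->createTextNode($quote)); // text is escaped on output

    // Inserting before the target's next sibling places the node right after
    // the target; a null sibling (target is last) appends instead.
    $wrapper->insertBefore($block, $paragraphs[$after - 1]->nextSibling);

    // Serialize the wrapper's children so the helper <div> is not returned.
    $out = '';
    foreach ($wrapper->childNodes as $node) {
        $out .= $doc->saveHTML($node);
    }
    return $out;
}

$post = "<p>one</p>\n<p>two</p>\n<p>three</p>\n<p>four</p>";
echo insert_pullquote_dom($post, 'quote'), "\n";
```

The serializer may add or drop line breaks between block elements, so the output should be compared by structure and not byte for byte.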

$re = "/(?s)((?:<p>.*?<\\/p>\\s*){3})/"; 
$str = "<p>If you wanna improve yer German, don't try to read Heine or some elevated crap... watch old episodes of [callout]Tatort or Bukow & Konig[/callout].</p>\n<p>If I were teaching a music appreciation I wouldn't teach Beethoven. I'd teach Stamitz and average composers.</p>\n<p>And here is a 3rd paragraph.</p>\n<p>And here is a 4th paragraph.</p>"; 
$subst = "$1<blockquote class=\"pullquote\">Tatort or Bukow & Konig</blockquote>\n"; 
$result = preg_replace($re, $subst, $str, 1);

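The answers above all work on <p> tags, so for the plain-text variant described in the question's edit, here is a minimal sketch that treats blank lines as the paragraph delimiter and leaves every line break in place. The function name insert_pullquote_plain is hypothetical, and the sketch assumes that paragraphs are separated by at least one blank line and that the text does not start with blank lines.

```php
<?php
// Hypothetical helper: works on raw post text before any <p> tags exist.
function insert_pullquote_plain(string $text, int $after = 3): string
{
    $pattern = '~\[callout\](.*?)\[/callout\]~s';
    if (!preg_match($pattern, $text, $m)) {
        return $text; // no callout: nothing to do
    }
    // Strip the [callout] markers but keep the quoted text in place.
    $text = preg_replace($pattern, '$1', $text);

    // Split on runs of two or more line breaks, capturing the separators
    // so they can be put back unchanged.
    $parts = preg_split('~(\R{2,})~', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    // Even indexes hold paragraphs, odd indexes hold the separators.
    $index = 2 * ($after - 1);
    if (!isset($parts[$index])) {
        return $text; // fewer than $after paragraphs
    }
    $parts[$index] .= "\n\n" . '<blockquote class="pullquote">' . $m[1] . '</blockquote>';

    return implode('', $parts);
}

$text = "One [callout]quoted bit[/callout].\n\nTwo.\n\nThree.\n\nFour.";
$result = insert_pullquote_plain($text);
echo $result, "\n";
```

In WordPress this would presumably have to run before the paragraph conversion is applied to the_content, for example by registering the filter with a priority lower than the default of 10; that is an assumption about the site's filter setup and has not been verified here.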