Question

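I'm trying to use a regular expression to wrap the first word of a page's content in a span, but the content contains HTML, so I need to make sure that only an actual word is selected and not part of a tag. The content is different on every page.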

The current script is:

preg_match('/(<(.*?)>)*/i',$page_content,$matches);
$stripped = substr($page_content,strlen($matches[0]));
preg_match('/\b[a-z]* \b/i',$stripped,$strippedmatch);
echo substr($page_content, 0, strlen($matches[0])).'<span class="h1">'.$strippedmatch[0].'</span>'.substr($stripped, strlen($strippedmatch[0]));

But if $page_content is <p><span class="title">This is </span> my title!</p> then my regex treats "span" as the first word and wraps the tags around it.

Is there any way to fix this (or a better approach)?

Answer 1

This seems to work...

(?<=\>)\b\w*\b|^\w*\b

If you want to allow whitespace in front of the word (remember to trim the resulting string):

(?<=>)\s*\b\w*\b|^\s*\w*\b
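A minimal sketch of how these two patterns might be applied in PHP. The helper name wrapFirstWord is hypothetical, and the limit of 1 passed to preg_replace is an assumption so that only the first match gets wrapped:

```php
<?php
// Hypothetical helper: wraps the first word that either starts the string
// or directly follows a tag's closing '>' (the first pattern above).
function wrapFirstWord(string $html): string
{
    // Limit 1: only the first match is replaced. $0 is the whole match.
    return preg_replace(
        '/(?<=>)\b\w*\b|^\w*\b/',
        '<span class="h1">$0</span>',
        $html,
        1
    );
}

echo wrapFirstWord('<p><span class="title">This is </span> my title!</p>'), "\n";
// <p><span class="title"><span class="h1">This</span> is </span> my title!</p>

// The whitespace-tolerant pattern can return leading spaces, hence trim().
preg_match('/(?<=>)\s*\b\w*\b|^\s*\w*\b/', '<p>  Hello world</p>', $m);
echo trim($m[0]), "\n"; // Hello
```

A '>' that appears inside an attribute value would also satisfy the lookbehind, so this is only as reliable as the markup it is given.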

Answer 2

You shouldn't use a regular expression for this, but if you insist, you could try something like this:

<?php

$texts = array(
  '<p><span class="title">This is </span> my title!</p>',
  '<1>   <2>   <3>   blah   blah   <4> <5> blah',
  'garbage <1> <2> real stuff begins <3> <4>',
);

foreach ($texts as $text) {
  print preg_replace('/(>\s*)(\w+)/', '\1{{\2}}', $text, 1)."\n";
}

?>

This prints:

<p><span class="title">{{This}} is </span> my title!</p>
<1>   <2>   <3>   {{blah}}   blah   <4> <5> blah
garbage <1> <2> {{real}} stuff begins <3> <4>
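The answer does not say which non-regex route it has in mind. One common option is PHP's DOM extension, sketched below under a few assumptions: the dom extension is available, the text up to and including the first word is plain ASCII (preg_match reports byte offsets while splitText counts characters), and the helper name wrapFirstWordDom is hypothetical. The parser may also normalize sloppy markup on the way through.

```php
<?php
// Hypothetical helper: looks for the first word in text nodes only,
// so tag names and attribute values are never candidates.
function wrapFirstWordDom(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parse warnings quietly
    // Wrap the fragment in a <div> so it can be located again after parsing.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $root  = $xpath->query('//body/div')->item(0);

    // Visit text nodes in document order and stop at the first word.
    foreach ($xpath->query('.//text()', $root) as $text) {
        if (!preg_match('/\w+/', $text->nodeValue, $m, PREG_OFFSET_CAPTURE)) {
            continue;
        }
        [$word, $offset] = $m[0];
        // Assumes ASCII, so byte offsets equal character offsets.
        $wordNode = $text->splitText($offset);   // node now starts at the word
        $wordNode->splitText(strlen($word));     // cut off whatever follows it
        $span = $doc->createElement('span');
        $span->setAttribute('class', 'h1');
        $wordNode->parentNode->replaceChild($span, $wordNode);
        $span->appendChild($wordNode);
        break;
    }

    // Serialize the wrapper's children back into an HTML fragment.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo wrapFirstWordDom('<p><span class="title">This is </span> my title!</p>'), "\n";
```

Because only text nodes are searched, a word such as "span" inside a tag can never be picked up, which is the failure described in the question.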

Answer 3

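If I understand you correctly, you want to wrap a tag around the first word that is not part of a tag. With a regular expression, you could use this:

$code = preg_replace('/^((?:<.+?>\s*)+?)(\w+)/i', '\1<span class="h1">\2</span>', $code);

This simply steps over the leading tags, one at a time, until it reaches text outside a tag. The outer group captures all of those tags so that the replacement can put them back; a repeated capturing group on its own would keep only the last tag it matched, and the earlier ones would be dropped from the output.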
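A small sketch of that tag-skipping idea on the question's sample input. The helper name wrapAfterLeadingTags is hypothetical. Since a repeated capturing group retains only its last iteration, the sketch captures the whole run of leading tags in one outer group and repeats a non-capturing group inside it:

```php
<?php
// Hypothetical helper built around the tag-skipping pattern.
function wrapAfterLeadingTags(string $code): string
{
    // Group 1: every leading tag (plus trailing whitespace), matched lazily.
    // Group 2: the first word that follows them.
    return preg_replace(
        '/^((?:<.+?>\s*)+?)(\w+)/',
        '\1<span class="h1">\2</span>',
        $code
    );
}

echo wrapAfterLeadingTags('<p><span class="title">This is </span> my title!</p>'), "\n";
// <p><span class="title"><span class="h1">This</span> is </span> my title!</p>

// No leading tag: '+?' needs at least one, so the input comes back unchanged.
echo wrapAfterLeadingTags('Plain text first'), "\n";
// Plain text first
```

Changing '+?' to '*?' would presumably also cover content that starts with plain text, at the cost of departing from the pattern as posted.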
