Question

I have some text containing <img> tags that I need to split into groups. It has the form

<img.../> Text text text <img.../>text text text<img.../> text text text

I have a regular expression working with preg_match_all, so that I get

Array
(
    [0] => Array
        (
            [0] => <img ... />
            [1] => <img ... />
            [2] => <img ... />
            [3] => <img ... />
        )

But it would be really nice if I could get

Array
(
    [0] => Array
        (
            [0] => <img ... />
            [1] => text text text 
            [2] => <img ... />
            [3] => text text text 
            [4] => <img ... />
            [5] => text text text 
        )

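I have tried a few things, but I don't have a good understanding of PCRE. I would rather not use preg_split if I can avoid it, because every image tag is different.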

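(I know that a general HTML parser can't be written with regular expressions, but in this case I think it will work, because the input data I'm working with has the form I described. There are no nested image tags that I need to worry about.)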

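PS: I have tried /!<img.+>/, /!(<img.+>)/ and /(!(<img.+>))/ to get the non-matching parts, but each returns an empty array. I don't know a good way to debug regular expressions and find out what I'm doing wrong.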

Answer 1

I'm not sure what your problem with preg_split (or your actual code) is, but:

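// The capturing group is what makes PREG_SPLIT_DELIM_CAPTURE return the <img> tags;
// PREG_SPLIT_NO_EMPTY drops the empty string that precedes the first tag.
$r = preg_split('#(<img[^>]+>)#', $source, 0, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);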

Result:

Array
(
    [0] => <img.../>
    [1] =>  Text text text 
    [2] => <img.../>
    [3] => text text text
    [4] => <img.../>
    [5] =>  text text text
)

Instead of a proper regular expression, you could also keep using fixed strings, such as #<img1>|<img2>|<img3># (I presume).
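For reference, here is a self-contained sketch of the preg_split approach. The sample input and its src values are made up for illustration. The sketch assumes the pattern is wrapped in a capturing group, which PREG_SPLIT_DELIM_CAPTURE needs in order to return the tags, and it adds PREG_SPLIT_NO_EMPTY so that the empty string before the first tag is dropped.

```php
<?php
// Hypothetical sample input; the src values are invented for this example.
$source = '<img src="a.png"/> Text text text <img src="b.png"/>text text text<img src="c.png"/> text text text';

// Split on every <img ...> tag while keeping the tags themselves in the result.
$parts = preg_split(
    '#(<img[^>]+>)#',
    $source,
    -1, // no limit on the number of pieces
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

print_r($parts); // six entries: tag, text, tag, text, tag, text
```

Because the pattern only requires "<img" followed by anything up to the next ">", it does not matter that every image tag is different.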

Answer 2

You can get the information you need by doing the following:

preg_match_all('~(<img[^>]*>)([^<]+)~', $str, $matches);

// If your "text text text" areas contain other HTML tags, use this instead:
preg_match_all('~(<img[^>]*>)(.+?)(?=<img|$)~', $str, $matches);

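At this point, $matches[0] contains the full matched strings, $matches[1] contains all matches for the first set of parentheses, and $matches[2] contains all matches for the second set of parentheses.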

Array ( 
  [0] => Array ( 
    [0] => <img.../> Text text text 
    [1] => <img.../>text text text 
    [2] => <img.../> text text text 
  )
  [1] => Array ( 
    [0] => <img.../> 
    [1] => <img.../> 
    [2] => <img.../> 
  ) 
  [2] => Array ( 
    [0] =>  Text text text 
    [1] => text text text 
    [2] =>  text text text 
  ) 
)
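To see how the two patterns differ, the following sketch runs both on a hypothetical input whose second text run contains a <b> tag. The input string and variable names are invented for illustration, and the s modifier on the second pattern is my addition so that "." also matches newlines.

```php
<?php
// Hypothetical input: the second text run contains another HTML tag.
$str = '<img src="a.png"/> plain text <img src="b.png"/>some <b>bold</b> text';

// First pattern: [^<]+ stops at the next "<", so the text is cut off at <b>.
preg_match_all('~(<img[^>]*>)([^<]+)~', $str, $simple);

// Second pattern: the lazy .+? keeps going until the next <img or the end of the string.
preg_match_all('~(<img[^>]*>)(.+?)(?=<img|$)~s', $str, $lazy);

print_r($simple[2]); // ' plain text ' and 'some '
print_r($lazy[2]);   // ' plain text ' and 'some <b>bold</b> text'
```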

Now, if you really need the result formatted the way you asked for, just add the following lines of code:

$answer = array();
foreach ($matches[0] as $i => $match) {
  $answer[] = $matches[1][$i]; // the <img> tag
  $answer[] = $matches[2][$i]; // the text that follows it
}

$answer now looks like this:

Array ( 
  [0] => <img ... />
  [1] =>  Text text text 
  [2] => <img ... />
  [3] => text text text 
  [4] => <img ... />
  [5] =>  text text text 
)
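As a variation, the same flattening can be sketched with the PREG_SET_ORDER flag, which makes preg_match_all return one array per match instead of one array per group. The function name interleaveImagesAndText and the sample input are hypothetical and not part of the answer above.

```php
<?php
/**
 * Hypothetical helper: returns the <img> tags and the text runs that
 * follow them as one flat, alternating list.
 */
function interleaveImagesAndText(string $str): array
{
    // With PREG_SET_ORDER each element of $matches is [full match, tag, text].
    preg_match_all('~(<img[^>]*>)([^<]+)~', $str, $matches, PREG_SET_ORDER);

    $answer = [];
    foreach ($matches as $match) {
        $answer[] = $match[1]; // the <img> tag
        $answer[] = $match[2]; // the text after it
    }
    return $answer;
}

// Made-up sample input in the shape described in the question.
$str = '<img src="a.png"/> Text text text <img src="b.png"/>text text text<img src="c.png"/> text text text';
print_r(interleaveImagesAndText($str));
```

Note that this pattern requires at least one character of text after each tag, so an image that is immediately followed by another image is skipped.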

Getting the non-matching strings from preg_match_all
