Question

Possible duplicate:
Robust, Mature HTML Parser for PHP

I am trying to grab the first sentence of a string and the first HTML img tag in it.

$description = preg_split('/<img/', $item->description,null,PREG_SPLIT_DELIM_CAPTURE);

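I am able to get an array back, but the <img delimiter is stripped from the values that need it. I have tried the flags but cannot get the result I am looking for, which needs to include the delimiter itself. I know that to grab the first sentence I should split on a period or on &nbsp;.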

The string:

<p>First sentence here comes.&nbsp; Second sentence here it is.&nbsp; One more sentence.&nbsp;&nbsp;</p> <img alt="amj" src="https://domain.com/images7.jpg" /> <img alt="Ea" src="http://domain.com/images3.jpg" /> <img alt="amj" src="https://domain.com/images7.jpg" /> <img alt="amj" src="https://domain.com/images7.jpg" />

Answer 1

Getting the first sentence is quite simple: use a combination of strpos and substr, as shown below. To get the first image tag, you can use a preg_match expression.

$first_sentence = substr($item->description, 0, strpos($item->description, '.'));
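A minimal sketch of the strpos/substr approach, assuming the goal is the plain text of the first sentence. The helper name first_sentence is hypothetical, and stripping tags and decoding entities first is an added assumption so that the leading <p> and the &nbsp; entities do not end up in the result.

```php
<?php
// Sample input modelled on the string from the question (shortened to one image).
$html = '<p>First sentence here comes.&nbsp; Second sentence here it is.&nbsp;</p> '
      . '<img alt="amj" src="https://domain.com/images7.jpg" />';

/**
 * Hypothetical helper: return the text up to and including the first period,
 * or the whole text when it contains no period.
 */
function first_sentence(string $html): string
{
    // Remove tags and decode entities such as &nbsp; so only plain text remains.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // strpos() returns false when the needle is missing, so compare strictly.
    $pos = strpos($text, '.');
    if ($pos === false) {
        return trim($text);
    }

    // +1 keeps the period itself in the result.
    return trim(substr($text, 0, $pos + 1));
}

echo first_sentence($html), PHP_EOL; // prints "First sentence here comes."
```

Splitting on the first period is only a heuristic: abbreviations or decimal numbers would end the "sentence" early.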

Answer 2

1) First sentence

echo substr($item->description, 0, strpos($item->description, '.'));

2) img

preg_match('#<img[^>]*>#',$item->description , $img);
echo $img[0];
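When the description contains no image, $img stays empty and $img[0] is undefined. A small sketch that guards against that case follows; the helper name first_img_tag is hypothetical, and the \b anchor and the i modifier are additions of this sketch, not part of the pattern above.

```php
<?php
// Sample input modelled on the string from the question (two images).
$html = '<p>First sentence here comes.</p> '
      . '<img alt="amj" src="https://domain.com/images7.jpg" /> '
      . '<img alt="Ea" src="http://domain.com/images3.jpg" />';

/**
 * Hypothetical helper: return the first <img ...> tag, or null if there is none.
 */
function first_img_tag(string $html): ?string
{
    // preg_match() returns 1 on a match, 0 on no match and false on error.
    if (preg_match('#<img\b[^>]*>#i', $html, $m) === 1) {
        return $m[0];
    }
    return null;
}

var_dump(first_img_tag($html));              // the first <img ... /> tag
var_dump(first_img_tag('<p>no image</p>'));  // NULL
```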

Answer 3

If you use PREG_SPLIT_DELIM_CAPTURE, the regular expression pattern you pass to preg_split needs to contain a capturing group.

With your current pattern:

/<img/

there is nothing to capture, which is why you see the delimiter removed (Demo):

Array
(
    [0] => <p>First sentence here comes.&nbsp; Second sentence here it is.&nbsp; One more sentence.&nbsp;&nbsp;</p> 
    [1] =>  alt="amj" src="https://domain.com/images7.jpg" /> 
    [2] =>  alt="Ea" src="http://domain.com/images3.jpg" /> 
    [3] =>  alt="amj" src="https://domain.com/images7.jpg" /> 
    [4] =>  alt="amj" src="https://domain.com/images7.jpg" />
)

However, if you create a capturing group, the delimiter is captured:

/(<img)/

Result (Demo):

Array
(
    [0] => <p>First sentence here comes.&nbsp; Second sentence here it is.&nbsp; One more sentence.&nbsp;&nbsp;</p> 
    [1] => <img
    [2] =>  alt="amj" src="https://domain.com/images7.jpg" /> 
    [3] => <img
    [4] =>  alt="Ea" src="http://domain.com/images3.jpg" /> 
    [5] => <img
    [6] =>  alt="amj" src="https://domain.com/images7.jpg" /> 
    [7] => <img
    [8] =>  alt="amj" src="https://domain.com/images7.jpg" />
)

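As you can see, preg_split does its documented job and adds the text matched by the capturing subgroup as an extra element at each split. You then probably need to extend the group across the full tag, as has been outlined in various other questions about regexes on HTML-like strings. Those patterns are as limited as regexes are, so if you run into problems, blame the use of preg_* functions instead of an HTML parser, not the pattern itself: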

/(<img [^>]*>)/

Result (Demo):

Array
(
    [0] => <p>First sentence here comes.&nbsp; Second sentence here it is.&nbsp; One more sentence.&nbsp;&nbsp;</p> 
    [1] => <img alt="amj" src="https://domain.com/images7.jpg" />
    [2] =>  
    [3] => <img alt="Ea" src="http://domain.com/images3.jpg" />
    [4] =>  
    [5] => <img alt="amj" src="https://domain.com/images7.jpg" />
    [6] =>  
    [7] => <img alt="amj" src="https://domain.com/images7.jpg" />
    [8] => 
)
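The whitespace-only and empty elements in the result above can be avoided. The following is a sketch under two assumptions that go beyond the pattern shown: the surrounding whitespace is made part of the delimiter with \s*, and PREG_SPLIT_NO_EMPTY is combined with PREG_SPLIT_DELIM_CAPTURE to drop empty pieces.

```php
<?php
// Sample input modelled on the string from the question (two images).
$html = '<p>First sentence here comes.&nbsp; Second sentence here it is.</p> '
      . '<img alt="amj" src="https://domain.com/images7.jpg" /> '
      . '<img alt="Ea" src="http://domain.com/images3.jpg" />';

// Only the tag is inside the capturing group, so the whitespace around it
// is consumed by the delimiter and never shows up in the result.
$parts = preg_split(
    '/\s*(<img [^>]*>)\s*/',
    $html,
    -1, // no limit
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

// $parts[0] is the paragraph, $parts[1] the first image tag.
print_r($parts);
```

With this input the array has three elements: the paragraph followed by the two image tags.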

Using a standard HTML parser will make your code more stable.
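As an illustration of the parser route, here is a sketch using PHP's bundled DOM extension. It assumes the extension is enabled and that the first sentence lives in the first <p> element; the variable names are made up for this example.

```php
<?php
// Sample input modelled on the string from the question (two images).
$html = '<p>First sentence here comes.&nbsp; Second sentence here it is.</p> '
      . '<img alt="amj" src="https://domain.com/images7.jpg" /> '
      . '<img alt="Ea" src="http://domain.com/images3.jpg" />';

// Collect parser warnings internally instead of emitting them,
// since the input is a fragment rather than a full document.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// item(0) returns null when no such element exists.
$img = $doc->getElementsByTagName('img')->item(0);
$p   = $doc->getElementsByTagName('p')->item(0);

// First image: its src attribute and its serialized markup.
$firstImgSrc  = $img !== null ? $img->getAttribute('src') : null;
$firstImgHtml = $img !== null ? $doc->saveHTML($img) : null;

// First sentence: text of the first paragraph up to and including the first period.
$text = $p !== null ? $p->textContent : '';
$pos  = strpos($text, '.');
$firstSentence = $pos === false ? $text : substr($text, 0, $pos + 1);

echo $firstSentence, PHP_EOL;
echo $firstImgSrc, PHP_EOL;
```

Unlike the regex patterns, this does not depend on how the attributes are ordered or quoted inside the tag.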

Explode a string to get the first sentence and the first image
