Question

I have a string (not HTML tags)

For example:

<div class="lh">
    <b>Review: <b>Sarah Geronimo</b> leaves UAE fans asking for more</b></a><br>
    <font size="-1"><b><font color="#6f6f6f">gulfnews.com</font></b></font><br>
    <font size="-1"><b>Geronimo</b> how to get this contents <b>...</b></font><br>
    <font size="-1" class="p"></font><br><font class="p" size="-1"><nobr><b>and more&nbsp;»</b></nobr></a></font>
</div>

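I want to get the string "how to get this contents". How can I do this with a regular expression?

Update

Why do I want to use a regular expression in this case? Because the result contains HTML tags.

I am parsing this feed: https://news.google.com/news/feeds?q=sarah%20geronimo%20&output=rss

Here is all of my code: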

$html = new simple_html_dom();
$html = file_get_html("https://news.google.com/news/feeds?q=sarah%20geronimo%20&output=rss");

foreach($html->find('item') as $item) {
    $items['desc'] = $item->find('description',0)->plaintext;   
    $data[] = $items;
}

$regex = '~(?s)<div[^>]*>(?:.*?<font size){2}[^>]*><b>.*?</b>\K[^<]+~';

foreach($data as $content) {
        $desc = $content['desc'];
        preg_match($regex,$desc ,$m);
        echo $m[0];
}

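I want to get the content of the <description> tag. That content contains HTML tags, and I am using the regular expression to strip them out.

But it returns blank. Why?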

Answer 1

Use this regular expression:

(?s)<div[^>]*>(?:.*?<font size){2}[^>]*><b>.*?</b>\K[^<]+
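
The pattern is easier to follow when read piece by piece. The sketch below is the same expression written with the x (extended) modifier, so that whitespace and # comments inside the pattern are ignored; the inline (?s) is moved to the s modifier at the end. The variable names and the shortened sample string are only for illustration.

```php
<?php
// Same pattern, spread over several lines with the x modifier.
// The space in "<font size" has to be escaped because x ignores bare spaces.
$annotated = '~
    <div[^>]*>                # the opening <div ...> tag
    (?: .*? <font\ size ){2}  # lazily skip ahead to the second "<font size"
    [^>]* >                   # the rest of that <font ...> tag
    <b> .*? </b>              # the first bold element inside it
    \K                        # discard everything matched so far
    [^<]+                     # keep the text up to the next tag
~sx';

// Shortened sample with the same shape as the string in the question.
$sample = '<div><font size="-1"><b>site</b></font><br>'
        . '<font size="-1"><b>Geronimo</b> how to get this contents <b>...</b></font></div>';

$result = null;
if (preg_match($annotated, $sample, $m)) {
    // The match keeps the spaces around the text, so trim them.
    $result = trim($m[0]);
}
echo $result, "\n";
```

Because of \K, $m[0] holds only the text after the first bold element of the second font tag, not the tags before it.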

How to use it:

$str = '<div class="lh">
    <b>Review: <b>Sarah Geronimo</b> leaves UAE fans asking for more</b></a><br>
    <font size="-1"><b><font color="#6f6f6f">gulfnews.com</font></b></font><br>
    <font size="-1"><b>Geronimo</b> how to get this contents <b>...</b></font><br>
    <font size="-1" class="p"></font><br><font class="p" size="-1"><nobr><b>and more&nbsp;»</b></nobr></a></font>
</div>';

$regex = '~(?s)<div[^>]*>(?:.*?<font size){2}[^>]*><b>.*?</b>\K[^<]+~';

if(preg_match($regex,$str,$m)) {
    echo $m[0]."<br />";
}

Output:

how to get this contents
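
When the pattern finds nothing, preg_match returns 0 and $m stays empty, so echoing $m[0] prints nothing. A blank result therefore usually means the input did not contain the literal tags the pattern expects. The sketch below is one defensive way to call it. The helper name extractSnippet is hypothetical, and the html_entity_decode step rests on an assumption that the question does not confirm: that the feed delivers its description as entity-escaped markup (&lt;div ...). It may also be worth checking whether ->plaintext has already removed the tags before the regex runs.

```php
<?php
// Hypothetical helper for illustration; not part of the original answer.
function extractSnippet(string $desc): ?string
{
    $regex = '~(?s)<div[^>]*>(?:.*?<font size){2}[^>]*><b>.*?</b>\K[^<]+~';

    // Assumption: the description may arrive entity-escaped ("&lt;div ..."),
    // so turn the entities back into real tags before matching.
    $decoded = html_entity_decode($desc, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // preg_match returns 1 on a match, 0 on no match, false on error.
    if (preg_match($regex, $decoded, $m) !== 1) {
        return null;
    }

    return trim($m[0]);
}

// Entity-escaped sample with the same shape as the string in the question.
$escaped = '&lt;div class="lh"&gt;&lt;font size="-1"&gt;&lt;b&gt;site&lt;/b&gt;&lt;/font&gt;'
         . '&lt;font size="-1"&gt;&lt;b&gt;Geronimo&lt;/b&gt; how to get this contents '
         . '&lt;b&gt;...&lt;/b&gt;&lt;/font&gt;&lt;/div&gt;';

var_dump(extractSnippet($escaped));           // the extracted text
var_dump(extractSnippet('plain text only'));  // NULL, because no tags are present
```

Returning null instead of reading $m[0] directly makes the "no match" case visible to the caller.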

Let me know if you have any questions. :)
