Extracting text from a string in PHP

Posted: 2013-10-24 16:00:49

Tags: php regex preg-match

I want to extract any text or string between <p><b> and <div id="t" class="t">. Here is my sample, which doesn't work:

$st = '<p><b>Auburn</b> is a city in <a href="/my/id/ala" title="auburn">Lee County</a>, <a href="/my/Alabama" title="Alabama">Alabama</a>, <a href="/my/ph" title="PH">United States</a>. It is the largest city in eastern Alabama with a 2012 population of 56,908.<sup id="test" class="test"><a href="#tst"><span>[</span>2<span>]</span></a></sup> It is a principal city of the <a href="/my/tst" title="Auburn-Opelika Metropolitan Area" class="cs">Auburn-Opelika Metropolitan Area</a>. The <a href="/my/st" title="Auburn-Opelika, AL MSA" class="vf">Auburn-Opelika, AL MSA</a> with a population of 140,247, along with the <a href="/myu/sc" title="Columbus, GA-AL MSA" class="Xd">Columbus, GA-AL MSA</a> and <a href="/my/fd" title="Tuskegee, Alabama">Tuskegee, Alabama</a>, comprises the greater <a href="/my/cdA" title="Columbus-Auburn-Opelika, GA-AL CSA" class="se">Columbus-Auburn-Opelika, GA-AL CSA</a>, a region home to 456,564 residents.</p>
<p>Auburn is a <a href="/my/te" title="College town">college town</a> and is the home of <a href="/my/As" title="Auburn University">Auburn University</a>. Auburn has been marked in recent years by rapid growth, and is currently the fastest growing metropolitan area in Alabama and the nineteenth-fastest growing metro area in the United States since 1990.<sup class="fd" style="white-space:nowrap;">[<i><a href="/my/d" title="fda"><span title="fad (August 2011)">citation needed</span></a></i>]</sup> U.S. News ranked Auburn among its top ten list of best places to live in United States for the year 2009.<sup id="d3" class="f"><a href="3"><span>[</span>3<span>]</span></a></sup> The city`s unofficial nickname is “The Loveliest Village On The Plains,” taken from a line in the poem <i><a href="/my/da" title="The Deserted Village">The Deserted Village</a></i> by <a href="/my/fs" title="Oliver Goldsmith">Oliver Goldsmith</a>: “Sweet Auburn! loveliest village of the plain...”<sup id="ds" class="dsa"><a href="dd"><span>[</span>4<span>]</span></a></sup></p>
<div id="t" class="t">';

preg_match_all('/<p><b>(.*?)<div id="t" class="t">/U', $st, $output);
$result = $output[0];
print_r($output);
echo $result;

2 answers:

Answer 0 (score: 1)

There's no need for a regular expression here, since we are matching literal strings. Just use strpos with an offset:

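<?php
    // Return the text between $searchStart and $searchEnd, searching from $offset.
    function str_between($string, $searchStart, $searchEnd, $offset = 0) {
        $startPosition = strpos($string, $searchStart, $offset);
        if ($startPosition !== false) {
            // The extracted text begins right after the start marker
            $contentStart = $startPosition + strlen($searchStart);
            // Only look for the end marker after the start marker
            $endPosition = strpos($string, $searchEnd, $contentStart);
            if ($endPosition !== false) {
                // Length is the distance between the two markers
                return substr($string, $contentStart, $endPosition - $contentStart);
            }
            // No end marker: return everything after the start marker
            return substr($string, $contentStart);
        }
        // No start marker: return the input unchanged
        return $string;
    }

    var_dump(str_between($st, '<p><b>', '<div id="t" class="t">'));
?>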
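
The point of the offset is that you can resume scanning after each match, which gives you the strpos equivalent of preg_match_all. Below is a rough sketch of that idea; str_between_all and the sample string are hypothetical, not part of the original answer, and an unmatched start marker simply ends the scan.

```php
<?php
// Hypothetical helper (not from the original answer): collect every substring
// between $start and $end, scanning left to right with strpos offsets.
function str_between_all(string $string, string $start, string $end): array {
    $results = [];
    $offset = 0;
    while (($startPos = strpos($string, $start, $offset)) !== false) {
        $contentStart = $startPos + strlen($start);
        $endPos = strpos($string, $end, $contentStart);
        if ($endPos === false) {
            break; // start marker with no matching end marker: stop scanning
        }
        $results[] = substr($string, $contentStart, $endPos - $contentStart);
        // Resume the search just past the end marker we consumed
        $offset = $endPos + strlen($end);
    }
    return $results;
}

// Made-up input: two complete pairs and one dangling start marker
print_r(str_between_all('x [a] y [bc] z [', '[', ']'));
```

For the question's $st, str_between_all($st, '<p><b>', '<div id="t" class="t">') would return a one-element array, because each marker occurs only once there.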

Answer 1 (score: 0)

If you still want to use a regex rather than h2ooooooo's answer, a small modification will make yours work:

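The /s modifier makes . match newline characters as well; without it, . stops at every line break. Your $st contains line breaks, and that is where the regex engine was stopping.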
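
A small illustration of the difference, using a made-up two-line string and preg_match (single match) to keep it short: the same pattern fails without /s and succeeds with it.

```php
<?php
// Made-up input whose content spans a line break
$text = "<p>line one\nline two</p>";

// Without /s, "." cannot cross "\n", so nothing matches
$withoutS = preg_match('/<p>(.*?)<\/p>/', $text, $m1);
// With /s, "." also matches "\n", so the whole paragraph is captured
$withS = preg_match('/<p>(.*?)<\/p>/s', $text, $m2);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(1)
echo $m2[1];         // the captured text, including the newline
```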

Use the following:

preg_match_all('/<p><b>(.*?)<div id="t" class="t">/sU', $st, $output);
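
One more detail about the question's code: $output[0] holds the full matches, markers included, while the text captured by (.*?) is in $output[1]. Also note that the U modifier inverts greediness, so under /U the .*? is actually greedy; with only one end marker in the input, the result is the same. The sketch below uses a short hypothetical stand-in for $st (same markers, several lines) with the pattern from this answer.

```php
<?php
// Hypothetical stand-in for the question's $st: same markers, multi-line
$sample = "<p><b>Auburn</b> is a city.</p>\n<p>Second paragraph.</p>\n<div id=\"t\" class=\"t\">";

preg_match_all('/<p><b>(.*?)<div id="t" class="t">/sU', $sample, $output);

// $output[0][0] is the whole match, starting with "<p><b>";
// $output[1][0] is only the text between the two markers.
echo $output[1][0];
```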