Question

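I am trying hard to learn PHP.

I created an array and populated several positions with data returned by curl.

I can't figure out how to search each array position and return every occurrence of a string within it.

From the terminal I would probably do something like this: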

grep -A 2 strong | sed -e 's/<p><strong>//' -e 's/<\/strong><br\/>//' -e 's/<br \/>//' -e 's/<\/p>//' -e 's/--//' -e 's/^[ \t]*//;s/[ \t]*$//'

But I'm lost trying to do this in PHP.

Any suggestions?

Edit: I want the contents of each <p><strong> CONTENTS </p> block.

Edit 2: Here is the code I'm trying:

$m=array();
preg_match_all('/<p><strong>(.*?)<\/p>/',$buffer,$m);
$sizeM = count($m);

for ( $counter2 = 0; $counter2 <= $sizeM; $counter2++)
{
    $displayString.= $m[$counter2];
}

I get ArrayArrayArray... as my $displayString.

Edit 3: I'm doing this:

$curl_handle=curl_init();
curl_setopt($curl_handle,CURLOPT_URL, $url);
curl_setopt($curl_handle, CURLOPT_USERAGENT, "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.15) Gecko/20110303 Ubuntu/10.04 (lucid) Firefox/3.6.15");
curl_setopt($curl_handle, CURLOPT_HEADER, 0);
curl_setopt($curl_handle,CURLOPT_RETURNTRANSFER,1);

$buffer = curl_exec($curl_handle);

curl_close($curl_handle);

$m=array();
preg_match_all('/<p>.*?<strong>(.*?)<\/p>/i',$buffer,$m);

foreach($m[1] as $mnum=>$match) {
    $displayString.='Match '.$mnum.' is: '.$match."\n";
}

Answer 1

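In PHP, as in many other languages, it is better not to use string functions or regular expressions to match HTML: HTML is not a regular language, and this approach becomes error-prone quickly.

What you should look at is a DOM system, where you can iterate over the HTML as an object, just as JavaScript accesses the DOM.

Take a look at the following native PHP library to get started: http://php.net/manual/en/class.domdocument.php

You can use it like this: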

$xml = new DOMDocument();

// Load the url's contents into the DOM 
$xml->loadHTMLFile($url); 

// Loop through each <a> tag in the DOM and print its href attribute
foreach($xml->getElementsByTagName('a') as $link)
{
    // DOMElement exposes attributes through getAttribute(), not as properties
    echo $link->getAttribute('href') . "\n";
}

This will find all the links in the document.

Also see the post I created and Gordon's excellent answer: How do you parse and process HTML/XML in PHP?

Answer 2

preg_match_all()

$m=array();
$displayString=''; // initialize before appending with .=
preg_match_all('/<p>\s*<strong>([\s\S]*?)<\/p>/i',$string,$m);
foreach($m[1] as $mnum=>$match){
    $displayString.='Match '.$mnum.' is: '.$match."\n";
}

$m now contains all the matches. $m[0] holds the full matches, and $m[1] holds the text captured by the parentheses.
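To make the shape of $m concrete, here is a minimal sketch. The sample $string is made up for illustration. It also suggests why looping over $m itself and appending each element prints "Array": every element of $m is itself an array, one per capture group.

```php
<?php
// Hypothetical sample input, not taken from the question.
$string = '<p><strong>one</strong><br/>a</p><p> <strong>two</strong></p>';

$m = array();
preg_match_all('/<p>\s*<strong>([\s\S]*?)<\/p>/i', $string, $m);

// $m has one entry per group: $m[0] = full matches, $m[1] = first capture group.
$groupCount = count($m);

// The captures still contain leftover tags; strip_tags() removes them.
$clean = array_map('strip_tags', $m[1]);
$displayString = implode("\n", $clean);
```

So iterate over $m[1] (or $m[0]), not over $m.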

Answer 3

As the other posts point out, you shouldn't use regular expressions if you are trying to process HTML.

To do the lookup, you can use DOMDocument:

$doc = new DOMDocument();
$doc->loadHTML($html);
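$pTags = $doc->getElementsByTagName('p');
foreach ($pTags as $pTag) {
  // firstChild is null for an empty <p>, so check it before reading nodeName
  if ($pTag->firstChild !== null && $pTag->firstChild->nodeName === 'strong') {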
    $data = $pTag->firstChild->nodeValue;
  }
}

Or use XPath:

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$matchingNodes = $xpath->query('//p/strong');

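Or you could even use expat.

Compared with regular expressions, these approaches are cleaner, more efficient, more flexible, and safer.

My personal favorite for extracting data from XML-style documents is XPath. Here is a good set of XPath examples: http://msdn.microsoft.com/en-us/library/ms256086.aspx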
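As a rough sketch of how the XPath result can be consumed: the query returns a DOMNodeList, and each node's textContent gives its text. The sample $html is hypothetical, and the libxml_use_internal_errors() call is my addition to keep parser warnings about imperfect markup quiet.

```php
<?php
// Hypothetical sample input, not taken from the question.
$html = '<html><body><p><strong>hello1!</strong><br/>rest</p>'
      . '<p>no strong here</p><p><strong>hello2!</strong></p></body></html>';

libxml_use_internal_errors(true);   // collect HTML parse warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$texts = array();

// Select every <strong> that is a direct child of a <p>.
foreach ($xpath->query('//p/strong') as $node) {
    $texts[] = trim($node->textContent);
}
```

After this runs, $texts holds one string per matching <strong> element, in document order.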

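Edit: *Note: if you are processing very large XML/HTML documents, you don't want to use DOMDocument or XPath, because they can be slow on large documents. For those cases, use an event-driven XML parser. At work we have had cases where parsing a large XML file with XPath took several minutes, while parsing the same file with an event-driven parser took only a few seconds.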
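For reference, a minimal sketch of the event-driven style using PHP's bundled XML Parser extension (xml_parser_create and friends). It assumes the input is well-formed XML, which scraped HTML often is not; the sample document and the variable names are made up for illustration.

```php
<?php
// Hypothetical, well-formed sample input.
$xml = '<root><p><strong>hello1!</strong> rest</p><p>skip</p>'
     . '<p><strong>hello2!</strong></p></root>';

$found   = array();  // collected <strong> texts
$stack   = array();  // names of the elements currently open
$current = null;     // text buffer, non-null only while inside <p><strong>

$parser = xml_parser_create();
// Keep tag names as written instead of folding them to upper case.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    // Start tag: begin buffering when a <strong> opens directly inside a <p>.
    function ($p, $name, $attrs) use (&$stack, &$current) {
        $parent  = end($stack);
        $stack[] = $name;
        if ($name === 'strong' && $parent === 'p') {
            $current = '';
        }
    },
    // End tag: store the buffered text when that <strong> closes.
    function ($p, $name) use (&$stack, &$current, &$found) {
        array_pop($stack);
        if ($name === 'strong' && $current !== null) {
            $found[] = $current;
            $current = null;
        }
    }
);

// Character data may arrive in several chunks, so append rather than assign.
xml_set_character_data_handler($parser, function ($p, $data) use (&$current) {
    if ($current !== null) {
        $current .= $data;
    }
});

if (!xml_parse($parser, $xml, true)) {
    throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
}
```

The parser never builds a tree; it only keeps the small amount of state the handlers need, which is where the savings on large files would come from.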

Answer 4

Regular expressions will be your friend. strpos, substr, and explode are useful PHP functions too.
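As a small sketch of the plain string-function route, assuming the markers appear literally as '<p><strong>' and '</p>' (the sample $html and the variable names are made up):

```php
<?php
// Hypothetical sample input, not taken from the question.
$html  = 'x<p><strong>hello1!</strong></p>y<p><strong>hello2!</strong> more</p>z';
$open  = '<p><strong>';
$close = '</p>';

$found  = array();
$offset = 0;

// Walk the string, locating each opening marker and the closing marker after it.
while (($start = strpos($html, $open, $offset)) !== false) {
    $start += strlen($open);               // skip past the opening marker
    $end = strpos($html, $close, $start);
    if ($end === false) {
        break;                             // unterminated block, stop
    }
    $found[] = substr($html, $start, $end - $start);
    $offset  = $end + strlen($close);      // continue after this block
}
```

This only works while the markup matches the markers exactly; any extra whitespace or attributes will make it miss blocks.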

Answer 5

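Well, if the array positions don't matter for the result you want, you could try joining the array into a single string and running the regex on that...

Here's the code: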

<?php

$data = array(
    'DONT MATCH THISDONT MATCH THIS<p><strong>hello1!</strong></p>DONT MATCH THISDONT MATCH THISDONT MATCH THIS',
    'DONT MATCH THISDONT MATCH THIS<p><strong>hello2!</strong></p>DONT MATCH THISDONT MATCH THISDONT MATCH THIS',
    'DONT MATCH THISDONT MATCH THIS<p><strong>hello3!</strong></p>DONT MATCH THISDONT MATCH THISDONT MATCH THIS',
    '<p><strong>hello4!</strong></p>DONT MATCH THISDONT MATCH THIS<p><strong>hello5!</strong> test test</p>DONT MATCH THISDONT MATCH THISDONT MATCH THIS',
    'DONT MATCH THISDONT MATCH THIS<p><strong>hello6!</strong></p>DONT MATCH THISDONT MATCH THISDONT MATCH THIS',
);

// The separator goes first: implode($array, $separator) is no longer accepted in PHP 8
preg_match_all('/<p><strong>.*?<\/p>/',implode('',$data),$results);

print_r($results);


?>

Let me know if this works for you. Cheers!
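If the positions do matter, one possible variation is to run the regex on each element and key the results by array index. This is only a sketch with shortened, made-up sample data, and the capture group stopping at </strong> is my choice, not part of the code above.

```php
<?php
// Hypothetical sample data, shortened for illustration.
$data = array(
    'a<p><strong>hello1!</strong></p>b',
    'no match here',
    '<p><strong>hello2!</strong></p>c<p><strong>hello3!</strong> test</p>',
);

$byPosition = array();
foreach ($data as $index => $chunk) {
    // preg_match_all() returns the number of matches, so 0 means "skip this element".
    if (preg_match_all('/<p><strong>(.*?)<\/strong>/', $chunk, $m)) {
        $byPosition[$index] = $m[1];
    }
}
```

Elements without a match simply get no entry, so the keys of $byPosition tell you which positions contained something.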

Search PHP String
