Question

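I have HTML content and need to extract a value from inside hyperlink tags using preg_match_all. I tried the code below, but it returns no data. I have included sample input data. Can you help me fix this and print every value that follows play.asp?ID= (for example, I want to get the value 12345 from play.asp?ID=12345)?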

Sample input HTML data:

<A HREF="http://www.somesite.com/play.asp?ID=12345&Selected_ID=&PhaseID=123" class="space"><span id="Img_1"></span></A></TD>

And the code:

$regexp = "<A\s[^>]*HREF=\"play.asp(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/A>";
if (preg_match_all("/$regexp/siU", $input, $matches)) {
    $url = str_replace('?ID=', '', $matches[2]);
    $url2 = str_replace('&Selected_ID=&PhaseID=123', '', $url);
    print_r($url2);
}

Answer 1

$str = '<A HREF="http://www.somesite.com/play.asp?ID=12345&Selected_ID=&PhaseID=123" class="space"><span id="Img_1"></span></A>';

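// $match[1] holds each HREF value, $match[2] any attributes that follow it.
// [^>]*? (instead of a single [^>]) allows other attributes before HREF.
preg_match_all( '/<\s*A\s[^>]*?HREF="(.*?)"\s?(.*?)>/i', $str, $match);
print_r( $match );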

Try this.
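The pattern above captures the whole HREF value rather than the ID itself. As a minimal sketch of one way to finish the job, assuming the captured URLs are well-formed, the query string can be split with PHP's built-in parse_url and parse_str. The names $ids, $queryString and $query are illustrative and not part of the original answer.

```php
<?php
$str = '<A HREF="http://www.somesite.com/play.asp?ID=12345&Selected_ID=&PhaseID=123" class="space"><span id="Img_1"></span></A>';

// Capture the value of every double-quoted HREF attribute on an <A> tag.
preg_match_all('/<\s*A\s[^>]*?HREF="(.*?)"/i', $str, $match);

$ids = array();
foreach ($match[1] as $href) {
    // Pull out the query string, e.g. "ID=12345&Selected_ID=&PhaseID=123".
    $queryString = parse_url($href, PHP_URL_QUERY);
    if (!is_string($queryString)) {
        continue; // this link has no query string
    }
    // Split the query string into an associative array.
    parse_str($queryString, $query);
    if (isset($query['ID'])) {
        $ids[] = $query['ID'];
    }
}

print_r($ids);
```

Letting parse_str do the splitting avoids hard-coding the other parameters, such as Selected_ID and PhaseID, into str_replace calls.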

Answer 2

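Don't! Regular expressions are a (bad) way of processing text. This is not text, it is HTML source code. The tool for handling it is called an HTML parser. PHP's DOMDocument can load HTML too, although it may run into problems in some rare cases. A poorly built regular expression (and is there any other kind?) will break on any change to the page.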
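To make the brittleness concrete, here is a small sketch that compares a hand-written pattern with DOMDocument on two equivalent spellings of the same link. The second variant and the pattern are made up for illustration; only the first variant resembles the sample in the question.

```php
<?php
// Two ways of writing the same link; the second reorders the attributes
// and uses single quotes, both of which are valid HTML.
$variants = array(
    '<A HREF="http://www.somesite.com/play.asp?ID=12345" class="space">x</A>',
    "<a class='space' href='http://www.somesite.com/play.asp?ID=12345'>x</a>",
);

libxml_use_internal_errors(true); // keep parser warnings quiet

$regexHits = 0;
$parserHits = 0;
foreach ($variants as $html) {
    // Illustrative pattern: expects href first and double-quoted.
    if (preg_match('/<a href="(.*?)"/i', $html)) {
        $regexHits++;
    }
    // The parser normalises case, quoting and attribute order.
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $anchors = $doc->getElementsByTagName('a');
    if ($anchors->length === 1 && $anchors->item(0)->getAttribute('href') !== '') {
        $parserHits++;
    }
}

echo "regex matched $regexHits of 2, parser found $parserHits of 2\n";
```

The pattern only recognises the first spelling, while the parser finds the href in both.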

Answer 3

Isn't this enough?

/<a href="(.*?)?"/i

Edit:

This seems to work:

'/<a href="(.*?)\?/i'
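Note that the pattern above stops at the question mark, so it captures the URL up to `?` rather than the ID. A minimal sketch of a pattern that captures the ID directly, assuming the IDs are purely numeric and the href is double-quoted; the $pattern and $ids names are illustrative:

```php
<?php
$input = '<A HREF="http://www.somesite.com/play.asp?ID=12345&Selected_ID=&PhaseID=123" class="space"><span id="Img_1"></span></A></TD>';

// Match an <a> tag whose double-quoted href contains play.asp?ID=<digits>
// and capture only the digits.
$pattern = '/<a\s[^>]*?href="[^"]*play\.asp\?ID=(\d+)/i';

$ids = array();
if (preg_match_all($pattern, $input, $matches)) {
    $ids = $matches[1]; // one captured ID per matching link
}

print_r($ids);
```

This is still subject to the usual caveats of matching HTML with a regular expression, for example single-quoted or unquoted attributes are not handled.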

Answer 4

This should achieve the desired result. It combines an HTML parser with a content extraction function:

function extractContents($string, $start, $end)
{
    // Locate the start marker (case-insensitive); give up if it is absent.
    $pos = stripos($string, $start);
    if ($pos === false) {
        return null;
    }
    // Everything after the start marker.
    $rest = substr($string, $pos + strlen($start));
    // Cut at the end marker; if it is missing, keep the remainder.
    $endPos = stripos($rest, $end);
    if ($endPos !== false) {
        $rest = substr($rest, 0, $endPos);
    }
    return trim($rest);
}

include('simple_html_dom.php');
$html = file_get_html('http://siteyouwantlinksfrom.com');
$playIDs = array();
foreach ($html->find('a') as $link)
{
    // Skip links that do not point at play.asp?ID=...
    $id = extractContents($link->href, 'play.asp?ID=', '&');
    if ($id !== null) {
        $playIDs[] = $id;
    }
}

print_r($playIDs);

You can download simple_html_dom.php from here.

Answer 5

You should not use regular expressions to parse HTML.
Here is a solution using DOMDocument:

<?php
    $input = '<A HREF="http://www.somesite.com/play.asp?ID=12345&Selected_ID=&PhaseID=123" class="space"><span id="Img_1"></span></A>';
    // Clean "&" element in href
    $cleanInput = str_replace('&','&amp;',$input);
    // Load HTML

    $domDocument = new DOMDocument();
    $domDocument->loadHTML($cleanInput);

    // Retrieve <a /> tags
    $aTags = $domDocument->getElementsByTagName('a');
    foreach($aTags as $aTag)
    {   

        $href = $aTag->getAttribute('href');
        $url  =  parse_url($href);
        $vars = array();
        parse_str($url['query'], $vars);

        var_dump($vars);
    }
?>

Output:

array (size=3)
  'ID' => string '12345' (length=5)
  'Selected_ID' => string '' (length=0)
  'PhaseID' => string '123' (length=3)
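If only the ID values of links pointing at play.asp are wanted, the same idea can be narrowed with DOMXPath. This is a sketch under that assumption, not the answer's own code; it also swaps the str_replace step for libxml_use_internal_errors, which collects the parser's warnings about the bare `&` instead of printing them. The $playIds name is illustrative.

```php
<?php
$input = '<A HREF="http://www.somesite.com/play.asp?ID=12345&Selected_ID=&PhaseID=123" class="space"><span id="Img_1"></span></A>';

// Collect libxml parse warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($input);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$playIds = array();
// Select only anchors whose href mentions play.asp.
foreach ($xpath->query('//a[contains(@href, "play.asp")]') as $a) {
    $queryString = parse_url($a->getAttribute('href'), PHP_URL_QUERY);
    if (!is_string($queryString)) {
        continue; // no query string on this link
    }
    parse_str($queryString, $vars);
    if (isset($vars['ID'])) {
        $playIds[] = $vars['ID'];
    }
}

print_r($playIds);
```

Filtering in the XPath expression keeps unrelated links on the page out of the result.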

How to get the value inside a hyperlink tag
