Question

I have a string

$str = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor <a href="http://example2.com">Do not want this text</a> incididunt ut labore et <a href="http://example.com">Want this text</a> dolore magna aliqua. Ut enim ad     minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat. Duis aute irure dolor in <a href="http://example.com">Do not want this text</a> reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.';

How can I extract the text inside the first <a> tag that links to http://example.com? I do not want the text of the link to http://example2.com, nor the text of the second link to http://example.com.

I want to get back 'Want this text'. Any idea how to do this?

Thanks!

Answer 1

You would most likely use DOMDocument to achieve your goal, together with DOMXPath for more complex requirements.

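$dom = new DOMDocument;
$dom->loadHTML( $str );

$col = $dom->getElementsByTagName('a');
foreach( $col as $node ){
    // Print only the first link whose href is http://example.com
    if( $node->getAttribute('href') === 'http://example.com' ){
        echo $node->nodeValue;
        break;
    }
}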
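For the DOMXPath route mentioned above, here is a minimal sketch. It assumes the href has to match http://example.com exactly, and it uses a shortened copy of the question's string. The libxml_use_internal_errors() call is my addition, there only to keep parser warnings about the HTML fragment out of the output.

```php
<?php
// Sample input, shortened from the question.
$str = 'tempor <a href="http://example2.com">Do not want this text</a> incididunt '
     . '<a href="http://example.com">Want this text</a> dolore '
     . '<a href="http://example.com">Do not want this text</a> reprehenderit';

$dom = new DOMDocument;
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($str);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// All <a> elements whose href is exactly http://example.com, in document order.
$nodes = $xpath->query('//a[@href="http://example.com"]');

// item(0) is the first match, or null when nothing matched.
$first = $nodes->item(0);
$text = $first !== null ? $first->nodeValue : null;

echo $text, PHP_EOL;
```

The XPath predicate does the filtering, so no loop is needed. If you later need a looser match, a predicate such as starts-with(@href, "http://example.com") might be a starting point.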

Answer 2

You need to use DOMDocument. DOMDocument lets you interact with an HTML page from PHP through the Document Object Model.

$dom = new DomDocument;
$dom->preserveWhiteSpace = false; // drop redundant whitespace; must be set before loading
$dom->loadHTML(file_get_contents($url)); // or $dom->loadHTML($str) for the string in the question
$links = $dom->getElementsByTagName('a');

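At this point you have a list of objects (a DOMNodeList, not a plain array). Essentially, each object is an element node for an a tag.

Suppose you want the text of the first link; then you can do: $text = $links->item(0)->nodeValue;

However, if you want the text of the link that points to "http://example.com", you can do this: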

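foreach ($links as $link)
{
  // attributes are read with getAttribute(), not as properties
  if ($link->getAttribute('href') == "http://example.com") {
    $text = $link->nodeValue;
    break; // stop at the first matching link
  }
}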
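If you need this in more than one place, the loop can be wrapped in a small function. This is only a sketch: firstLinkText() is a hypothetical name, not something from the DOM extension, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: returns the text of the first <a> whose href equals $url,
// or null if there is no such link.
function firstLinkText(string $html, string $url): ?string
{
    $dom = new DOMDocument;
    $previous = libxml_use_internal_errors(true); // silence warnings from imperfect HTML
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the caller's setting

    foreach ($dom->getElementsByTagName('a') as $link) {
        if ($link->getAttribute('href') === $url) {
            return $link->nodeValue;
        }
    }
    return null;
}

$html = '<a href="http://example2.com">Do not want this text</a> et '
      . '<a href="http://example.com">Want this text</a> in '
      . '<a href="http://example.com">Do not want this text</a>';

var_dump(firstLinkText($html, 'http://example.com')); // text of the first matching link
var_dump(firstLinkText($html, 'http://example.org')); // NULL, no such link
```

Returning null instead of an empty string lets the caller tell "no such link" apart from "a link with no text".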

Answer 3

You can do this with a regular expression, for example:

<a href="http:\/\/example\.com"[^>]*>(.*?)<\/a>

Snippet:

$str = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor <a href="http://example2.com">Do not want this text</a> incididunt ut labore et <a href="http://example.com">Want this text</a> dolore magna aliqua. Ut enim ad     minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat. Duis aute irure dolor in <a href="http://example.com">Do not want this text</a> reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.';

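// PHP's PCRE has no "g" modifier; preg_match() already stops at the first match.
$regex = '/<a href="http:\/\/example\.com"[^>]*>(.*?)<\/a>/';
preg_match($regex, $str, $matches);

You will find the output you want in $matches[1] ($matches[0] holds the whole matched tag).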
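If the URL comes from a variable, it is safer to escape it before putting it into the pattern. A minimal sketch, assuming the href is double-quoted and that other attributes may appear inside the tag; $url and $text are names I made up for the illustration:

```php
<?php
$str = 'tempor <a href="http://example2.com">Do not want this text</a> incididunt '
     . '<a href="http://example.com">Want this text</a> dolore '
     . '<a href="http://example.com">Do not want this text</a> reprehenderit';

$url = 'http://example.com'; // assumed input, not part of the original answer

// preg_quote() escapes regex metacharacters in the URL (the dots) and,
// because '/' is passed as the delimiter, the slashes as well.
$regex = '/<a\s[^>]*href="' . preg_quote($url, '/') . '"[^>]*>(.*?)<\/a>/is';

$text = null;
if (preg_match($regex, $str, $matches) === 1) {
    // $matches[0] is the whole tag, $matches[1] the captured link text.
    $text = $matches[1];
}

echo $text, PHP_EOL;
```

The [^>]* parts keep the match inside a single tag, and the lazy (.*?) stops at the first closing </a>. For anything beyond simple, well-formed markup, the DOMDocument answers are the more robust choice.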

Answer 4

Use preg_match()

Example:

$string = '<a href="http://example2.com">Do not want this text</a> incididunt ut labore et <a href="http://example.com">Want this text</a> '; 

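// The "/" in "</a>" must be escaped, otherwise it ends the pattern early.
// The href condition skips links that point anywhere else.
if ( preg_match('/<\s*a\s[^<>]*href="http:\/\/example\.com"[^<>]*>([^<>]+)<\/a>/i', $string, $matches) ) {
       var_dump($matches); 
}

$matches[1] holds the text of the first <a> tag that links to http://example.com.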
