Question

I need to extract the second URL from this string:

$string = '<td class="table_td">   submitted by   <a href="https://www.example.com/account/user" target="_blank" rel="nofollow"> account </a> <br>
 <a href="https://www.URL-I-NEED.com/BKHHZu_A4lu" target="_blank" rel="nofollow">[site]</a>   <a href="https://www.example.com/settings/user/" target="_blank" rel="nofollow">[settings]</a></td>';

I tried this solution with the following settings:

$startTag = ' <a href="';
$endTag = '" target';

But it returns the first URL rather than the one I need, because those tags also occur before the substring I want.

I tried adding <br> to $startTag, ahead of the newline, but then nothing was returned at all.

Basically, I need $startTag to be {newline} <a href=", but I can't figure out how to include that newline.
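For reference, one way to get a real newline into the marker is a double-quoted PHP string, since "\n" is only interpreted inside double quotes (in single quotes it stays a literal backslash followed by n). The following is a minimal sketch, assuming the input uses a bare LF after <br>; a CRLF line ending would need "\r\n" instead. The helper substringBetween() is a hypothetical stand-in, not the code from the linked solution.

```php
<?php
// Hypothetical helper: returns the text between two markers, or null if either is missing.
function substringBetween(string $haystack, string $startTag, string $endTag): ?string
{
    $start = strpos($haystack, $startTag);
    if ($start === false) {
        return null;
    }
    $start += strlen($startTag);

    $end = strpos($haystack, $endTag, $start);
    if ($end === false) {
        return null;
    }
    return substr($haystack, $start, $end - $start);
}

// Same markup as in the question; the newline after <br> is spelled out as "\n"
// so the example does not depend on this file's own line endings.
$string = '<td class="table_td">   submitted by   <a href="https://www.example.com/account/user" target="_blank" rel="nofollow"> account </a> <br>' . "\n"
    . ' <a href="https://www.URL-I-NEED.com/BKHHZu_A4lu" target="_blank" rel="nofollow">[site]</a>   <a href="https://www.example.com/settings/user/" target="_blank" rel="nofollow">[settings]</a></td>';

// Double quotes so that \n becomes an actual newline character (assumes LF, not CRLF).
$startTag = "<br>\n <a href=\"";
$endTag = '" target';

$url = substringBetween($string, $startTag, $endTag);
echo $url, "\n";
```

This kind of exact matching is brittle: any change in the whitespace between <br> and <a makes the marker miss.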

Or perhaps I'm thinking about this the wrong way, and a simpler approach would be to extract all the URLs from the string and then pick the second one.

Either way, how can I extract the second URL from the string above?
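The "extract all URLs, then pick the second" idea mentioned above could be sketched with preg_match_all. This is only an illustration, assuming every href value is wrapped in double quotes as in the sample; a regular expression is not a general HTML parser, and the variable names are made up for the example.

```php
<?php
// Sample input copied from the question.
$string = '<td class="table_td">   submitted by   <a href="https://www.example.com/account/user" target="_blank" rel="nofollow"> account </a> <br>
 <a href="https://www.URL-I-NEED.com/BKHHZu_A4lu" target="_blank" rel="nofollow">[site]</a>   <a href="https://www.example.com/settings/user/" target="_blank" rel="nofollow">[settings]</a></td>';

// Capture the value of the double-quoted href attribute of every <a> tag.
preg_match_all('/<a\s[^>]*?href="([^"]*)"/i', $string, $matches);

// $matches[1] holds the captured URLs in document order; index 1 is the second one.
$secondUrl = $matches[1][1] ?? null;

echo $secondUrl, "\n";
```

Because the pattern ignores the whitespace between tags, the newline after <br> no longer matters.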

Answer 1

You can use a DOM parser, as in this code:

$string = '<td class="table_td">   submitted by
<a href="https://www.example.com/account/user" target="_blank" rel="nofollow"> account </a> <br>
<a href="https://www.URL-I-NEED.com/BKHHZu_A4lu" target="_blank" rel="nofollow">[site]</a>
<a href="https://www.example.com/settings/user/" target="_blank" rel="nofollow">[settings]</a>
</td>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($string); // loads your html
$xpath = new DOMXPath($doc);

// query all <a...> elements
$nodelist = $xpath->query("//a");

// get 2nd element from the list
$node = $nodelist->item(1);

// extract href attribute (item() returns null when there is no 2nd <a>)
$link = $node !== null ? $node->getAttribute('href') : null;

echo $link . "\n";
//=> https://www.URL-I-NEED.com/BKHHZu_A4lu
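As a variation on the code above, the position can be pushed into the XPath expression itself. The sketch below assumes the same input string and uses DOMXPath::evaluate() with string(), which yields an empty string when there is no second link, so no null check is needed. Variable names are illustrative.

```php
<?php
// Same markup as in the question.
$html = '<td class="table_td">   submitted by   <a href="https://www.example.com/account/user" target="_blank" rel="nofollow"> account </a> <br>
 <a href="https://www.URL-I-NEED.com/BKHHZu_A4lu" target="_blank" rel="nofollow">[site]</a>   <a href="https://www.example.com/settings/user/" target="_blank" rel="nofollow">[settings]</a></td>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings about the fragment out of the output
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// (//a)[2] is the second <a> in document order (XPath positions start at 1);
// string(...) converts its href attribute to a plain string.
$link = $xpath->evaluate('string((//a)[2]/@href)');

echo $link, "\n";
```

The parentheses matter: //a[2] would instead select every <a> that is the second <a> child of its own parent.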

