Question

I'm trying to get the text from these links.

The full tag

<a href="/wiki/Correa_(apellido)" title="Correa (apellido)">Correa</a>

My code

$html = file_get_contents("https://es.wikipedia.org/wiki/Anexo:Apellidos_m%C3%A1s_comunes_en_Espa%C3%B1a_e_Hispanoam%C3%A9rica");

preg_match_all('%<a href="/wiki/.*?_(apellido)" title=".*? (apellido)">(.*?)</a>%i', $html, $result, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($result[0]); $i++) {
    echo $result[1][$i];
}

But it's not working. What am I doing wrong?

Answer 1

This should get you going for now:

preg_match_all('/<a.*>(.*)<\/a>/imU', $html, $matches);

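$matches[1] will contain all the text values (if I understood your request correctly). But as Barmar suggested, you really should consider using the DOMDocument parsing functions.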
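For reference, here is a minimal sketch of the DOMDocument route. It is only an illustration, not a tested solution against the live page: the `$html` sample string and the `$names` array are made up for the example, and it assumes the real links have the same `href="/wiki/..._(apellido)"` shape as the tag in the question.

```php
<?php
// Sample markup standing in for the downloaded page (assumption: the real
// page uses the same <a href="/wiki/..._(apellido)"> shape as the question).
$html = '<ul>'
      . '<li><a href="/wiki/Correa_(apellido)" title="Correa (apellido)">Correa</a></li>'
      . '<li><a href="/wiki/Madrid" title="Madrid">Madrid</a></li>'
      . '</ul>';

$doc = new DOMDocument();
// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$names = [];
foreach ($doc->getElementsByTagName('a') as $link) {
    // Keep only links whose href ends in the literal text "_(apellido)".
    if (preg_match('/_\(apellido\)$/', $link->getAttribute('href'))) {
        $names[] = trim($link->textContent);
    }
}

print_r($names);
```

To run it against the real page, replace the sample string with the file_get_contents() call from the question. The attribute check is done per link, so a match can never spill over from one tag into the next.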

Edit: if you are only looking for the links that contain "apellido" in their "href" or "title", you will have to modify it as follows:

preg_match_all('/<a.*apellido.*".*>(.*)<\/a>/imU', $html, $matches);

Note that this will give you a false positive if "apellido" occurs in an attribute of the a tag other than "title" or "href".
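If you want to stay with a regex, a stricter variant of the pattern from the question may cut down on those false positives. This is a sketch under the assumption that every link has exactly the attribute order shown in the question (href first, then title); the sample `$html` string is invented for the example. The pattern in the question most likely finds nothing because its parentheses are unescaped, so `(apellido)` is treated as a capture group rather than as literal parentheses.

```php
<?php
// Sample input (assumption: same tag shape as in the question).
$html = '<a href="/wiki/Correa_(apellido)" title="Correa (apellido)">Correa</a> '
      . '<a href="/wiki/Madrid" title="Madrid">Madrid</a>';

// \( and \) match literal parentheses; unescaped, "(apellido)" is a capture
// group that matches the bare word "apellido" without parentheses.
// [^"]* keeps each wildcard inside its own attribute value.
$pattern = '%<a href="/wiki/[^"]*_\(apellido\)" title="[^"]* \(apellido\)">(.*?)</a>%i';

preg_match_all($pattern, $html, $result);

// $result[1] holds the text captured between <a ...> and </a>.
foreach ($result[1] as $name) {
    echo $name, PHP_EOL;
}
```

Because the only capture group left is the link text, $result[1] holds the names directly, which is what the loop in the question expected.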

Filtering <a> tag text with preg_match_all
