I am trying to use PHP to get contact information from this site: http://www.internic.net/registrars/registrar-967.html. I can get the email address by using the href link:
$contactStr = "http://www.internic.net/registrars/registrar-967.html";
$contact_string = file_get_contents("$contactStr");
preg_match_all('/<a href="(.*)">(.*)<\/a>/i', $contact_string, $contactInfo);
$email = str_replace("mailto:", "", $contactInfo[1][6]);
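However, I am having trouble getting the address and phone number, because there is no HTML element I can target, such as a <p>. I only need "1800 SW First Ave., Suite 440 Portland OR 97201 United States" and "310-467-2549" from this site. Please advise how to do this with preg_match_all or some other method. Thanks!
Answer 0: (score: 0)
As others have said in the comments, try DOMDocument instead of a regular expression.
Here is an example (a bit hacky, though). Hope it helps: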
function get_register_by_id($id) {
    $site = file_get_contents('http://www.internic.net/registrars/registrar-' . $id . '.html');
    $dom = new DOMDocument();
    // Suppress warnings caused by malformed HTML on the page.
    @$dom->loadHTML($site);
    $result = array();
    foreach ($dom->getElementsByTagName('td') as $td) {
        // The contact block lives in the cell with width="420".
        if ($td->getAttribute('width') == '420') {
            // Rebuild the cell's inner HTML from its child nodes.
            $innerHTML = '';
            foreach ($td->childNodes as $child) {
                $innerHTML .= trim($child->ownerDocument->saveXML($child));
            }
            // saveXML() serializes line breaks as <br/>; split on them, then strip remaining tags.
            $fixed = array_map('strip_tags', array_map('trim', explode("<br/>", trim($innerHTML))));
            foreach ($fixed as $val) {
                if (empty($val)) { continue; }
                // Remove stray "! " sequences from the value.
                $result[] = str_replace(array('! '), '', $val);
            }
        }
    }
    return $result;
}
print_r(get_register_by_id(965));
/*Array
(
[0] => Domain Central Australia Pty Ltd.
[1] => Level 27
[2] => 101 Collins Street
[3] => Melbourne Victoria 3000
[4] => Australia
[5] => +64 300 4192
[6] => robert.rolls@domaincentral.com.au
)*/
print_r(get_register_by_id(966));
/*
Array
(
[0] => Web Business, LLC
[1] => PO Box 1417
[2] => Golden CO 80402
[3] => United States
[4] => +1.303.524.3469
[5] => support@webbusiness.biz
)*/
print_r(get_register_by_id(967));
/*
Array
(
[0] => #1 Host Australia, Inc.
[1] => 1800 SW First Ave., Suite 440
[2] => Portland OR 97201
[3] => United States
[4] => 310-467-2549
[5] => registry-operations@moniker.com
)*/
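For the address and phone number that the question asks for, a variation on the same idea is to query the text nodes directly with DOMXPath instead of rebuilding and splitting the inner HTML. The sketch below is only an illustration: the `$html` string is a made-up sample that mimics the `width="420"` cell, `extract_contact_lines` is a hypothetical helper name, and the rule "name first, email last, phone second to last" is an assumption inferred from the three outputs above, not something the site guarantees.

```php
<?php
// Hypothetical sample markup standing in for the real page (layout is assumed).
$html = '<table><tr><td width="420">#1 Host Australia, Inc.<br>'
      . '1800 SW First Ave., Suite 440<br>Portland OR 97201<br>United States<br>'
      . '310-467-2549<br><a href="mailto:registry-operations@moniker.com">'
      . 'registry-operations@moniker.com</a></td></tr></table>';

// Return every non-empty text node found inside the td with width="420".
function extract_contact_lines(string $html): array
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $lines = [];
    foreach ($xpath->query('//td[@width="420"]//text()') as $node) {
        $text = trim($node->nodeValue);
        if ($text !== '') {
            $lines[] = $text;
        }
    }
    return $lines;
}

$lines = extract_contact_lines($html);

// Assumed ordering: [name, address lines..., phone, email].
$address = implode(' ', array_slice($lines, 1, -2));
$phone   = $lines[count($lines) - 2];

echo $address, "\n", $phone, "\n";
```

Slicing from the second element up to the last two keeps this working when a registrar has more address lines, as in the 965 output above. It would still break if a record has no phone or email line, so check the line count before relying on it.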