Regex to check for a word in a line and print the whole line

Time: 2015-07-02 10:03:29

Tags: php regex preg-match

I have this:

$str= '</b><b>Tech Fax:<br/>
</b><b>Tech Fax Ext:<br/>
</b><b>Tech Email: </b><a href="mailto:rsurikov@gmail.com">rsurikov@gmail.com</a><br/>
<b>Name Server: </b><a href="/index.php?query=69.93.127.10&amp;output=nice">ns1.linode.com</a><br/>
<b>Name Server: </b><a href="/index.php?query=65.19.178.10&amp;output=nice">ns2.linode.com</a><br/>
<b>Name Server: </b><a href="/index.php?query=75.127.96.10&amp;output=nice">ns3.linode.com</a><br/>
<b>Name Server: </b><a href="/index.php?query=207.192.70.10&amp;output=nice">ns4.linode.com</a><br/>
<b>Name Server: </b><a href="/index.php?query=109.74.194.10&amp;output=nice">ns5.linode.com</a><br/>
<b>DNSSEC:</b>Unsigned<br/>
<b>Registrar Abuse Contact Email: </b><a href="mailto:abuse-contact@publicdomainregistry.com">abuse-contact@publicdomainregistry.com</a><br/>
<b>Registrar Abuse Contact Phone: </b>+1-2013775952<br/>
<b>URL of the ICANN WHOIS Data Problem Reporting System:<br/>
</b><a href="http://wdprs.internic.net" target="_blank">http://wdprs.internic.net</a>/<br/>
>>>Last update of WHOIS database: 2015-07-01T16:22:28+0000Z<br />
</td><td bgcolor="#C0C0C0" width="53" rowspan="2">
&nbsp;</td></tr>
<tr align="left" valign="top"><td bgcolor="#C0C0C0" width="639">
&nbsp;</td></tr>
</table><br />
<form name="queryform" method="post" action="/index.php">
<table cellpadding="6" cellspacing="0" border="0" width="540" dir="ltr">
<tr><td bgcolor="#C0C0C0">
<table width="100%" cellpadding="0" cellspacing="0" border="0" dir="ltr">
   <tr class="upperrow">
      <td align="left" valign="top" nowrap="nowrap"><font face="Arial" size="+0"><b>Enter any domain name:</b></font></td>
   </tr>
   <tr class="middlerow">
      <td align="center" valign="middle" nowrap="nowrap">
      <input type="text" name="query" value="" class="queryinput" size="20" />&nbsp;<input type="submit" name="submit" value="Check Domain" /></td>
   </tr>
   <tr class="lowerrow">
      <td align="right" valign="bottom"></td>
   </tr>
</table>';

I need a regular expression in PHP that finds the "Name Server:" lines and saves each whole line for me. I need $match to be:

 <b>Name Server: </b><a href="/index.php?query=69.93.127.10&amp;output=nice">ns1.linode.com</a><br/>
    <b>Name Server: </b><a href="/index.php?query=65.19.178.10&amp;output=nice">ns2.linode.com</a><br/>
    <b>Name Server: </b><a href="/index.php?query=75.127.96.10&amp;output=nice">ns3.linode.com</a><br/>
    <b>Name Server: </b><a href="/index.php?query=207.192.70.10&amp;output=nice">ns4.linode.com</a><br/>
    <b>Name Server: </b><a href="/index.php?query=109.74.194.10&amp;output=nice">ns5.linode.com</a><br/>

Also, $str does not always contain 4 "Name Server:" lines; sometimes there are two, sometimes 5, and that is the problem with the regex I wrote. Here it is:

/Name Server[^:]*:\s*(.*)\s(.*)/i

3 answers:

Answer 0: (score: 0)

You can use DOMDocument together with DOMXPath:

$dom = new DOMDocument;
@$dom->loadHTML($str);

$xp = new DOMXPath($dom);
$links = $xp->query('//b[text()="Name Server: "]/following-sibling::a[1]');

foreach ($links as $link) {
    echo $link->nodeValue . PHP_EOL;
}

The XPath query means:

//                        # anywhere in the DOM tree
b                         # a b tag
[text()="Name Server: "]  # condition: the text content must be "Name Server: "
/following-sibling::a[1]  # the first following "a" tag
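
As a rough, self-contained sketch of the same idea, the snippet below runs that query against a shortened, hypothetical sample shaped like the question's $str (the e-mail address is a placeholder). It collects the host names into an array and, assuming you also want the markup, serializes each matched `a` node with `saveHTML`. It swaps the `@` suppression for `libxml_use_internal_errors`, which is only one possible way to silence parser warnings.

```php
<?php
// Hypothetical, shortened sample in the shape of the question's $str.
$str = '<b>Tech Email: </b><a href="mailto:someone@example.com">someone@example.com</a><br/>
<b>Name Server: </b><a href="/index.php?query=69.93.127.10&amp;output=nice">ns1.linode.com</a><br/>
<b>Name Server: </b><a href="/index.php?query=65.19.178.10&amp;output=nice">ns2.linode.com</a><br/>
<b>DNSSEC:</b>Unsigned<br/>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($str);
libxml_clear_errors();

$xp = new DOMXPath($dom);
$links = $xp->query('//b[text()="Name Server: "]/following-sibling::a[1]');

$hosts = [];   // text content of each link, e.g. the name server host
$markup = [];  // serialized <a> element for each link
foreach ($links as $link) {
    $hosts[] = $link->nodeValue;
    $markup[] = $dom->saveHTML($link);
}

print_r($hosts);
```

Note that this yields the link elements rather than the original source lines, so it fits best when the host names are what you are really after.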

Answer 1: (score: -1)

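You have to use the preg_match_all function. Take the following short script as an example:

<?php

$a = "abc\ndef\naaa\naba\nxyz";
$matches = array();
preg_match_all("/a.*/", $a, $matches);
print_r($matches);
?>

It will output:

Array
(
    [0] => Array
        (
            [0] => abc
            [1] => aaa
            [2] => aba
        )

)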
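
Applied to the question, one possible sketch is to anchor the pattern to whole lines with the `m` modifier, so every line containing "Name Server:" is returned in full no matter how many there are. The sample string below is a shortened, hypothetical stand-in for the question's $str, and the `trim` call is only there on the assumption that the input might use Windows line endings.

```php
<?php
// Hypothetical, shortened sample in the shape of the question's $str.
$str = '<b>Tech Email: </b><a href="mailto:someone@example.com">someone@example.com</a><br/>
<b>Name Server: </b><a href="/index.php?query=69.93.127.10&amp;output=nice">ns1.linode.com</a><br/>
<b>Name Server: </b><a href="/index.php?query=65.19.178.10&amp;output=nice">ns2.linode.com</a><br/>
<b>DNSSEC:</b>Unsigned<br/>';

// m: ^ and $ match at line boundaries; i: case-insensitive.
// "." does not match a newline here, so each match stays on one line.
preg_match_all('/^.*Name Server:.*$/mi', $str, $matches);

// $matches[0] holds the full text of every match; trim stray "\r" or spaces.
$lines = array_map('trim', $matches[0]);

print_r($lines);
```

Because the pattern only looks for the label, it does not care what the rest of the line contains, which also makes it easy to match lines you did not intend if the label appears elsewhere.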

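Answer 2: (score: -1)

In general, trying to search or parse HTML with regular expressions is a bad idea. However, if you insist, and you are sure the HTML will not differ much from what you posted above, you can do this: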

/^(?:<b>Name Server: <\/b><a href="\/index\.php\?query=\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\&amp;output=nice">\w+\.\w+\.\w+<\/a><br\/>.)+^/sm

You can see how it works here: https://regex101.com/r/dU6gH4/1
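
For illustration only, here is how that kind of pattern might be used from PHP. The sketch uses the pattern above with every literal dot escaped, matches the whole run of consecutive "Name Server:" lines as one block, and then splits the block into individual lines. The sample string is a shortened, hypothetical stand-in for the question's $str. The sketch assumes Unix line endings and at least one more line after the block, since the pattern ends with `^`.

```php
<?php
// Hypothetical, shortened sample in the shape of the question's $str.
$str = '<b>Tech Email: </b><a href="mailto:someone@example.com">someone@example.com</a><br/>
<b>Name Server: </b><a href="/index.php?query=69.93.127.10&amp;output=nice">ns1.linode.com</a><br/>
<b>Name Server: </b><a href="/index.php?query=65.19.178.10&amp;output=nice">ns2.linode.com</a><br/>
<b>DNSSEC:</b>Unsigned<br/>';

// s: "." also matches the newline that ends each line; m: "^" matches at line starts.
$pattern = '/^(?:<b>Name Server: <\/b><a href="\/index\.php\?query=\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}&amp;output=nice">\w+\.\w+\.\w+<\/a><br\/>.)+^/sm';

$lines = [];
if (preg_match($pattern, $str, $m)) {
    // $m[0] is the whole block of consecutive lines; split it on line breaks.
    $lines = preg_split('/\R/', trim($m[0]));
}

print_r($lines);
```

The strictness cuts both ways: any change in the markup, such as an extra attribute or a host name with a different number of labels, makes the block match fail.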