Below is my regexp. It works fine when I assign the HTML content to the variable directly, but not when I fetch the page with file_get_contents().
<?php
$url = "http://www.apdepot.com/Products/SearchResults.aspx?type=keyword&keyword=6-918873";
$urlcontent = file_get_contents($url);
/* It works when I assign the HTML content directly, but not with file_get_contents().
$urlcontent = '<td width="80%" valign="top" align="left">
<span id="ContentPlaceHolder1_Repeater1_lblLongDesc_0">*WAS W10224675 M BASKT-WARE WAS W10171734</span> <input type="hidden" value="*WAS W10224675 M BASKT-WARE WAS W10171734" id="ContentPlaceHolder1_Repeater1_hdnP21Desc_0" name="ctl00$ContentPlaceHolder1$Repeater1$ctl01$hdnP21Desc">
</td>'; */
preg_match_all('/<span.*id=\"ContentPlaceHolder1_Repeater1_lblLongDesc_0\".*>(.*?)<\/span>/Us', $urlcontent, $name);
print_r($name);
Expected output:
Array
(
[0] => Array
(
[0] => <span id="ContentPlaceHolder1_Repeater1_lblLongDesc_0">*WAS W10224675 M BASKT-WARE WAS W10171734</span>
)
[1] => Array
(
[0] => *WAS W10224675 M BASKT-WARE WAS W10171734
)
)
$url = "http://www.apdepot.com/Products/SearchResults.aspx?type=keyword&keyword=6-918873";
$urlcontent = file_get_contents($url);
$name = '<td valign="top" align="left" class="SearchResultItemHeader">
<a class="thickbox" title="Dishwasher Tube/Spray Arm Kit" href="ItemDetailsPopup.aspx?itemcode=WHI%20675808&keepThis=true&TB_iframe=true&height=500&width=640"><b>Dishwasher Tube/Spray Arm Kit</b></a>
</td>';
preg_match_all('/<a.*class=\"thickbox\".*title=\"(.*?)\".*href=\"ItemDetailsPopup.aspx\?itemcode.*\">.*<b>(.*)<\/b><\/a>/s', $name, $nameoutput);
print_r($nameoutput);
Expected output:
The text inside the tag:
Dishwasher Tube/Spray Arm Kit
Answer 0 (score: 2)
Try:
preg_match_all('/<span id=\"ContentPlaceHolder1_Repeater1_lblLongDesc_0\".*>(.*)<\/span>/Us', $urlcontent, $name);
Output:
Array
(
[0] => Array
(
[0] => <span id="ContentPlaceHolder1_Repeater1_lblLongDesc_0">*WAS W10224675 M BASKT-WARE WAS W10171734</span>
)
[1] => Array
(
[0] => *WAS W10224675 M BASKT-WARE WAS W10171734
)
)
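One likely reason the pattern from the question behaves differently on a full page: the U modifier inverts greediness, so a plain .* becomes lazy and (.*?) becomes greedy. On a page with several spans, the match can then start at the first <span and run to the last </span>. The sketch below illustrates this on a hypothetical three-span string; $html, $fromQuestion, and $fromAnswer are names made up for this illustration, and the description text is shortened.

```php
<?php
// Hypothetical stand-in for a fetched page: the target span sits between two others.
$html = '<span id="other">first</span>'
      . '<span id="ContentPlaceHolder1_Repeater1_lblLongDesc_0">*WAS W10224675</span>'
      . '<span id="tail">last</span>';

// Pattern from the question. Under /U, ".*" is lazy and "(.*?)" is greedy.
$questionPattern = '/<span.*id=\"ContentPlaceHolder1_Repeater1_lblLongDesc_0\".*>(.*?)<\/span>/Us';

// Pattern from this answer. The id is anchored directly after "<span " and
// "(.*)" is lazy under /U, so the capture stops at the first "</span>".
$answerPattern = '/<span id=\"ContentPlaceHolder1_Repeater1_lblLongDesc_0\".*>(.*)<\/span>/Us';

preg_match($questionPattern, $html, $fromQuestion);
preg_match($answerPattern, $html, $fromAnswer);

// $fromQuestion[0] spans the whole string, from the first "<span" to the last "</span>".
// $fromQuestion[1] is '*WAS W10224675</span><span id="tail">last'.
// $fromAnswer[1] is '*WAS W10224675'.
var_dump($fromQuestion[1], $fromAnswer[1]);
```

On the single-span snippet from the question both patterns give the same result, which would explain why the problem only shows up with the downloaded page.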
For data scraping, XPath is usually a better choice than regular expressions. Take a look at the example below:
$url = "http://www.apdepot.com/Products/SearchResults.aspx?type=keyword&keyword=6-918873";
$urlcontent = file_get_contents($url);
$doc = new DOMDocument();
$doc->loadHTML($urlcontent);
$xpath = new DOMXPath($doc);
$elements = $xpath->query("//span[@id='ContentPlaceHolder1_Repeater1_lblLongDesc_0']")->item(0)->nodeValue;
echo $elements;
//output: *WAS W10224675 M BASKT-WARE WAS W10171734
For more information, see http://php.net/manual/en/class.domdocument.php and http://php.net/manual/en/class.domxpath.php
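A slightly more defensive version of the same XPath lookup might look like the sketch below. It assumes the page may not be valid HTML (so libxml warnings are collected rather than printed) and that the element may be missing. The helper name textById and the sample markup are hypothetical, not part of the original code.

```php
<?php
// Hypothetical helper: returns the trimmed text of the first element with the
// given id, or null when the input is empty, unparsable, or has no such element.
function textById(string $html, string $id): ?string
{
    if ($html === '') {
        return null;
    }

    // Collect libxml parse warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }

    $xpath = new DOMXPath($doc);
    // Assumption: $id contains no single quote, since it is placed in a quoted XPath literal.
    $nodes = $xpath->query("//*[@id='" . $id . "']");
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    return trim($nodes->item(0)->nodeValue);
}

// Usage with an inline sample instead of a network request.
$sample = '<div><span id="ContentPlaceHolder1_Repeater1_lblLongDesc_0">'
        . '*WAS W10224675 M BASKT-WARE WAS W10171734</span></div>';
$text = textById($sample, 'ContentPlaceHolder1_Repeater1_lblLongDesc_0');
echo $text === null ? 'not found' : $text, PHP_EOL;
```

When the HTML comes from file_get_contents(), note that it returns false on failure, so that result should be checked before it is passed to a helper like this.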
Example for the anchor and b tags:
$urlcontent = '<td valign="top" align="left" class="SearchResultItemHeader">
<a class="thickbox" title="Dishwasher Tube/Spray Arm Kit" href="ItemDetailsPopup.aspx?itemcode=WHI%20675808&keepThis=true&TB_iframe=true&height=500&width=640"><b>Dishwasher Tube/Spray Arm Kit</b></a>
</td>';
$doc = new DOMDocument();
$doc->loadHTML($urlcontent);
$xpath = new DOMXPath($doc);
$elements = $xpath->query("//td[@class='SearchResultItemHeader']/a/b")->item(0)->nodeValue;
echo $elements;
//output: Dishwasher Tube/Spray Arm Kit
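If the title attribute is wanted as well as the bold text, the anchor elements themselves can be queried and read with getAttribute(). This is only a sketch under a few assumptions: the sample markup is wrapped in a table, the href is shortened, and the $items array layout is made up for illustration.

```php
<?php
// Hypothetical sample modelled on the search result cell, with a shortened href.
$html = '<table><tr><td class="SearchResultItemHeader">'
      . '<a class="thickbox" title="Dishwasher Tube/Spray Arm Kit" '
      . 'href="ItemDetailsPopup.aspx?itemcode=WHI%20675808">'
      . '<b>Dishwasher Tube/Spray Arm Kit</b></a></td></tr></table>';

// Keep libxml parse warnings out of the output.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$items = [];

// Select every matching anchor, then read its attributes and its text content.
foreach ($xpath->query("//td[@class='SearchResultItemHeader']/a[@class='thickbox']") as $a) {
    $items[] = [
        'title' => $a->getAttribute('title'),
        'href'  => $a->getAttribute('href'),
        'text'  => trim($a->textContent),
    ];
}

print_r($items);
```

Looping over the node list also covers pages that return several results, where item(0) would only give the first one.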
Answer 1 (score: 0)
Change the regexp like this:
preg_match_all('%<span.*id=\"ContentPlaceHolder1_Repeater1_lblLongDesc_0\"(.*)\/span>%', $urlcontent, $desc);
Then you can apply strip_tags() to the result:
$description = strip_tags($desc[1][0]);
Output:
Array
(
[0] => Array
(
[0] => <span id="ContentPlaceHolder1_Repeater1_lblLongDesc_0">*WAS W10224675 M BASKT-WARE WAS W10171734</span>
)
[1] => Array
(
[0] => *WAS W10224675 M BASKT-WARE WAS W10171734
)
)
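As a variation on this idea, the whole span element can be kept as the match and strip_tags() left to remove the markup. The sketch below is not the code from this answer: the pattern, the sample string, and the choice to use the full match are assumptions made for illustration.

```php
<?php
// Hypothetical sample: the target span preceded and followed by other markup.
$html = '<span id="a">x</span>'
      . '<span id="ContentPlaceHolder1_Repeater1_lblLongDesc_0">'
      . '*WAS W10224675 M BASKT-WARE WAS W10171734</span>'
      . ' <input type="hidden" value="ignored">';

// "[^>]*" keeps the id check inside a single opening tag, and the lazy ".*?"
// stops at the first closing </span> after it.
$pattern = '%<span[^>]*id="ContentPlaceHolder1_Repeater1_lblLongDesc_0"[^>]*>.*?</span>%s';

$description = null;
if (preg_match($pattern, $html, $desc) === 1) {
    // $desc[0] is the complete <span ...>...</span> element; strip_tags() removes the tags.
    $description = trim(strip_tags($desc[0]));
}

echo $description, PHP_EOL;
```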