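I'm looking for a way to store the rows of an HTML table in an array with PHP, with each column value as a separate array element.
First, I have a complete HTML page that I fetched with a curl function. The page contains a table with a specific ID (example_table).
How can I select this table and then put each cell value into a two-dimensional array?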
<table id="example_table">
<tr><td>A1</td><td>B1</td><td>C1</td><td>D1</td></tr>
<tr><td>A2</td><td>B2</td><td>C2</td><td>D2</td></tr>
<tr><td>A3</td><td>B3</td><td>C3</td><td>D3</td></tr>
</table>
The resulting array would look like this:
array_example[2][3] = D3
// Edit:
The HTML code I get from curl is as follows:
<table style="width: 95%; border-collapse: collapse" id="itemDetails">
<tbody>
<tr>
<td class="photo" style="width: 150px; text-align: center; padding: 16px 0 10px 0; vertical-align: top; font-size: 12px; line-height: 18px; font-family: Arial, sans-serif"> <a href="https://www.amazon.fr/gp/r.html?C=11II60L0IUDTQ&K=A37E83YVOBN2AM&R=JC53DV4YW1VB&T=C&U=http%3A%2F%2Fwww.amazon.fr%2Fdp%2FB003629R5S%2Fref%3Dpe_386181_40444391_TE_item_image&A=UOK26PXWANT3G9FAME6Z7XWZJVWA&H=6B71WXRFQA1P9GFWS8UJRWK0VRAA&ref_=pe_386181_40444391_TE_item_image" title="B003629R5S" style="text-decoration: none; color: rgb(0, 102, 153); font: 12px/ 16px Arial, sans-serif"> <img id="asin" src="http://ecx.images-amazon.com/images/I/31FSVzCchgL._SCLZZZZZZZ__SY115_SX115_.jpg" style="border: 0"> </a> </td>
<td class="name" style="color: rgb(102, 102, 102); padding: 10px 0 0 0; vertical-align: top; font-size: 12px; line-height: 18px; font-family: Arial, sans-serif"> <a href="https://www.amazon.fr/gp/r.html?C=11II60L0IUDTQ&K=A37E83YVOBN2AM&R=JC53DV4YW1VB&T=C&U=http%3A%2F%2Fwww.amazon.fr%2Fdp%2FB003629R5S%2Fref%3Dpe_386181_40444391_TE_item&A=GNBXWEPQKFU3GEGJBGMMWYKA3K4A&H=RXNWUWDFVKS3LQE1FENOQS4VDXCA&ref_=pe_386181_40444391_TE_item" style="text-decoration: none; color: rgb(0, 102, 153); font: 12px/ 16px Arial, sans-serif"> Brabantia Lot de 12 rouleaux de 10 sacs poubelle Type L 45 l </a> <br> Etat : Neuf <br> Vendu par <a href="https://www.amazon.fr/gp/r.html?C=11II60L0IUDTQ&K=A37E83YVOBN2AM&R=JC53DV4YW1VB&T=C&U=http%3A%2F%2Fwww.amazon.fr%2Fgp%2Fhelp%2Fseller%2Fhome.html%2Fref%3Dpe_386181_40444391_TE_seller%3Fie%3DUTF8%26seller%3DA2ANA7NET4TQ0F&A=AJJRA9DQK9EDVNDQDNAULH4KOC4A&H=XH19ITMSWA3KJ0PSBTHLNQAFYAAA&ref_=pe_386181_40444391_TE_seller" style="text-decoration: none; color: rgb(0, 102, 153); font: 12px/ 16px Arial, sans-serif">Perfect Groceries</a> <br> <a href="https://www.amazon.fr/gp/r.html?C=11II60L0IUDTQ&K=A37E83YVOBN2AM&R=JC53DV4YW1VB&T=C&U=http%3A%2F%2Fwww.amazon.fr%2Fexpedieparamazon%3Fref_%3Dpe_386181_40444391_TE_helpfba&A=KEYAA7VCZNWVKEA7P2LYC49LKQMA&H=W03OAAPQITJM5WD6MC5LG21OLVIA&ref_=pe_386181_40444391_TE_helpfba" style="text-decoration: none; color: rgb(0, 102, 153); font: 12px/ 16px Arial, sans-serif">Expédié par Amazon</a> <br>
<div style="vertical-align: top; align=center;">
<table border="0" cellspacing="4" cellpadding="0" style="border-collapse: separate">
<tbody style="vertical-align: bottom;">
<tr>
<td style="vertical-align: top; font-size: 12px; line-height: 18px; font-family: Arial, sans-serif"> </td>
<td style="vertical-align: top; font-size: 12px; line-height: 18px; font-family: Arial, sans-serif"> <a href="https://www.amazon.fr/gp/r.html?C=11II60L0IUDTQ&K=A37E83YVOBN2AM&R=JC53DV4YW1VB&T=C&U=http%3A%2F%2Fwww.amazon.fr%3A80%2Fgp%2Fredirect.html%2Fref%3Dpe_386181_40444391_cm_sw_cl_fa_doce%2F280-1861239-2544346%3F_encoding%3DUTF8%26location%3Dhttp%253A%252F%252Fwww.facebook.com%252Fdialog%252Ffeed%253Fapp_id%253D164734381262%2526caption%253D%2526display%253Dpopup%2526link%253Dhttp%25253A%25252F%25252Fwww.amazon.fr%25252Fdp%25252FB003629R5S%25252Fref%25253Dcm_sw_r_fa_doce%2526name%253D%2526picture%253Dhttp%25253A%25252F%25252Fecx.images-amazon.com%25252Fimages%25252FI%25252F31FSVzCchgL._SCLZZZZZZZ__SY115_SX115_.jpg%2526redirect_uri%253Dhttp%25253A%25252F%25252Fwww.amazon.fr%25252Fdp%25252FB003629R5S%25252Fref%25253Dcm_sw_r_fa_doce%26source%3Dstandards%26token%3D6BD0FB927CC51E76FF446584B1040F70EA7E88E1&A=O66YJALVI4AECB8UEEBF4NGUHQQA&H=PAUAVYQX28VPMP9DQELUI7PJWJWA&ref_=pe_386181_40444391_cm_sw_cl_fa_doce" title="Facebook" style="text-decoration: none; color: rgb(0, 102, 153); font: 12px/ 16px Arial, sans-serif"> <img src="http://g-ecx.images-amazon.com/images/G/08/x-locale/personalization/live-meter/facebook._V15055984_.gif" width="16" alt="Facebook" style="vertical-align: middle; border: 0" height="16" border="0"> </a> </td>
<td style="vertical-align: top; font-size: 12px; line-height: 18px; font-family: Arial, sans-serif"> <a href="https://www.amazon.fr/gp/r.html?C=11II60L0IUDTQ&K=A37E83YVOBN2AM&R=JC53DV4YW1VB&T=C&U=http%3A%2F%2Fwww.amazon.fr%3A80%2Fgp%2Fredirect.html%2Fref%3Dpe_386181_40444391_cm_sw_cl_tw_doce%2F280-1861239-2544346%3F_encoding%3DUTF8%26location%3Dhttp%253A%252F%252Ftwitter.com%252Fshare%253Fcount%253Dnone%2526original_referer%253Dhttp%25253A%25252F%25252Fwww.amazon.fr%25252Fdp%25252FB003629R5S%25252Fref%25253Dcm_sw_r_tw_doce%2526related%253Damazon%25252Camazondeals%25252Camazonmp3%2526text%253DBrabantia%252520Lot%252520de%25252012%252520rouleaux%252520de%25252010%252520sacs%252520poubelle%252520Type%252520L%25252045%252520l%252520sur%252520Amazon%2526twitterURL%253Dhttp%25253A%25252F%25252Fwww.amazon.fr%25252Fdp%25252FB003629R5S%25252Fref%25253Dcm_sw_r_tw_doce%2526via%253Damazon%26source%3Dstandards%26token%3D7A1A4AE8F6CE0BD277D8295E58702D283F329C0F&A=KPDO6A0PIPKRQL84ARGCMAOOCASA&H=TA6BYC0F3HFJPCCQIIOCPYIGFAGA&ref_=pe_386181_40444391_cm_sw_cl_tw_doce" title="Twitter" style="text-decoration: none; color: rgb(0, 102, 153); font: 12px/ 16px Arial, sans-serif"> <img src="http://g-ecx.images-amazon.com/images/G/08/x-locale/communities/social/twitter._V388040480_.gif" width="16" alt="Twitter" style="vertical-align: middle; border: 0" height="16" border="0"> </a> </td>
<td style="vertical-align: top; font-size: 12px; line-height: 18px; font-family: Arial, sans-serif"> <a href="https://www.amazon.fr/gp/r.html?C=11II60L0IUDTQ&K=A37E83YVOBN2AM&R=JC53DV4YW1VB&T=C&U=http%3A%2F%2Fwww.amazon.fr%3A80%2Fgp%2Fredirect.html%2Fref%3Dpe_386181_40444391_cm_sw_cl_pi_doce%2F280-1861239-2544346%3F_encoding%3DUTF8%26location%3Dhttp%253A%252F%252Fpinterest.com%252Fpin%252Fcreate%252Fbutton%252F%253Fdescription%253DBrabantia%252520Lot%252520de%25252012%252520rouleaux%252520de%25252010%252520sacs%252520poubelle%252520Type%252520L%25252045%252520l%252520sur%252520Amazon%25252C%252520http%25253A%25252F%25252Fwww.amazon.fr%25252Fdp%25252FB003629R5S%25252Fref%25253Dcm_sw_r_pi_doce%2526is_video%253Dfalse%2526media%253Dhttp%25253A%25252F%25252Fecx.images-amazon.com%25252Fimages%25252FI%25252F31FSVzCchgL._SCLZZZZZZZ__SY115_SX115_.jpg%2526title%253D%2526url%253Dhttp%25253A%25252F%25252Fwww.amazon.fr%25252Fdp%25252FB003629R5S%25252Fref%25253Dcm_sw_r_pi_doce%26source%3Dstandards%26token%3D9F58B366258E1A8B5259E9BEF3482E02341F42D3&A=RDONF9RAZWJSW6DTDZM6CAUCAXAA&H=GEAUNFZ4QS9J5KE00AWBWWLX81UA&ref_=pe_386181_40444391_cm_sw_cl_pi_doce" title="Pinterest" style="text-decoration: none; color: rgb(0, 102, 153); font: 12px/ 16px Arial, sans-serif"> <img src="http://g-ecx.images-amazon.com/images/G/08/x-locale/communities/social/pinterest._V389372180_.png" width="16" alt="Pinterest" style="vertical-align: middle; border: 0" height="16" border="0"> </a> </td>
</tr>
</tbody>
</table>
</div> </td>
<td class="price" style="width: 80px; text-align: right; font-size: 14px; padding: 10px 10px 0 0; vertical-align: top; line-height: 18px; font-family: Arial, sans-serif"> <strong>EUR 59,99</strong> <br> </td>
</tr>
</tbody>
</table>
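Answer 0 (score: 1)
The table data cells in your example have no text content apart from some whitespace. They do have child elements with attributes, and I assume that is the data you want to extract.
Use DOM + XPath. DOM can load HTML (it will repair errors and may change the structure while doing so). DOMXPath::evaluate() lets you fetch both node lists and scalar values from the DOM. XPath expressions are used to address nodes inside the DOM.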
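// Collect parser warnings internally; real-world HTML is rarely valid
libxml_use_internal_errors(true);
$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

$result = [];
// Each row of the nested table inside the table with id "itemDetails"
foreach ($xpath->evaluate('//table[@id="itemDetails"]//table/tbody/tr') as $tr) {
    $row = [];
    // Only cells that contain a link; the expression is relative to $tr
    foreach ($xpath->evaluate('td[a]', $tr) as $td) {
        // string(...) makes evaluate() return a scalar instead of a node list
        $row[] = [
            'href'  => $xpath->evaluate('string(a/@href)', $td),
            'image' => $xpath->evaluate('string(a/img/@src)', $td),
            'text'  => $xpath->evaluate('string(a/img/@alt)', $td)
        ];
    }
    $result[] = $row;
}
var_dump($result);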
Output:
array(1) {
[0]=>
array(3) {
[0]=>
array(3) {
["href"]=>
string(908) "https://www...."
["image"]=>
string(103) "http://g-ecx..."
["text"]=>
string(8) "Facebook"
}
[1]=>...
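
For the simplified table from the top of the question (id example_table, plain text in every cell), the same DOM + XPath approach can be reduced to a small helper. This is only a sketch under a few assumptions: the function name tableToArray is made up for illustration, the table ID contains no double quotes, and the table has no nested tables (the `//tr` step would pick up their rows as well).

```php
<?php
// Hypothetical input: the simplified table from the question.
$html = <<<'HTML'
<html><body>
<table id="example_table">
<tr><td>A1</td><td>B1</td><td>C1</td><td>D1</td></tr>
<tr><td>A2</td><td>B2</td><td>C2</td><td>D2</td></tr>
<tr><td>A3</td><td>B3</td><td>C3</td><td>D3</td></tr>
</table>
</body></html>
HTML;

// Illustrative helper (not part of any library): returns the text of each
// cell as $rows[rowIndex][columnIndex], both indexes starting at 0.
function tableToArray(string $html, string $tableId): array
{
    // Keep libxml warnings about sloppy markup out of the output
    $previous = libxml_use_internal_errors(true);
    $document = new DOMDocument();
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($document);
    $rows = [];
    // "//tr" matches rows whether or not they are wrapped in a <tbody>
    $expression = sprintf('//table[@id="%s"]//tr', $tableId);
    foreach ($xpath->evaluate($expression) as $tr) {
        $row = [];
        // Relative expression: direct td/th children of the current row
        foreach ($xpath->evaluate('td|th', $tr) as $cell) {
            $row[] = trim($cell->textContent);
        }
        $rows[] = $row;
    }
    return $rows;
}

$array_example = tableToArray($html, 'example_table');
echo $array_example[2][3], "\n"; // prints "D3"
```

With the row and column indexes both starting at 0, `$array_example[2][3]` is the fourth cell of the third row, which matches the `array_example[2][3] = D3` layout asked for in the question.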