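I am using the DataMiner Chrome extension to scrape data from a website. In Chrome, I used Inspect Element > right-click the element in the Inspector > Copy XPath to generate the XPath for the data I need. This works, but when I run the scrape, the first result is repeated for every row.
When I inspected the second result and copied its XPath, I noticed that the ID contains a sequence number. Here are the XPaths for the company name in the first two entries: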
Entry 1 Company: //*[@id="Repeater_Results_ctl01_tCell1"]/h3/b/a
Entry 2 Company: //*[@id="Repeater_Results_ctl02_tCell1"]/h3/b/a
Can I insert a variable for the sequence number, or is there a better way to solve this?
Here is a sample of the page:
<table border="0" cellpadding="0" cellspacing="0" class="tablelist" id="TsTable">
<tr>
<th class="col1">Organization</th>
<th>
<div class="tCol2">Location</div>
</th>
</tr>
<tr>
<td id="Repeater_Results_ctl01_tCell1" class="tCell1">
<h3><b><a href="/organization-search/details.aspx?slne=8118" target="_blank">Organization A</a></b></h3>
<span class="nm">John Doe</span>
<p style="margin-bottom:0"><b>Phone:</b> 555-123-4567<br /><span class="webp"><b>Web: </b><a href="http://www.companya.com" target="_blank">www.companya.com</a></span><br />
</p>
<div class="locMobile"><b>LOCATION</b><br />
<span style="white-space:nowrap">Anywhere, USA</span> <br />
12345<br />
<small><strong>0 miles</strong></small>
</div>
</td>
<td id="Repeater_Results_ctl01_tCell2" class="tCell2a">
<div class="tCell2"><b>LOCATION:</b><br />
<span class="nw">Anywhere, USA</span><br />
12345<br />
<small><strong>0 miles</strong></small>
</div>
</td>
</tr>
<tr>
<td id="Repeater_Results_ctl01_tCell4" colspan="2" class="tCell4a">
<p><b>Services:</b> XYZ Services</p>
<p><strong>Locations:</strong> Anywhere, USA</p>
</td>
</tr>
<tr>
<td id="Repeater_Results_ctl02_tCell1" class="tCell1">
<h3><b><a href="/organization-search/details.aspx?slne=2982" target="_blank">Organization B</a></b></h3>
<span class="nm">Jane Dough</span>
<p style="margin-bottom:0"><b>Phone:</b> 555-123-9876<br /><span class="webp"><b>Web: </b><a href="http://www.organizationb.com" target="_blank">www.organizaionb.com</a></span><br />
</p>
<div class="locMobile"><b>LOCATION</b><br />
<span style="white-space:nowrap">Somewhere, USA</span> <br />
12345<br />
<small><strong>6.7 miles</strong></small>
</div>
</td>
<td id="Repeater_Results_ctl02_tCell2" class="tCell2a">
<div class="tCell2"><b>LOCATION:</b><br />
<span class="nw">Somewhere, USA</span><br />
12345<br />
<small><strong>6.7 miles</strong></small>
</div>
</td>
</tr>
<tr>
<td id="Repeater_Results_ctl02_tCell4" colspan="2" class="tCell4a">
<p><b>Services:</b> ABC Services</p>
<p><strong>Locations:</strong>Somewhere, USA</p>
</td>
</tr>
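Thanks for your help!
Answer 0 (score: 1)
You can use the XPath contains() function to match only the fixed part of the id, so that a single expression covers every sequence number: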
//*[contains(@id, "_tCell1")]/h3/b/a
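To check the idea outside DataMiner, here is a minimal sketch in Python. It assumes a trimmed, well-formed copy of the sample table and uses only the standard-library ElementTree module. ElementTree's XPath subset has no contains(), so the predicate is emulated with a substring test in Python; SAMPLE and company_links are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the sample table from the question (assumption:
# the real page would need an HTML parser, since it is not valid XML).
SAMPLE = """
<table id="TsTable">
  <tr>
    <td id="Repeater_Results_ctl01_tCell1" class="tCell1">
      <h3><b><a href="/organization-search/details.aspx?slne=8118">Organization A</a></b></h3>
    </td>
    <td id="Repeater_Results_ctl01_tCell2" class="tCell2a">Anywhere, USA</td>
  </tr>
  <tr>
    <td id="Repeater_Results_ctl02_tCell1" class="tCell1">
      <h3><b><a href="/organization-search/details.aspx?slne=2982">Organization B</a></b></h3>
    </td>
    <td id="Repeater_Results_ctl02_tCell2" class="tCell2a">Somewhere, USA</td>
  </tr>
</table>
"""


def company_links(markup):
    """Return (name, href) pairs for every element whose id contains "_tCell1"."""
    root = ET.fromstring(markup.strip())
    results = []
    for cell in root.iter():
        # Python-side stand-in for the XPath predicate contains(@id, "_tCell1").
        if "_tCell1" in cell.get("id", ""):
            # Same relative path as in the XPath above: h3/b/a.
            for link in cell.findall("h3/b/a"):
                results.append((link.text, link.get("href")))
    return results


companies = company_links(SAMPLE)
for name, href in companies:
    print(name, href)
```

In the sample, each of these cells also carries class="tCell1", so //td[@class="tCell1"]/h3/b/a should select the same links, and it would not accidentally match an id such as "_tCell10" if the page ever had one.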