I want to scrape only the Code and Name columns of the table in the HTML below:
<div id="ctl00_cph1_divSymbols" class="cb"><table class="quotes">
<TR><TH>Code</TH><TH>Name</TH><TH style="text-align:right;">High</TH><TH style="text-align:right;">Low</TH><TH style="text-align:right;">Close</TH><TH style="text-align:right;">Volume</TH><TH style="text-align:center;" colspan=3>Change</TH><th width=40> </th></tr>
<tr class="ro" onclick="location.href='/stockquote/SGX/Z25.htm';" style="color:green;"><td><A href="/stockquote/SGX/Z25.htm" title="Display Quote & Chart for SGX,Z25">Z25</A></td><td>Yanlord Land Group Limited</td><td align=right>1.400</td><td align=right>1.380</td><td align=right>1.385</td><td align=right>1,244,200</td><td align="right">0.005</td><td align="center"><IMG src="/images/up.gif"></td><td align="left">0.36</td><td align="right"><a href="/stockquote/SGX/Z25.htm" title="Download Data for SGX,Z25"><img src="/images/dl.gif" width=14 height=14></a> <a href="/stockquote/SGX/Z25.htm" title="View Quote and Chart for SGX,Z25"><img src="/images/chart.gif" width=14 height=14></a></td></tr>
<tr class="re" onclick="location.href='/stockquote/SGX/Z59.htm';" style="color:green;"><td><A href="/stockquote/SGX/Z59.htm" title="Display Quote & Chart for SGX,Z59">Z59</A></td><td>Yoma Strategic Holdings Ltd</td><td align=right>0.5850</td><td align=right>0.5750</td><td align=right>0.5850</td><td align=right>2,312,600</td><td align="right">0.0100</td><td align="center"><IMG src="/images/up.gif"></td><td align="left">1.74</td><td align="right"><a href="/stockquote/SGX/Z59.htm" title="Download Data for SGX,Z59"><img src="/images/dl.gif" width=14 height=14></a> <a href="/stockquote/SGX/Z59.htm" title="View Quote and Chart for SGX,Z59"><img src="/images/chart.gif" width=14 height=14></a></td></tr>
<tr class="ro" onclick="location.href='/stockquote/SGX/Z74.htm';" style="color:green;"><td><A href="/stockquote/SGX/Z74.htm" title="Display Quote & Chart for SGX,Z74">Z74</A></td><td>Singtel</td><td align=right>3.930</td><td align=right>3.860</td><td align=right>3.910</td><td align=right>21,674,300</td><td align="right">0.040</td><td align="center"><IMG src="/images/up.gif"></td><td align="left">1.03</td><td align="right"><a href="/stockquote/SGX/Z74.htm" title="Download Data for SGX,Z74"><img src="/images/dl.gif" width=14 height=14></a> <a href="/stockquote/SGX/Z74.htm" title="View Quote and Chart for SGX,Z74"><img src="/images/chart.gif" width=14 height=14></a></td></tr>
<tr class="re" onclick="location.href='/stockquote/SGX/Z77.htm';" style="color:green;"><td><A href="/stockquote/SGX/Z77.htm" title="Display Quote & Chart for SGX,Z77">Z77</A></td><td>Singtel 10</td><td align=right>3.920</td><td align=right>3.860</td><td align=right>3.900</td><td align=right>69,460</td><td align="right">0.050</td><td align="center"><IMG src="/images/up.gif"></td><td align="left">1.30</td><td align="right"><a href="/stockquote/SGX/Z77.htm" title="Download Data for SGX,Z77"><img src="/images/dl.gif" width=14 height=14></a> <a href="/stockquote/SGX/Z77.htm" title="View Quote and Chart for SGX,Z77"><img src="/images/chart.gif" width=14 height=14></a></td></tr>
</table>
</div>
The desired output is:
Z25,Yanlord Land Group Limited
Z59,Yoma Strategic Holdings Ltd
Z74,Singtel
Z77,Singtel 10
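tree1 gives me the codes correctly, but tree2 returns the names mixed with a lot of unwanted data. How can I write robust code that produces the desired output?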
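To illustrate where the unwanted data comes from, here is a minimal sketch on a hypothetical, trimmed copy of one quote row (ROW is not the real page). It assumes the tree2 attempt was the expression //tr[@class]/td/following-sibling::td/text(), which is the line commented out in the answer. That axis selects every cell after the first one in the row, so the prices and volume come along with the name, whereas td[2] selects only the second cell.

```python
from lxml import html

# Hypothetical, trimmed copy of one quote row from the question's table
ROW = """
<table class="quotes">
<tr class="ro" onclick="location.href='/stockquote/SGX/Z25.htm';"><td><a href="/stockquote/SGX/Z25.htm">Z25</a></td><td>Yanlord Land Group Limited</td><td align=right>1.400</td><td align=right>1.380</td><td align=right>1,244,200</td></tr>
</table>
"""

tree = html.fromstring(ROW)

# following-sibling::td: every <td> after the first one (name plus all numeric cells)
siblings = tree.xpath('//tr[@class]/td/following-sibling::td/text()')
# td[2]: only the second <td> of each row, i.e. just the name
second = tree.xpath('//tr[@class]/td[2]/text()')

print(siblings)  # ['Yanlord Land Group Limited', '1.400', '1.380', '1,244,200']
print(second)    # ['Yanlord Land Group Limited']
```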
Answer
You can use td[2] to get the second td element:
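from lxml import html
import requests
page = requests.get('http://eoddata.com/stocklist/SGX/Z.htm')
tree = html.fromstring(page.content)
# Codes: text of the <a> links whose href points to a /stockquote/SGX page
tree1 = tree.xpath('//td/a[contains(@href,"/stockquote/SGX")]/text()')
# Original attempt, which also returned the other cells of each row:
# tree2 = tree.xpath('//tr[@class]/td/following-sibling::td/text()')
# Names: text of the second <td> in every row that has both class and onclick
tree2 = tree.xpath('//tr[@class and @onclick]/td[2]/text()')
print(tree1, tree2)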
Note that [@class and @onclick] is used to target the rows of the table we need and to skip the table at the bottom right of the page.
Result:
['Z25', 'Z59', 'Z74', 'Z77'] ['Yanlord Land Group Limited', 'Yoma Strategic Holdings Ltd', 'Singtel', 'Singtel 10']
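A possible more robust variant, sketched under the assumption that the page keeps the structure shown in the question: it scopes the search to the div with id ctl00_cph1_divSymbols and reads the code and the name from each row together. With two separate text() lists, an empty Name cell would simply produce no entry, shifting every later name against the wrong code; reading both cells per row avoids that. SNIPPET is a hypothetical, trimmed copy of the question's HTML so the example runs offline, and code_name_pairs is an illustrative helper name. Against the live page you would pass page.content instead of SNIPPET.

```python
from lxml import html

# Hypothetical, trimmed copy of the question's table (extra columns and attributes removed)
SNIPPET = """
<div id="ctl00_cph1_divSymbols" class="cb"><table class="quotes">
<tr><th>Code</th><th>Name</th><th>High</th></tr>
<tr class="ro" onclick="location.href='/stockquote/SGX/Z25.htm';"><td><a href="/stockquote/SGX/Z25.htm">Z25</a></td><td>Yanlord Land Group Limited</td><td align=right>1.400</td></tr>
<tr class="re" onclick="location.href='/stockquote/SGX/Z59.htm';"><td><a href="/stockquote/SGX/Z59.htm">Z59</a></td><td>Yoma Strategic Holdings Ltd</td><td align=right>0.5850</td></tr>
<tr class="ro" onclick="location.href='/stockquote/SGX/Z74.htm';"><td><a href="/stockquote/SGX/Z74.htm">Z74</a></td><td>Singtel</td><td align=right>3.930</td></tr>
<tr class="re" onclick="location.href='/stockquote/SGX/Z77.htm';"><td><a href="/stockquote/SGX/Z77.htm">Z77</a></td><td>Singtel 10</td><td align=right>3.920</td></tr>
</table></div>
"""


def code_name_pairs(markup):
    """Return (code, name) tuples, one per quote row of the symbols table."""
    tree = html.fromstring(markup)
    # Only rows inside the symbols div that carry both class and onclick (skips the header row)
    rows = tree.xpath('//div[@id="ctl00_cph1_divSymbols"]//tr[@class and @onclick]')
    pairs = []
    for row in rows:
        # string() concatenates all text inside the cell, so the <a> around the code
        # does not matter, and an empty cell yields '' instead of disappearing
        code = row.xpath('string(td[1])').strip()
        name = row.xpath('string(td[2])').strip()
        pairs.append((code, name))
    return pairs


# Prints the Code,Name lines from the desired output
for code, name in code_name_pairs(SNIPPET):
    print(f"{code},{name}")
```

Because each tuple comes from a single row, the code and name always stay aligned, even if the site leaves a cell blank.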