I want to extract only a few items from the following HTML.
<table cellspacing="0" cellpadding="2" width="100%" border="0" class="TableBorderBottom">
<tr>
<td class="tblBursaSummHeader">No.</td>
<td class="tblBursaSummHeader">Name</td>
<td class="tblBursaSummHeader">Stock<br>Code</td>
<td class="tblBursaSummHeader">Rem</td>
<td class="tblBursaSummHeader">Last<br>Done</td>
<td class="tblBursaSummHeader" width="55">Chg</td>
<td class="tblBursaSummHeader">% Chg</td>
<td class="tblBursaSummHeader">Vol<br>('00)</td>
<td class="tblBursaSummHeader">Buy Vol<br>('00)</td>
<td class="tblBursaSummHeader">Buy</td>
<td class="tblBursaSummHeader">Sell</td>
<td class="tblBursaSummHeader">Sell Vol<br>('00)</td>
<td class="tblBursaSummHeader">High</td>
<td class="tblBursaSummHeaderRect">Low</td>
</tr>
<tr>
<td class="tblBursaSEvenRow">1</td>
<td class="tblBursaSEvenRow"><a href="/tools.pl?action=factsheet&id=8494WA">LBI CAPITAL BHD-WARRANT A 08/8</a> (LBICAP-WA)</td>
<td class="tblBursaSEvenRow Right">8494WA</td>
<td class="tblBursaSEvenRow Right">s</td>
<td class="tblBursaSEvenRow Right">0.160</td>
<td class="tblBursaSEvenRow Right"><img src="/images/upArrow.gif" border=0> <span class=tblUp>+0.120</span></td>
<td class="tblBursaSEvenRow Right">300.0</td>
<td class="tblBursaSEvenRow Right">341,238</td>
<td class="tblBursaSEvenRow Right">745</td>
<td class="tblBursaSEvenRow Right">0.160</td>
<td class="tblBursaSEvenRow Right">0.160</td>
<td class="tblBursaSEvenRow Right">1,049</td>
<td class="tblBursaSEvenRow Right">0.185</td>
<td class="tblBursaSEvenRowRight Right">0.040</td>
</tr>
<tr>
<td class="tblBursaSOddRow">2</td>
<td class="tblBursaSOddRow"><a href="/tools.pl?action=factsheet&id=7091WA">UNIMECH GROUP BHD-WA13/18</a> (UNIMECH-WA)</td>
<td class="tblBursaSOddRow Right">7091WA</td>
<td class="tblBursaSOddRow Right">s</td>
<td class="tblBursaSOddRow Right">0.070</td>
<td class="tblBursaSOddRow Right"><img src="/images/upArrow.gif" border=0> <span class=tblUp>+0.040</span></td>
<td class="tblBursaSOddRow Right">133.3</td>
<td class="tblBursaSOddRow Right">261,521</td>
<td class="tblBursaSOddRow Right">8,468</td>
<td class="tblBursaSOddRow Right">0.065</td>
<td class="tblBursaSOddRow Right">0.070</td>
<td class="tblBursaSOddRow Right">5,008</td>
<td class="tblBursaSOddRow Right">0.080</td>
<td class="tblBursaSOddRowRight Right">0.040</td>
</tr>
<tr>
The output I want comes from the Stock Code, Last Done, and Chg columns, so the ideal output is:
8494WA
0.160
+0.120
7091WA
0.070
+0.040
I can extract the data, but it takes three separate XPath queries. I would prefer a single line of code that does the same job.
import requests
from lxml import html

page_gain = requests.get('url')
gain = html.fromstring(page_gain.content)
stock = gain.xpath('//table[@class="TableBorderBottom"]/tr/td[3]/text()')
# -> ['Stock', 'Code', '8494WA', '7091WA']
gain.xpath('//table[@class="TableBorderBottom"]/tr/td[5]/text()')
# -> ['Last', 'Done', '0.145', '0.075']
gain.xpath('//td/span/text()')
# -> ['+0.120', '+0.070']
Note that I also want to remove the strings 'Stock', 'Code', 'Last', and 'Done' from the results.
Answer 0 (score: 0)
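You need to process each row in a loop and pull the fields you need from it. Selecting tr[position() > 1] skips the header row, so 'Stock', 'Code', 'Last', and 'Done' never show up in the results: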
data = []
# Iterate over every row after the header row
for data_row in gain.xpath('//table[@class="TableBorderBottom"]/tr[position() > 1]'):
    stock = data_row.xpath('./td[3]/text()')[0]
    last_done = data_row.xpath('./td[5]/text()')[0]
    change = data_row.xpath('./td[6]/span/text()')[0]
    data.append({"Stock": stock, "Last Done": last_done, "Change": change})
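
For anyone who wants to try this without hitting the live page, here is a minimal, self-contained sketch of the same row-by-row approach. It assumes a trimmed copy of the table from the question (SAMPLE_HTML, first six columns only) and a hypothetical helper named extract_rows. Neither comes from the original code. The one behavioral difference is a guard that skips rows missing any of the three cells, since indexing with [0] would raise IndexError on an empty or incomplete row.

```python
from lxml import html

# Trimmed copy of the table from the question: header row, two data rows,
# and one empty row to show how incomplete rows are handled.
SAMPLE_HTML = """
<table cellspacing="0" cellpadding="2" width="100%" border="0" class="TableBorderBottom">
<tr>
<td class="tblBursaSummHeader">No.</td>
<td class="tblBursaSummHeader">Name</td>
<td class="tblBursaSummHeader">Stock<br>Code</td>
<td class="tblBursaSummHeader">Rem</td>
<td class="tblBursaSummHeader">Last<br>Done</td>
<td class="tblBursaSummHeader" width="55">Chg</td>
</tr>
<tr>
<td class="tblBursaSEvenRow">1</td>
<td class="tblBursaSEvenRow">LBI CAPITAL BHD-WARRANT A 08/8 (LBICAP-WA)</td>
<td class="tblBursaSEvenRow Right">8494WA</td>
<td class="tblBursaSEvenRow Right">s</td>
<td class="tblBursaSEvenRow Right">0.160</td>
<td class="tblBursaSEvenRow Right"><img src="/images/upArrow.gif" border=0> <span class=tblUp>+0.120</span></td>
</tr>
<tr>
<td class="tblBursaSOddRow">2</td>
<td class="tblBursaSOddRow">UNIMECH GROUP BHD-WA13/18 (UNIMECH-WA)</td>
<td class="tblBursaSOddRow Right">7091WA</td>
<td class="tblBursaSOddRow Right">s</td>
<td class="tblBursaSOddRow Right">0.070</td>
<td class="tblBursaSOddRow Right"><img src="/images/upArrow.gif" border=0> <span class=tblUp>+0.040</span></td>
</tr>
<tr></tr>
</table>
"""


def extract_rows(tree):
    """Return (stock, last_done, change) tuples for each complete data row."""
    results = []
    # position() > 1 skips the header row
    for row in tree.xpath('//table[@class="TableBorderBottom"]/tr[position() > 1]'):
        stock = row.xpath('./td[3]/text()')
        last_done = row.xpath('./td[5]/text()')
        change = row.xpath('./td[6]/span/text()')
        # Skip rows that lack any of the three cells instead of raising IndexError
        if stock and last_done and change:
            results.append((stock[0].strip(), last_done[0].strip(), change[0].strip()))
    return results


tree = html.fromstring(SAMPLE_HTML)
rows = extract_rows(tree)

# Print one value per line, in the order the question asks for
for stock, last_done, change in rows:
    print(stock, last_done, change, sep="\n")
```

Against the real page you would build the tree from page_gain.content as in the question, then pass it to the helper. Keeping the three values together per row also avoids the mismatch you can get from three independent queries, where a missing cell in one column shifts that list relative to the others.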