Question

When I use lxml to parse a web page, my XPath query only returns part of the target. Please see my code:

import urllib
import lxml.html
url="http://sc.hkex.com.hk/gb/www.hkex.com.hk/chi/market/sec_tradinfo/stockcode/eisdeqty_c.htm" 
file=urllib.urlopen(url).read() 
root=lxml.html.document_fromstring(file)
for company in root.xpath('//tr[@class="tr_normal"]'):
    print  company.text_content().encode('utf-8')  

>>>00325创生控股1,000#     
00326中国星集团50,000#     
00327百富环球1,000  
00328ALCO HOLDINGS2,000#     
00329  
>>>

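I have two questions:
1. Why do I only get 00329 for the last row? Where did the rest of its content go?
2. Why can't I get the information for companies whose code is greater than 00329?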


Answer 1

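read() does not necessarily return the complete page in a single call. You need to iterate over it.

From the documentation:

The read() method, if the size argument is omitted or negative, may not read until the end of the data stream; there is no good way to determine that the entire stream from a socket has been read in the general case.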
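
A minimal sketch of the kind of loop this suggests: keep calling read() until it returns an empty result, then join the pieces. It is written for Python 3 (an assumption; the question's code uses Python 2's urllib.urlopen), and the names read_all, fetch and ShortReads are hypothetical helpers for illustration, not part of urllib or lxml.

```python
import io
from urllib.request import urlopen  # Python 3 counterpart of Python 2's urllib.urlopen


def read_all(stream, chunk_size=8192):
    """Read from a file-like object until it reports end of stream."""
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty bytes object means there is nothing left to read
            break
        chunks.append(chunk)
    return b"".join(chunks)


def fetch(url):
    """Hypothetical helper: download the full response body for url."""
    with urlopen(url) as response:
        return read_all(response)


class ShortReads(io.BytesIO):
    """Stand-in for a slow socket: returns at most 3 bytes per read() call."""

    def read(self, size=-1):
        limit = 3 if size is None or size < 0 else min(size, 3)
        return super().read(limit)


# Even though each read() returns only a fragment, the loop collects everything.
data = read_all(ShortReads(b"<tr>00329</tr>"))
```

The bytes returned by fetch(url) could then be passed to lxml.html.document_fromstring exactly as in the question's code.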
