I recently tried to use lxml and requests to parse an HTML table from a web page.
The Python code I ran is as follows:
>>> from lxml import html
>>> import requests
>>> page = requests.get('http://www.bigpaisa.com/candlestick-stock-screener-result/nse/bearish-evening-star-candlestick-pattern')
>>> tree = html.fromstring(page.text)
I then wanted to use lxml's xpath() function to parse the following repeating block of data into lists:
<TR>
<TD style="font-size: 11px;"><!-- <a href="/company-technical-details/<%=sr.getExchange()%>/<%=sr.getSymbol()%>/<%=sr.getName()%>" ><%= sr.getSymbol() %></a> -->
AMTEKINDIA </TD>
<TD style="font-size: 11px; max-width: 135px;">AMTEK INDIA LIMITED</TD>
<TD> nse </TD>
<TD style="min-width: 60px; max-width: 60px;">02-01-2015</TD>
<TD>78</TD>
<TD>78.3</TD>
<TD>72.25</TD>
<TD>73.9</TD>
but I could not get it to work. For example:
>>> symbol=tree.xpath('//TD[@style="font-size: 11px;"][@!-- [@a href="/company-technical-details/[@%=sr.getExchange()%]/[@%=sr.getSymbol()%]/[@%=sr.getName()%]"][@%= sr.getSymbol() %][@/a] --]/text()')
gives an XPath evaluation error, and
>>> prices=tree.xpath('//TD/text()')
returns an empty list.
Answer 0: (score: 1)
The rows you are interested in are inside the <table> whose id is sortable.
from lxml import html

url = 'http://www.bigpaisa.com/candlestick-stock-screener-result/nse/bearish-%20evening-star-candlestick-pattern'
doc = html.parse(url)

# you can use XPath to select elements...
rows = doc.xpath("//table[@id = 'sortable']/tbody/tr")

# or, if you prefer, use CSS selectors instead (this needs the cssselect package);
# cssselect() is a method of elements, so call it on the root element
rows = doc.getroot().cssselect("table#sortable tbody tr")

for tr in rows:
    # do something with each tr, for example
    tds = tr.cssselect("td")
    print(tds[4].text)

Note that you do not need the requests module at all.
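A likely reason the question's '//TD/text()' came back empty is that lxml's HTML parser lowercases tag names, so the XPath name test has to be lowercase too. Here is a minimal offline sketch of that, assuming the row from the question sits in a table with id sortable. The SAMPLE string and the parse_row helper are made up for illustration and are not part of lxml or of the real page, whose markup may differ.

```python
from lxml import html

# Hypothetical sample: one row adapted from the question, wrapped in a
# table with id "sortable". The live page's markup may differ.
SAMPLE = """<html><body>
<TABLE id="sortable"><TBODY>
<TR>
<TD style="font-size: 11px;"><!-- <a href="#">commented-out link</a> -->
AMTEKINDIA </TD>
<TD style="font-size: 11px; max-width: 135px;">AMTEK INDIA LIMITED</TD>
<TD> nse </TD>
<TD style="min-width: 60px; max-width: 60px;">02-01-2015</TD>
<TD>78</TD>
<TD>78.3</TD>
<TD>72.25</TD>
<TD>73.9</TD>
</TR>
</TBODY></TABLE>
</body></html>"""

tree = html.fromstring(SAMPLE)

# The HTML parser stores tag names in lowercase, so "TD" matches nothing.
upper_case_hits = tree.xpath("//TD/text()")
lower_case_hits = tree.xpath("//td/text()")


def parse_row(tr):
    """Hypothetical helper: turn one <tr> element into a dict of fields."""
    # text_content() skips the HTML comment but keeps the text after it.
    cells = [td.text_content().strip() for td in tr.xpath("./td")]
    return {
        "symbol": cells[0],
        "name": cells[1],
        "exchange": cells[2],
        "date": cells[3],
        "prices": [float(c) for c in cells[4:8]],
    }


# Keep only rows that have the eight <td> cells seen in the question.
data_rows = [
    tr for tr in tree.xpath("//table[@id='sortable']//tr")
    if len(tr.xpath("./td")) >= 8
]
records = [parse_row(tr) for tr in data_rows]

print(upper_case_hits)
print(records[0]["symbol"], records[0]["prices"])
```

The row filter is there because a header row made of <th> cells would otherwise raise an IndexError in parse_row. Which of the four numbers is open, high, low or close is not stated in the question, so the sketch just keeps them together as prices.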