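I'm working on a project where I'm trying to use lxml to pull stock data from separate tables on separate web pages. When I run my program and print the values I'm trying to extract, I get empty lists: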
('Cash_and_short_term_investments:', [])
('EPSNextYear:', [])
Here is how I'm calling it:
#the url at this point is http://finviz.com/quote.ashx?t=RAIL confirmed with print statement
url = driver.current_url
page2 = requests.get(url)
tree2 = html.fromstring(page2.content)
EPSNextYear = tree2.xpath('/html/body/table[3]/tr[1]/td/table/tr[7]/td/table/tr[2]/td[6]/b')
#Original XPath:/html/body/table[3]/tbody/tr[1]/td/table/tbody/tr[7]/td/table/tbody/tr[2]/td[6]/b
print ('EPSNextYear:', EPSNextYear)
and:
#the url at this point is https://www.google.com/finance?q=NASDAQ%3ARAIL&fstype=ii&ei=hGwhWNHVPOW7iwLMiIfIDA I've confirmed this with a print
url = driver.current_url
page3 = requests.get(url)
tree3 = html.fromstring(page3.content)
Cash_and_Short_Term_Investments = tree3.xpath('//*[@id="fs-table"]/tr[3]/td[2]/text()')
print('Cash_and_short_term_investments:', Cash_and_Short_Term_Investments)
I've already removed tbody from the XPath, as several similar questions suggest. Any help or advice would be much appreciated, thanks!
Answer 0 (score: 0)
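When asking a question like this, you need to provide a short, complete example that demonstrates the problem.
Looking at your second example, it's clear that the XPath expression you're using is incorrect: you are missing the tbody element in your XPath. (You may also want to select the correct table row by looking for the actual string you're searching for.)
Given the following code: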
# Python 2 (urllib.urlopen and the print statement)
from lxml import etree
import urllib

url = "http://www.google.com/finance?q=NASDAQ%3ARAIL&fstype=ii&ei=hGwhWNHVPOW7iwLMiIfIDA"
parser = etree.HTMLParser()
tree = etree.parse(urllib.urlopen(url), parser)
# Keep tbody in the path and pick the row by its label instead of by position
result = tree.xpath('//*[@id="fs-table"]/tbody/tr[normalize-space(td) = "Cash and Short Term Investments"]')
for x in result:
    print etree.tostring(x)
When run like this:
> python test.py
you get the following output:
<tr>
<td class="lft lm">Cash and Short Term Investments
</td>
<td class="r">39.78</td>
<td class="r">78.45</td>
<td class="r">91.21</td>
<td class="r">110.02</td>
<td class="r rm">125.01</td>
</tr>
<tr>
<td class="lft lm">Cash and Short Term Investments
</td>
<td class="r">110.02</td>
<td class="r">161.49</td>
<td class="r">184.49</td>
<td class="r rm">140.49</td>
</tr>
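To go from a matched row to the figures themselves, one option is to read the text of every cell after the label. The sketch below is not part of the code above: it runs against an inline copy of the first row of that output, wrapped in a minimal hand-written page so it works offline, and the helper name `row_values` is made up for illustration.

```python
from lxml import html

# Inline stand-in for the page: the first row of the output above,
# wrapped in a minimal table. The real page contains far more markup.
SAMPLE = """<html><body>
<table id="fs-table">
  <tbody>
    <tr>
      <td class="lft lm">Cash and Short Term Investments
      </td>
      <td class="r">39.78</td>
      <td class="r">78.45</td>
      <td class="r">91.21</td>
      <td class="r">110.02</td>
      <td class="r rm">125.01</td>
    </tr>
  </tbody>
</table>
</body></html>"""


def row_values(tree, label):
    """Return the numeric cells of the first row whose first cell equals label."""
    rows = tree.xpath(
        '//*[@id="fs-table"]/tbody/tr[normalize-space(td[1]) = $label]',
        label=label,  # lxml binds keyword arguments to XPath variables
    )
    if not rows:
        return []
    # Skip the label cell and convert the remaining cell texts to floats.
    cells = rows[0].xpath('td[position() > 1]')
    return [float(td.text_content().strip()) for td in cells]


tree = html.fromstring(SAMPLE)
values = row_values(tree, "Cash and Short Term Investments")
print(values)
```

Passing the label as an XPath variable avoids quoting problems if a label ever contains quotation marks; on the live page the same label can match more than one row, so taking only the first match is an assumption of this sketch.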
I'm sure you'll be able to work out what's wrong with your first example once you've turned it into a standalone reproducer of the problem.
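As a possible starting point for such a reproducer, the sketch below feeds lxml two small hand-written tables, one with a tbody in the markup and one without. It illustrates that lxml does not add a tbody the way a browser's element inspector does, so the XPath has to match the markup the server actually sent. The markup, the table id `t`, and the helper `query` are made up for illustration and are not taken from either site; the `//tr` form is just one way to write an expression that works in both cases.

```python
from lxml import html

# Two hand-written stand-ins that differ only in the explicit <tbody>.
WITH_TBODY = (
    '<html><body><table id="t"><tbody>'
    '<tr><td>Label</td><td>42</td></tr>'
    '</tbody></table></body></html>'
)
WITHOUT_TBODY = WITH_TBODY.replace("<tbody>", "").replace("</tbody>", "")

STRICT = '//*[@id="t"]/tbody/tr/td[2]/text()'  # needs tbody in the source
NO_TBODY = '//*[@id="t"]/tr/td[2]/text()'      # needs tbody to be absent
TOLERANT = '//*[@id="t"]//tr/td[2]/text()'     # matches either layout


def query(markup, expression):
    """Parse markup with lxml and return the XPath result as a plain list."""
    return list(html.fromstring(markup).xpath(expression))


for name, markup in [("with tbody", WITH_TBODY), ("without tbody", WITHOUT_TBODY)]:
    print(name, query(markup, STRICT), query(markup, NO_TBODY), query(markup, TOLERANT))
```

Searching the raw response (for example `page2.content`) for `<tbody` is a quick way to see which of the two layouts a given page uses before choosing the expression.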