Extracting a value from a URL's page source with XPath in Python

Time: 2014-11-15 08:34:56

Tags: python xpath lxml

This is the content of my web page:

<input type="hidden" name="frm-id" value="AAA" id="frm-id" /></form></div><div id="container-getfocus_AAA" style="display:none"><input type="text" id="getfocus_txt_AAA" name="getfocus_txt_AAA" /></div>               <script type="text/javascript">formtarget['AAA'] = '';</script></div></div></div></div>    <!--        </div>-->

I want to extract AAA from the value attribute.

from lxml import html
import requests
cont=requests.get(url).content
tree=html.fromstring(cont)
print tree.xpath('//input[@name="frm-id"].text()')

Output is:
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 1509, in lxml.etree._Element.xpath (src/lxml/lxml.etree.c:50702)
  File "xpath.pxi", line 318, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:145954)
  File "xpath.pxi", line 238, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:144962)
  File "xpath.pxi", line 224, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src/lxml/lxml.etree.c:144817)
lxml.etree.XPathEvalError: Invalid expression

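1 answer:

Answer 0 (score: 0)

Your XPath expression is invalid: `.text()` cannot be appended to a node selection with a dot, and the value you want is stored in an attribute rather than in the element's text. The correct XPath for getting the value attribute of the input tag is: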

//input[@name="frm-id"]/@value
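
A minimal sketch of how that expression could be used with lxml. It assumes a trimmed copy of the markup from the question held in a hypothetical `page` string instead of a live `requests.get(url).content` call; the names `page`, `values`, and `frm_id` are illustrative only.

```python
from lxml import html

# Trimmed copy of the markup from the question (assumption: in real use this
# would be the bytes returned by requests.get(url).content).
page = (
    '<div><form>'
    '<input type="hidden" name="frm-id" value="AAA" id="frm-id" />'
    '</form></div>'
)

tree = html.fromstring(page)

# Selecting an attribute with /@value returns a list of strings,
# one per matching element.
values = tree.xpath('//input[@name="frm-id"]/@value')

# Guard against an empty result so a missing field does not raise IndexError.
frm_id = values[0] if values else None
print(frm_id)
```

Because `xpath()` always returns a list for this kind of expression, checking for an empty list before indexing keeps the script from failing when the input field is absent from the page.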