Question

I have a web page with a bunch of <div>s and other content, and somewhere in the middle of all those divs is this specific line:

<input name="extWarrantyProds" type="hidden" value="23814298 ^ true"/>

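How can I get the "value" part out of this markup, given that it sits in the middle of a page full of other content?

I'm trying to use urllib, but I don't even know where to start =/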

Answer 1

import lxml.html as lh

html = '''
<input name="extWarrantyProds" type="hidden" value="23814298 ^ true"/>
'''

# If you want to parse from a URL:
# tree = lh.parse('http://example.com')

tree = lh.fromstring(html)

# xpath() returns a list with the value attribute of every matching <input>
print(tree.xpath("//input[@name='extWarrantyProds']/@value"))
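
If lxml is not installed, roughly the same lookup can be sketched with the standard library's html.parser instead. This swaps XPath for a small parser subclass; the names InputValueFinder and find_input_values are made up for illustration, and the sample string is just the line from the question wrapped in a div.

```python
from html.parser import HTMLParser


class InputValueFinder(HTMLParser):
    """Collect the value attribute of <input> tags with a given name."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (attribute, value) pairs.
        attr_map = dict(attrs)
        if tag == "input" and attr_map.get("name") == self.name:
            self.values.append(attr_map.get("value"))


def find_input_values(html, name):
    # Hypothetical helper: feed the whole document and return what was collected.
    finder = InputValueFinder(name)
    finder.feed(html)
    finder.close()
    return finder.values


sample = '<div><input name="extWarrantyProds" type="hidden" value="23814298 ^ true"/></div>'
print(find_input_values(sample, "extWarrantyProds"))
```

Like the XPath version, this returns a list, so an empty list means the input was not found.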

Answer 2

The simplest approach I can think of:

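import urllib.request

urlStr = "http://www..."

fileObj = urllib.request.urlopen(urlStr)

# Scan the page line by line; urlopen yields bytes, so decode each line first
# (UTF-8 is assumed here).
for rawLine in fileObj:
    line = rawLine.decode('utf-8', errors='replace')
    if '<input name="extWarrantyProds"' in line:
        # Skip past 'value="' (7 characters) to reach the start of the value
        startIndex = line.find('value="') + 7
        endIndex = line.find('"', startIndex)
        print(line[startIndex:endIndex])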
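
To try the same string search without hitting the network, it can be wrapped in a function that takes any iterable of text lines. This is only a sketch: extract_value and the page list are hypothetical, and the function additionally returns None when find() reports -1 rather than slicing at a bogus index.

```python
def extract_value(lines, marker='<input name="extWarrantyProds"'):
    """Return the first value="..." on a line containing marker, else None."""
    prefix = 'value="'
    for line in lines:
        tag_start = line.find(marker)
        if tag_start == -1:
            continue
        # Look for value=" only after the marker, so an earlier tag on the
        # same line is not picked up by mistake.
        start = line.find(prefix, tag_start)
        if start == -1:
            continue
        start += len(prefix)
        end = line.find('"', start)
        if end == -1:
            continue
        return line[start:end]
    return None


# Stand-in for the lines of a downloaded page.
page = [
    '<div>\n',
    '<input name="extWarrantyProds" type="hidden" value="23814298 ^ true"/>\n',
    '</div>\n',
]
print(extract_value(page))
```

This still assumes the whole tag sits on one line and that name comes before value, same as the loop above.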

Answer 3

If that's all you need, you don't need anything too fancy. Download the page with urllib, then use re.findall() to find the value.

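import re
import urllib.request

url = 'http://...'
# Download the page and decode the bytes to text (UTF-8 is assumed here)
html = urllib.request.urlopen(url).read().decode('utf-8', errors='replace')
# Find every <input> tag whose name starts with extWarrantyProds
matches = re.findall('<input name="extWarrantyProds.*?>', html, re.DOTALL)
for tag in matches:
    # Pull the value attribute out of each matched tag
    print(re.findall('value="(.*?)"', tag))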
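
The two-step regex idea can also be sketched as a function that works on a string already in memory, which makes it easy to test. The name find_values is hypothetical, and the tag pattern is a slightly looser variant I'm assuming is acceptable: it matches the name exactly and does not require name to be the first attribute.

```python
import re


def find_values(html, name="extWarrantyProds"):
    """Return the value attribute of every <input> tag with the given name."""
    # Step 1: match whole <input ...> tags that carry name="<name>" anywhere.
    tag_pattern = re.compile(
        r'<input\b[^>]*\bname="' + re.escape(name) + r'"[^>]*>',
        re.IGNORECASE,
    )
    # Step 2: pull value="..." out of each matched tag.
    value_pattern = re.compile(r'\bvalue="(.*?)"', re.DOTALL)
    values = []
    for tag in tag_pattern.findall(html):
        match = value_pattern.search(tag)
        if match:
            values.append(match.group(1))
    return values


# Attribute order differs from the question's line on purpose.
sample = '<div><input type="hidden" value="23814298 ^ true" name="extWarrantyProds"/></div>'
print(find_values(sample))
```

Regexes like this are fine for one known tag, but they will miss single-quoted or unquoted attributes, so a real parser is the safer choice for arbitrary pages.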

How do I get a specific value from a web page?

3 answers: