I want to extract every instruction ID from this page:
import urllib2
import lxml.html as lh

url = 'https://secure.ssa.gov/apps10/reference.nsf/instructiontypecode!openview&restricttocategory=POMT'
response = urllib2.urlopen(url)
content = response.read()
root = lh.fromstring(content)
# XPATH_ALL_INSTRUCTION_IDS is set to one of the expressions listed below
all_instruction_ids = root.xpath(XPATH_ALL_INSTRUCTION_IDS)
I have tried countless XPath expressions generated by Chrome's developer tools, Firebug, and other browser plugins:
XPATH_ALL_INSTRUCTION_IDS = '//*[@id="content"]/div/div/div[2]/table/tbody/tr/td[1]/font/a/.'
#XPATH_ALL_INSTRUCTION_IDS = '//*[@id="content"]/div/div/div[2]/table/tbody/tr/td[1]/font/a/text()'
XPATH_ALL_INSTRUCTION_IDS = '//*[@id="content"]/div/div/div[2]/table/tbody/tr/td[1]/font/a[contains(normalize-space(), "")]'
XPATH_ALL_INSTRUCTION_IDS = '//*[@id="content"]/div/div/div[2]/table/tbody/tr/td[1]/font/a'
XPATH_ALL_INSTRUCTION_IDS = ".//*[@id='content']/div/div/div[2]/table/tbody/tr[2]/td[1]/font/a"
XPATH_ALL_INSTRUCTION_IDS = "//form/div[1]/div[5]/div/div/div[2]/table/tbody/tr/td[1]/font/a"
XPATH_ALL_INSTRUCTION_IDS = "id('content')/div/div/div[2]/table/tbody/tr/td[1]/font/a"
XPATH_ALL_INSTRUCTION_IDS = "/html/body/form/div[1]/div[5]/div/div/div[2]/table/tbody/tr/td[1]/font/a"
XPATH_ALL_INSTRUCTION_IDS = "//html//body/form/div[1]/div[5]/div/div/div[2]/table/tbody/tr/td[1]//a"
XPATH_ALL_INSTRUCTION_IDS = "//html//body/form/div[1]/div[5]/div/div/div[2]/table/tbody/tr/td[1]/*/a"
However, none of them work when passed to the xpath() method of the element returned by lxml.html.fromstring().
Answer 0 (score: 1)
Answer 1 (score: 1)
I would find all links whose href contains reference.nsf/links:
//table//a[contains(@href, 'reference.nsf/links')]/text()
Works for me.
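To illustrate why filtering on href can succeed where the copied paths fail, here is a minimal sketch on a small inline document. The markup, the href values, and the IDs "PM 00001" and "PM 00002" are made up for the example and are not taken from the real page. It relies on one well-known difference: browsers insert a tbody element into every table in their DOM, so XPaths copied from developer tools contain /tbody/, whereas lxml keeps the markup as written. If the served HTML has no tbody, which is a likely (but here unverified) cause of the empty results in the question, those paths match nothing.

```python
import lxml.html as lh

# Hypothetical markup standing in for the real page; note there is no <tbody>.
SAMPLE_HTML = """
<html><body>
<div id="content">
<table>
<tr><td><font><a href="/apps10/reference.nsf/links/0001">PM 00001</a></font></td><td>First</td></tr>
<tr><td><font><a href="/apps10/reference.nsf/links/0002">PM 00002</a></font></td><td>Second</td></tr>
<tr><td><font><a href="/some/other/page">Unrelated</a></font></td><td>Third</td></tr>
</table>
</div>
</body></html>
"""

root = lh.fromstring(SAMPLE_HTML)

# Browser-style path that includes tbody: lxml does not add the element,
# so nothing matches.
with_tbody = root.xpath('//*[@id="content"]/table/tbody/tr/td[1]/font/a/text()')

# The expression from the answer: select links by href, not by position.
by_href = root.xpath("//table//a[contains(@href, 'reference.nsf/links')]/text()")

print(with_tbody)
print(by_href)
```

Selecting on the href substring also avoids depending on the exact nesting of div and font elements, which tends to break whenever the page layout changes.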