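Extract data from different URLs and append the results to a list using lxml

Asked: 2017-10-26 09:54:11

Tags: python html python-requests

I wrote a script that saves a list of job vacancies to an .html file.

The code is: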

from lxml import html
import requests

page = requests.get('https://www.fasthosts.co.uk/careers/current-vacancies')

content = html.fromstring(page.content)

Vacancies = content.xpath('//h1[@class="featuredvacancy__title featuredvacancy__title--invert grid-16 alpha"]/text()')

f = open('scrapevacancy.html', 'w')
f.write('<br>'.join(map(str, Vacancies)))
f.close()

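However, I also need the script to visit the URL of each listed vacancy, check whether the page has an "Apply Now" button, and append the result to the corresponding entry in scrapevacancy.html.

Is this possible?

1 Answer:

Answer 0: (score: 0)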

from lxml import html
import requests

page = requests.get('https://www.fasthosts.co.uk/careers/current-vacancies')
content = html.fromstring(page.content)
# Vacancy titles, in page order
Vacancies = content.xpath('//h1[@class="featuredvacancy__title featuredvacancy__title--invert grid-16 alpha"]/text()')
# Link to each vacancy's own page (assumed to be in the same order as the titles)
li = [a.attrib['href'] for a in content.xpath('//a[@class="button button__primary featuredvacancy__button"]')]
with open('scrapevacancy.html', 'w') as f:
    for title, l in zip(Vacancies, li):
        p = requests.get('https://www.fasthosts.co.uk/'+l)
        c = html.fromstring(p.content)
        # A non-empty result means the page has an "Apply Now" button
        apply = c.xpath('//a[@class="button button__primary button--dtfull"]')
        if apply:
            f.write(str(title) + ' Yes <br/>')
        else:
            f.write(str(title) + ' No <br/>')
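
The code above builds each vacancy URL by concatenating the site root with the href. Whether that gives a valid URL depends on how the hrefs are written on the page, which I have not verified. As a hedged alternative, here is a minimal sketch using the standard library's `urljoin`, which resolves root-relative, relative, and absolute hrefs alike. The helper name `absolute_url` and the sample hrefs are made up for illustration (Python 3):

```python
from urllib.parse import urljoin

BASE_URL = 'https://www.fasthosts.co.uk/'


def absolute_url(href):
    """Resolve an href taken from the listing page against the site root."""
    return urljoin(BASE_URL, href)


# Hypothetical hrefs, for illustration only (not taken from the real page).
print(absolute_url('/careers/example-vacancy'))      # root-relative
print(absolute_url('careers/example-vacancy'))       # relative
print(absolute_url('https://jobs.example.com/123'))  # already absolute
```

In the loop, `requests.get(absolute_url(l))` would then replace the string concatenation.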

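Output file:

Developer (Java / Python) Yes
Lead Quality Assurance Engineer Yes
Senior Financial Accountant - FTC up to 12 months Yes
HR Officer Yes
HR & Training Administrator Yes
Front End Web Developer Yes
Data Centre Operator Yes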
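
Some vacancy titles contain an ampersand, which is a special character in HTML. If the file is meant to be opened in a browser, it may be safer to escape each title before writing it. A minimal sketch, assuming the results have already been collected as (title, has_button) pairs; the function name `format_results` and the sample data are hypothetical:

```python
from html import escape


def format_results(results):
    """Return one '<title> Yes/No<br/>' line per (title, has_button) pair."""
    lines = []
    for title, has_button in results:
        answer = 'Yes' if has_button else 'No'
        # escape() replaces &, <, > and quotes with HTML entities,
        # so the title is displayed literally by the browser.
        lines.append(f'{escape(title)} {answer}<br/>')
    return '\n'.join(lines)


# Hypothetical sample data, not scraped from the real site.
sample = [('HR & Training Administrator', True), ('Data Centre Operator', False)]
print(format_results(sample))
```

Note that `escape` is imported by name here because `from lxml import html` in the scraper already uses the name `html`.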