Getting crawled information in dictionary format

Asked: 2016-02-06 14:18:29

Tags: python dictionary web-crawler

I am scraping the information as usual, but I would like the output in key/value format. For example:

{'Base pay':'$140,000.00 - $160,000.00 /Year'},
{'Employment Type':'Full-Time'},
{'Job Type':'Information Technology,  Engineering,  Professional Services'}

Here is my code:

from bs4 import BeautifulSoup 
import urllib2  # Python 2 module; the code below calls urllib2.urlopen
website = 'http://www.careerbuilder.com/jobseeker/jobs/jobdetails.aspx?APath=2.21.0.0.0&job_did=J3H7FW656RR51CLG5HC&showNewJDP=yes&IPath=RSKV' 
html = urllib2.urlopen(website).read()
soup = BeautifulSoup(html)
for elm in soup.find_all('section',{"id":"job-snapshot-section"}):
    dn = elm.get_text()
print dn

Here is the output of my code:

Job Snapshot


Base Pay
$140,000.00 - $160,000.00 /Year


Employment Type
Full-Time


Job Type
Information Technology,  Engineering,  Professional Services


Education
4 Year Degree


Experience
At least 5 year(s)


Manages Others
Not Specified


Relocation
No


Industry
Computer Software, Banking - Financial Services, Biotechnology


Required Travel
Not Specified


Job ID
EE-1213256

I have edited the code as requested to include the required library imports.

1 Answer:

Answer 0: (score: 1)

I suggest the following, where text is the string returned by get_text() (dn in the question's code):

dict(i.strip().split('\n') for i in text.split('\n\n') if len(i.strip().split('\n')) == 2)

Output:

{'Job ID': 'EE-1213256', 
 'Manages Others': 'Not Specified', 
 'Job Type': 'Information Technology,  Engineering,  Professional Services', 
 'Relocation': 'No', 
 'Education': '4 Year Degree', 
 'Base Pay': '$140,000.00 - $160,000.00 /Year', 
 'Experience': 'At least 5 year(s)', 
 'Industry': 'Computer Software, Banking - Financial Services, Biotechnology', 
 'Employment Type': 'Full-Time', 
 'Required Travel': 'Not Specified'}
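
For readers who want the same idea spelled out step by step, here is a minimal sketch in Python 3. It assumes the text has the shape shown in the question: a label line followed by a value line, with blank lines between blocks. The function name snapshot_to_dict and the shortened sample string are hypothetical, introduced only for illustration. Unlike the one-liner, it splits with a regular expression so that blank lines containing stray spaces are also treated as separators; that extra tolerance is an assumption about what the page might return, not something the question shows.

```python
import re


def snapshot_to_dict(text):
    """Turn 'label\\nvalue' blocks separated by blank lines into a dict."""
    result = {}
    # Split on runs of blank lines (lines that are empty or whitespace-only).
    for block in re.split(r'\n\s*\n', text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        # Keep only label/value pairs; a lone heading such as
        # 'Job Snapshot' yields a single line and is skipped.
        if len(lines) == 2:
            label, value = lines
            result[label] = value
    return result


# Shortened sample in the same shape as the scraped output (hypothetical).
sample = (
    "Job Snapshot\n\n\n"
    "Base Pay\n$140,000.00 - $160,000.00 /Year\n\n\n"
    "Employment Type\nFull-Time\n\n\n"
    "Job ID\nEE-1213256\n"
)

info = snapshot_to_dict(sample)
print(info)

# One single-entry dict per field, matching the layout in the question.
pairs = [{label: value} for label, value in info.items()]
print(pairs)
```

A single dict is usually easier to query (info['Job ID']), while the list of single-entry dicts mirrors the format the question asked for. Blocks that do not have exactly two non-empty lines are silently dropped, so a value that spans several lines would be lost with this approach.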