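I'm new to HTML, web scraping, and Beautiful Soup. I'm trying to retrieve the job title, salary, location, and company name from various Indeed job postings. This is my code so far: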
URL = "http://www.indeed.com/jobs?q=data+scientist+%2420%2C000&l=New+York&start=10"
import urllib2
import bs4
from bs4 import BeautifulSoup
soup = BeautifulSoup(urllib2.urlopen(URL).read())
resultcol = soup.find_all(id = 'resultsCol')
company = soup.findAll('span', attrs={"class":"company"})
jobs = (soup.find_all({'class': " row result"}))
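Although I have the commands to find the jobs and companies, I can't get at their contents. I know there is a contents attribute, but so far none of my variables have it. Thanks!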
Answer 0 (score: 2)
@furas's example, updated for Python 3:
import urllib.request
from bs4 import BeautifulSoup
URL = "https://www.indeed.com/jobs?q=data+scientist+%2420%2C000&l=New+York&start=10"
soup = BeautifulSoup(urllib.request.urlopen(URL).read(), 'html.parser')
results = soup.find_all('div', attrs={'data-tn-component': 'organicJob'})
for x in results:
    company = x.find('span', attrs={"class":"company"})
    if company:
        print('company:', company.text.strip())
    job = x.find('a', attrs={'data-tn-element': "jobTitle"})
    if job:
        print('job:', job.text.strip())
    salary = x.find('nobr')
    if salary:
        print('salary:', salary.text.strip())
    print('----------')
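As for why none of the variables in the question had a contents attribute: find_all returns a ResultSet, which is a list of Tag objects, and attributes such as .text and .contents live on each individual Tag rather than on the list. Here is a minimal offline sketch of that difference. The HTML string is made up for illustration and is not copied from Indeed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example (not real Indeed HTML).
SAMPLE_HTML = """
<div id="resultsCol">
  <span class="company"> Acme Corp </span>
  <span class="company"> Globex </span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all returns a ResultSet: a list of Tag objects.
companies = soup.find_all("span", attrs={"class": "company"})

# Read the text from each Tag in the list, not from the list itself.
names = [tag.text.strip() for tag in companies]
print(names)

# find returns a single Tag (or None), which does have .contents and .text.
first = soup.find("span", attrs={"class": "company"})
print(first.contents)
```

Accessing companies.text or companies.contents directly raises AttributeError, which matches the symptom described in the question.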
Answer 1 (score: 1)
First I find the div that holds all the elements for a single job, then I search within that div:
import urllib2
from bs4 import BeautifulSoup
URL = "http://www.indeed.com/jobs?q=data+scientist+%2420%2C000&l=New+York&start=10"
soup = BeautifulSoup(urllib2.urlopen(URL).read(), 'html.parser')
results = soup.find_all('div', attrs={'data-tn-component': 'organicJob'})
for x in results:
    company = x.find('span', attrs={"itemprop":"name"})
    if company:  # find returns None when the element is missing
        print 'company:', company.text.strip()
    job = x.find('a', attrs={'data-tn-element': "jobTitle"})
    if job:
        print 'job:', job.text.strip()
    salary = x.find('nobr')
    if salary:
        print 'salary:', salary.text.strip()
    print '----------'
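The same per-job idea can be tried offline, collecting the fields into dictionaries instead of printing them. This is only a sketch under assumptions: the sample HTML is invented, it reuses the attribute names from the answers above (data-tn-component, data-tn-element, the company class, nobr), and the live Indeed markup may well differ today. The names SAMPLE_HTML, text_or_none, and parse_jobs are hypothetical helpers for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the attributes used in the answers above.
SAMPLE_HTML = """
<div data-tn-component="organicJob">
  <a data-tn-element="jobTitle">Data Scientist</a>
  <span class="company">Acme Corp</span>
  <nobr>$120,000 a year</nobr>
</div>
<div data-tn-component="organicJob">
  <a data-tn-element="jobTitle">ML Engineer</a>
  <span class="company">Globex</span>
</div>
"""


def text_or_none(tag):
    """Return the stripped text of a Tag, or None if find() found nothing."""
    return tag.text.strip() if tag is not None else None


def parse_jobs(html):
    """Return one dict per job div, searching only inside that div."""
    soup = BeautifulSoup(html, "html.parser")
    jobs = []
    for card in soup.find_all("div", attrs={"data-tn-component": "organicJob"}):
        jobs.append({
            "job": text_or_none(card.find("a", attrs={"data-tn-element": "jobTitle"})),
            "company": text_or_none(card.find("span", attrs={"class": "company"})),
            "salary": text_or_none(card.find("nobr")),
        })
    return jobs


jobs = parse_jobs(SAMPLE_HTML)
for job in jobs:
    print(job)
```

Searching inside each div keeps a title, company, and salary from the same posting together, and a posting without a salary simply gets None for that field.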