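Hello, so far I have scraped this information from a job listing website. Everything seems to run fine, but I am struggling to get the information into a DataFrame with column headers and everything. Any help is appreciated. My full code is: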
import requests
from bs4 import BeautifulSoup
import pandas as pd
URL = 'https://www.monster.com/jobs/search/?q=Software-Developer&where=Australia'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find(id='ResultsContainer')
python_jobs = results.find_all('h2', string=lambda text: 'test' in text.lower())
for p_job in python_jobs:
    link = p_job.find('a')['href']
    print(p_job.text.strip())
    print(f"Apply Here: {link}")
job_elems = results.find_all('section', class_='card-content')
for job_elem in job_elems:
    title_elem = job_elem.find('h2', class_='title')
    company_elem = job_elem.find('div', class_='company')
    location_elem = job_elem.find('div', class_='location')
    if None in (title_elem, company_elem, location_elem):
        continue
    print(title_elem.text.strip())
    print(company_elem.text.strip())
    print(location_elem.text.strip())
    print()
I'm not sure how to go about this.
Answer 0 (score: 0)
You can store the job details (i.e. position, company, and location) in a dictionary and then build a DataFrame from that dictionary.
import requests
from bs4 import BeautifulSoup
import pandas as pd
URL = 'https://www.monster.com/jobs/search/?q=Software-Developer&where=Australia'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find(id='ResultsContainer')
# Guard against headings whose string is None before calling .lower()
python_jobs = results.find_all('h2', string=lambda text: text and 'test' in text.lower())
for p_job in python_jobs:
    link = p_job.find('a')['href']
    print(p_job.text.strip())
    print(f"Apply Here: {link}")
job_elems = results.find_all('section', class_='card-content')
i = 1
my_job_list = {}
for job_elem in job_elems:
    title_elem = job_elem.find('h2', class_='title')
    company_elem = job_elem.find('div', class_='company')
    location_elem = job_elem.find('div', class_='location')
    if None in (title_elem, company_elem, location_elem):
        continue
    # One inner dict per opening, keyed 'opening 1', 'opening 2', ...
    op = f'opening {i}'
    my_job_list[op] = {'position': title_elem.text.strip(),
                       'company': company_elem.text.strip(),
                       'location': location_elem.text.strip()}
    i = i + 1
    print(title_elem.text.strip())
    print(company_elem.text.strip())
    print(location_elem.text.strip())
# orient='index' makes each opening a row, so position/company/location
# become the column headers (plain pd.DataFrame(my_job_list) gives the transpose)
df = pd.DataFrame.from_dict(my_job_list, orient='index')
print(df)
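How a nested dictionary is oriented decides what ends up as the header row. The following is a minimal sketch of that difference, assuming the dictionary is keyed by opening as above; the sample values are made up for illustration and are not real listings.

```python
import pandas as pd

# Made-up sample data standing in for the scraped values (not real listings).
my_job_list = {
    'opening 1': {'position': 'Software Developer', 'company': 'Acme', 'location': 'Sydney'},
    'opening 2': {'position': 'Test Engineer', 'company': 'Globex', 'location': 'Melbourne'},
}

# Default constructor: outer keys become columns, one column per opening.
wide = pd.DataFrame(my_job_list)

# orient='index': outer keys become row labels, inner keys become the headers.
df = pd.DataFrame.from_dict(my_job_list, orient='index')

print(wide)
print(df)
```

The second form is usually what "a DataFrame with headers" means here: one row per job and one column per field.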
Answer 1 (score: 0)
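Use concat() to combine the columns into a single row, then append() each row to one DataFrame inside the loop. (On pandas 2.0 and later, DataFrame.append() no longer exists, so pd.concat() handles that step as well.)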
import requests
from bs4 import BeautifulSoup
import pandas as pd
URL = 'https://www.monster.com/jobs/search/?q=Software-Developer&where=Australia'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find(id='ResultsContainer')
# Guard against headings whose string is None before calling .lower()
python_jobs = results.find_all('h2', string=lambda text: text and 'test' in text.lower())
for p_job in python_jobs:
    link = p_job.find('a')['href']
    print(p_job.text.strip())
    print(f"Apply Here: {link}")
job_elems = results.find_all('section', class_='card-content')
df = pd.DataFrame()
for job_elem in job_elems:
    title_elem = job_elem.find('h2', class_='title')
    company_elem = job_elem.find('div', class_='company')
    location_elem = job_elem.find('div', class_='location')
    if None in (title_elem, company_elem, location_elem):
        continue
    # Build a one-row frame: each Series becomes one column
    df1 = pd.concat([pd.Series(title_elem.text.strip()),
                     pd.Series(company_elem.text.strip()),
                     pd.Series(location_elem.text.strip())], axis=1)
    # Name the columns so the final frame has headers
    df1.columns = ['position', 'company', 'location']
    # Equivalent of df.append(df1), which was removed in pandas 2.0
    df = pd.concat([df, df1], ignore_index=True)
print(df)
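An alternative to growing the DataFrame inside the loop, and not what the answer above does, is to collect one dict per job in a plain list and build the frame once at the end. This is a minimal sketch under that assumption; `scraped` and `rows` are hypothetical names, and the tuples are made-up stand-ins for the `.text.strip()` results.

```python
import pandas as pd

# Made-up (title, company, location) values standing in for the scraped text.
scraped = [
    ('Software Developer', 'Acme', 'Sydney'),
    ('Test Engineer', 'Globex', 'Melbourne'),
]

rows = []
for title, company, location in scraped:
    # One dict per job; the keys become the column headers.
    rows.append({'position': title, 'company': company, 'location': location})

# Build the frame once, fixing the column order explicitly.
df = pd.DataFrame(rows, columns=['position', 'company', 'location'])
print(df)
```

Building the frame once avoids copying the accumulated data on every iteration, and passing `columns` keeps the headers in place even if no rows were collected.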