Building a pandas DataFrame

Date: 2021-07-14 22:50:56

Tags: python pandas dataframe

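Please complete the following task: scrape 10 pages (the URL of the last, 10th, page will be https://www.indeed.com/jobs?q=data+scientist&l=CO&start=90) and build a pandas DataFrame containing the following information: job title, company name, location, and job description summary, plus indicator columns (with True/False values) for the keywords Python, SQL, AWS, RESTFUL, machine learning, deep learning, text mining, NLP, SAS, Tableau, Sagemaker, TensorFlow, and Spark.

Here is my code: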

import pandas as pd
import requests
from bs4 import BeautifulSoup as bsoup

base_url = 'https://www.indeed.com/jobs?q=data+scientist&l=CO'

jobTitle = []
companyName = []
companyLocation = []
summary = []


def text_or_none(tag):
    # Return the tag's text, or None when the element is missing from a card
    return tag.text if tag is not None else None


# 10 result pages: start=0, 10, ..., 90
for start in range(0, 100, 10):
    url = base_url + '&start=' + str(start)
    page = requests.get(url)
    soup = bsoup(page.text, 'html.parser')
    cards = soup.find_all('div', 'slider_container')

    for card in cards:
        # Append exactly one value per card so all four lists stay the same length
        title = card.find('span', title=True)
        jobTitle.append(title['title'] if title is not None else None)
        companyName.append(text_or_none(card.find('span', 'companyName')))
        companyLocation.append(text_or_none(card.find('div', 'companyLocation')))
        summary.append(text_or_none(card.find('div', 'job-snippet')))


DATA = {'Job': jobTitle, 'Company': companyName, 'Location': companyLocation, 'Summary': summary}
df = pd.DataFrame(DATA)
df

I have extracted the job title, company name, location, and job description summary, but I don't know how to build the indicator columns.
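One possible way to build the indicator columns is sketched below. It assumes the keywords should be searched for in the 'Summary' column, case-insensitively and as whole words. The sample rows and the regex patterns are made up for illustration and are not part of the original code; in practice the loop would run on the scraped df.

```python
import pandas as pd

# Hypothetical sample rows standing in for the scraped `df`
df = pd.DataFrame({
    'Job': ['Data Scientist', 'ML Engineer'],
    'Summary': [
        'Experience with Python, SQL and AWS SageMaker required.',
        'Build deep learning models in TensorFlow; Spark is a plus.',
    ],
})

# Column name -> regex pattern. The patterns are assumptions about how each
# keyword appears in a summary; \b keeps e.g. "SAS" from matching "Kansas".
keywords = {
    'Python': r'\bpython\b',
    'SQL': r'\bsql\b',
    'AWS': r'\baws\b',
    'RESTFUL': r'\brestful\b',
    'Machine Learning': r'\bmachine learning\b',
    'Deep Learning': r'\bdeep learning\b',
    'Text Mining': r'\btext mining\b',
    'NLP': r'\bnlp\b',
    'SAS': r'\bsas\b',
    'Tableau': r'\btableau\b',
    'Sagemaker': r'\bsagemaker\b',
    'TensorFlow': r'\btensorflow\b',
    'Spark': r'\bspark\b',
}

# One boolean column per keyword; na=False maps missing summaries to False
for name, pattern in keywords.items():
    df[name] = df['Summary'].str.contains(pattern, case=False, regex=True, na=False)

print(df[['Job', 'Python', 'Deep Learning', 'Spark']])
```

Because of the whole-word match, a summary that mentions only "MySQL" or "NoSQL" would not set the SQL column to True; loosen the patterns if that is not what you want.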

0 answers:

No answers yet.