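Scraping specific fields in Python

Time: 2021-07-12 15:23:20

Tags: python python-3.x selenium web-scraping

How can I extract the companies and their descriptions from here?

From my question yesterday, I figured out how to extract the names, but when I applied the same logic to extract their descriptions, it backfired.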

request = requests.get("https://www.clstack.com", verify=False, headers=headers)
soup = bs4.BeautifulSoup(request.content, 'html.parser')
data = soup.find_all('td', {'class':'company'})

for i in data:
    print(i.find('tr'))

Output

company|description

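The description is inside a 'td' tag, but when I call it from my code I don't get any output.

1 answer:

Answer 0: (score: 0)

You will notice that each <td class="company"> tag is followed by another <td> tag that holds the description. So as you iterate over the <td class="company"> elements, just use .find_next('td') to get the next tag, which contains the description: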

import requests
import bs4

# A Chrome User-Agent string; you can set whatever browser you like.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'}

# verify=False skips TLS certificate verification, as in the question.
response = requests.get("https://www.cloudtango.org", verify=False, headers=headers)
soup = bs4.BeautifulSoup(response.content, 'html.parser')
data = soup.find_all('td', {'class': 'company'})

for each in data:
    # The company name is stored in the alt text of the logo image.
    company = each.find('img')['alt']
    # The next <td> in document order holds the description.
    description = each.find_next('td').text
    print(f'{company}: {description}\n\n')

Output:

Redcentric: Redcentric is a leading UK IT managed services provider that offers a range of IT and Cloud services designed to support organisations in their journey from traditional infrastructure to the Cloud …


Modern Networks: Established in 1999, Modern Networks is a leading provider of IT support, network services, business broadband and telecoms to the UK’s commercial property sector. Additionally, we work with around …


BlackPoint IT Services: BlackPoint’s comprehensive range of Managed IT Services is designed to help you improve IT quality, efficiency and reliability -and save you up to 50% on IT cost. Providing IT solutions for more …


AffinityMSP: AffinityMSP was created with one goal in mind: to help Australian businesses achieve success through high-performance technology. Our consultants take the time to get to know your business and …


centrexIT: Founded in 2002, centrexIT is San Diego's leader in IT management. Our locally-based technology professionals provide outsourced IT service, support, security and leadership for small and medium-…


Carbon60: Carbon60 specializes in delivering secure managed cloud solutions for public and private sector organizations with business-critical workloads. Businesses are at different stages in their cloud …


...
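
If you want to try the idea without hitting the live site, here is a minimal sketch on an inline HTML snippet. The markup, the company names and the helper extract_companies are all made up for illustration; they only assume the layout described above (a <td class="company"> followed by a <td> with the description). Note that it swaps in .find_next_sibling('td') for .find_next('td'), so the lookup stays inside the same row, and it falls back to the cell text when a company cell has no image. Treat it as one possible variant, not the only way to do it.

```python
import bs4

# Hypothetical markup mimicking the layout described above: each
# <td class="company"> is followed by a <td> holding the description.
HTML = """
<table>
  <tr>
    <td class="company"><img src="a.png" alt="Acme IT"></td>
    <td>Acme IT provides managed services.</td>
  </tr>
  <tr>
    <td class="company"><a href="#">No logo here</a></td>
    <td>A row whose company cell has no image.</td>
  </tr>
</table>
"""


def extract_companies(html):
    """Return a list of (company, description) pairs from the table markup."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    rows = []
    for cell in soup.find_all("td", class_="company"):
        img = cell.find("img")
        # Use the image alt text when present, otherwise the cell's own text.
        if img is not None and img.has_attr("alt"):
            name = img["alt"]
        else:
            name = cell.get_text(strip=True)
        # find_next_sibling only looks at later siblings in the same <tr>;
        # find_next would keep going into later rows if the cell were missing.
        desc_cell = cell.find_next_sibling("td")
        description = desc_cell.get_text(strip=True) if desc_cell else ""
        rows.append((name, description))
    return rows


for name, description in extract_companies(HTML):
    print(f"{name}: {description}")
```

On the real page you would pass response.content to the helper instead of the inline string. Whether the sibling lookup or .find_next('td') suits you better depends on how regular the table rows are.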