How to scrape data from elements that share the same class name

Time: 2019-12-10 05:11:05

Tags: python web-scraping beautifulsoup

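I am trying to scrape some real estate websites, but one site I ran into has a div with a given class name, and inside that div are other divs that all share the same class name with each other. I want to scrape the data from the child class (I think).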

I want to scrape the following class data:

<div class="m-srp-card__summary__info">New Property</div>

I tried indexing, but got nothing.

Here is the full markup I want to scrape:

<div class="m-srp-card__collapse js-collapse" aria-collapsed="collapsed" data-container="srp-card-summary">
   <div class="m-srp-card__summary js-collapse__content" data-content="srp-card-summary">   
   <input type="hidden" id="propertyArea42679361" value="888 sqft">
      <div class="m-srp-card__summary__item">
        <div class="m-srp-card__summary__title">carpet area</div>
        <div class="m-srp-card__summary__info">888&nbsp;sqft</div>
      </div>
      <div class="m-srp-card__summary__item">
        <div class="m-srp-card__summary__title">status</div>
        <div class="m-srp-card__summary__info">Ready to Move</div>
      </div>
      <div class="m-srp-card__summary__item">
        <div class="m-srp-card__summary__title">floor</div>
        <div class="m-srp-card__summary__info">9 out of 13 floors</div>
      </div>
      <div class="m-srp-card__summary__item">
        <div class="m-srp-card__summary__title">transaction</div>
        <div class="m-srp-card__summary__info">New Property</div>
      </div>
      <div class="m-srp-card__summary__item">
        <div class="m-srp-card__summary__title">furnishing</div>
        <div class="m-srp-card__summary__info">Unfurnished</div>
      </div>
      <div class="m-srp-card__summary__item">
        <div class="m-srp-card__summary__title">facing</div>
        <div class="m-srp-card__summary__info">South -West</div>
      </div>
      <div class="m-srp-card__summary__item">
        <div class="m-srp-card__summary__title">overlooking</div>
        <div class="m-srp-card__summary__info">Garden/Park, Main Road</div>
      </div>
      <div class="m-srp-card__summary__item">
        <div class="m-srp-card__summary__title">society</div>
        <div class="m-srp-card__summary__info">
        <a id="project-link-42679361" class="m-srp-card__summary__link" 
        href="https://www.magicbricks.com/skylights-bopal-ahmedabad-pdpid-4d4235303936323633" 
        target="_blank">Skylights</a>
        </div>
      </div>
      <div class="m-srp-card__summary__item">
        <div class="m-srp-card__summary__title">car parking</div>
        <div class="m-srp-card__summary__info">1 Covered</div>
      </div>
      <div class="m-srp-card__summary__item">
        <div class="m-srp-card__summary__title">bathroom</div>
        <div class="m-srp-card__summary__info">3</div>
      </div>
      <div class="m-srp-card__summary__item">
        <div class="m-srp-card__summary__title">balcony</div>
        <div class="m-srp-card__summary__info">2</div>
      </div>
      <div class="m-srp-card__summary__item">
        <div class="m-srp-card__summary__title">ownership</div>
        <div class="m-srp-card__summary__info">Co-operative Society</div>
      </div>
    </div>
    <div class="m-srp-card__collapse__control js-collapse__control" data-toggle="list-collapse" 
     data-target="srp-card-summary" onclick="stopPage=true;">
  <div class="ico m-srp-card__ico">
  <svg role="icon">
   <use xlink:href="#icon-caret-down"></use>
  </svg>
</div>

Thanks!

1 answer:

Answer 0 (score: 0)

import csv
import re

import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.magicbricks.com/property-for-sale/residential-real-estate?proptype=Multistorey-Apartment,Builder-Floor-Apartment,Penthouse,Studio-Apartment,Residential-House,Villa&Locality=Bopal&cityName=Ahmedabad')
r.raise_for_status()  # stop on an HTTP error instead of parsing an error page
soup = BeautifulSoup(r.text, 'html.parser')

category = []
size = []
price = []
floor = []
# Listing titles.
for item in soup.find_all('span', {'class': 'm-srp-card__title__bhk'}):
    category.append(item.get_text(strip=True))
# Any title ending in "area" (e.g. "carpet area"); its value is the next div.
for item in soup.find_all(string=re.compile('area$')):
    size.append(item.find_next('div').text)
# Listing prices.
for item in soup.find_all('span', {'class': 'm-srp-card__price'}):
    price.append(item.text)
# The title text is exactly "floor"; its value is the next div.
for item in soup.find_all(string='floor'):
    floor.append(item.find_next('div').text)

# zip() stops at the shortest list, so the rows only line up
# if every listing has all four fields.
data = list(zip(category, size, price, floor))

with open('output.csv', 'w', newline='', encoding='UTF-8-SIG') as file:
    writer = csv.writer(file)
    writer.writerow(['Category', 'Size', 'Price', 'Floor'])
    writer.writerows(data)
print("Operation Completed")
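Since every value in the question's markup uses the same m-srp-card__summary__info class, another option is to walk each m-srp-card__summary__item and pair its title with its info, rather than relying on an index. The following is only a sketch under assumptions: it parses a trimmed copy of the question's markup instead of the live page (whose structure may have changed), and summary_to_dict is a hypothetical helper name, not part of the answer above.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (three items only,
# with the link's href shortened to "#").
html = """
<div class="m-srp-card__summary js-collapse__content" data-content="srp-card-summary">
  <div class="m-srp-card__summary__item">
    <div class="m-srp-card__summary__title">carpet area</div>
    <div class="m-srp-card__summary__info">888&nbsp;sqft</div>
  </div>
  <div class="m-srp-card__summary__item">
    <div class="m-srp-card__summary__title">transaction</div>
    <div class="m-srp-card__summary__info">New Property</div>
  </div>
  <div class="m-srp-card__summary__item">
    <div class="m-srp-card__summary__title">society</div>
    <div class="m-srp-card__summary__info">
      <a class="m-srp-card__summary__link" href="#">Skylights</a>
    </div>
  </div>
</div>
"""


def summary_to_dict(summary):
    """Map each title in one summary block to the text of its sibling info div."""
    fields = {}
    for item in summary.select("div.m-srp-card__summary__item"):
        title = item.select_one("div.m-srp-card__summary__title")
        info = item.select_one("div.m-srp-card__summary__info")
        if title is None or info is None:
            continue  # skip incomplete items instead of raising
        # &nbsp; decodes to a non-breaking space; normalise it to a plain space.
        value = info.get_text(" ", strip=True).replace("\xa0", " ")
        fields[title.get_text(strip=True)] = value
    return fields


soup = BeautifulSoup(html, "html.parser")
summary = soup.select_one("div.m-srp-card__summary")
fields = summary_to_dict(summary)
print(fields["transaction"])
```

Looking a value up by its title ("transaction", "floor", and so on) keeps working even if the site reorders the items or omits one, which is where positional indexing tends to break.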

View the output online: click here
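One caveat about the zip(category, size, price, floor) step in the answer's code: the four lists are collected independently, so if one listing has no "floor" entry, every later floor value shifts up a row and zip() silently drops the last rows. A possible way around this, sketched here with made-up sample records (the category and price values are hypothetical, not scraped), is to keep one dict per listing and let csv.DictWriter fill the gaps.

```python
import csv
import io

COLUMNS = ["Category", "Size", "Price", "Floor"]

# Hypothetical per-listing records; the second one has no "Floor" value.
listings = [
    {"Category": "3 BHK Apartment", "Size": "888 sqft",
     "Price": "45 Lac", "Floor": "9 out of 13 floors"},
    {"Category": "2 BHK Apartment", "Size": "650 sqft", "Price": "30 Lac"},
]


def write_listings(rows, file):
    """Write one CSV row per listing, leaving missing fields empty."""
    writer = csv.DictWriter(file, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    writer.writerows(rows)


# An in-memory buffer stands in for open('output.csv', 'w', newline='').
buffer = io.StringIO()
write_listings(listings, buffer)
print(buffer.getvalue())
```

Each row stays tied to its own listing, and a missing field shows up as an empty cell instead of shifting the columns that follow.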