Python: AttributeError and challenges in web scraping

Time: 2017-08-16 01:48:03

Tags: python html web screen-scraping

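I am trying to scrape data from "http://www.landwatch.com/Philippines_land_for_sale/Land"; what I need is the address and price information. My approach is to use the BeautifulSoup module in Python. I also ran into trouble when inspecting the HTML page. Could some of you give me a hint so I can move forward? Basically, the web inspector shows that the information I need is inside div class = "clear property left". Here is the code: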

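from lxml import html
import requests
import bs4 as bs
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'http://www.landwatch.com/Philippines_land_for_sale/Land'

# Opening up the connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
soup = bs.BeautifulSoup(page_html, 'lxml')  # note: rebinds the imported `soup` alias
g_data = soup.find_all("div", {"class": "clear property left"})
for item in g_data:
    # print() returns None, so print(item).contents[0] raises
    # AttributeError: 'NoneType' object has no attribute 'contents'.
    # Apply .contents to item inside the call instead:
    print(item.contents[0])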
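The AttributeError in the title is what print(item).contents[0] produces: print() writes its argument and returns None, so .contents is looked up on None. The fix is to take .contents from item and print the result, print(item.contents[0]). A minimal, network-free sketch of both behaviours follows; FakeTag is a hypothetical stand-in for a BeautifulSoup Tag, not part of bs4.

```python
class FakeTag:
    """Hypothetical stand-in for a BeautifulSoup Tag; it only has .contents."""

    def __init__(self, contents):
        self.contents = contents


# print() writes its argument and then returns None.
result = print("listing")

# Chaining .contents onto that None is what raises the AttributeError.
try:
    print("listing").contents[0]
    message = ""
except AttributeError as exc:
    message = str(exc)
print(message)  # 'NoneType' object has no attribute 'contents'

# Correct order: take .contents from the element itself, then print it.
item = FakeTag(["first child", "second child"])
first = item.contents[0]
print(first)  # first child
```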

Thanks,

1 answer:

Answer 0 (score: 2)

You're almost there. The address and price information is in the <a> element of <div class="propName">, which you can find deeper inside each <div class="clear property left"> in g_data, like this:

import requests
from bs4 import BeautifulSoup

my_url = 'http://www.landwatch.com/Philippines_land_for_sale/Land'
link = requests.get(my_url)
soup = BeautifulSoup(link.content, 'lxml')
# Each listing is a <div class="clear property left">
g_data = soup.find_all('div', class_='clear property left')
for item in g_data:
    # Address and price are the text of the <a> inside <div class="propName">
    address_price_info = item.find("div", {"class": "propName"}).find('a').text
    print(address_price_info)

The output will be:

   Cebu City, Philippines  1185000, PHP
   Tagaytay, Philippines  $116,000
   Quezon City, Philippines  $2,837,000
   Sta Rosa Laguna, Philippines  15500, PHP
   Makati, Philippines  $5,947,826
   Puerto Princesa City, Philippines  $358,813
   Carcar, Philippines  35000000, PHP
   Lipa City, Philippines  $57,750
   Makati, Philippines  6400000, PHP
   Taytay, Philippines  $2,300,000
   Taguig, Philippines  $504,208
   Taguig City, Philippines  $13,760
   Quezon City, Philippines  58000000, PHP
   Cebu City, Philippines  7799030, PHP
   Las Pinas, Philippines  $468,000

Update

If you inspect the address and price information in Chrome's developer tools, it shows that they sit in the <a> element inside <div class="propName">.
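The chain item.find("div", {"class": "propName"}).find('a').text raises the same kind of AttributeError whenever a listing has no <div class="propName">, because find() returns None. The sketch below runs offline against a small hypothetical HTML snippet (SAMPLE_HTML and extract_listings are illustrative names, and the real landwatch.com markup may differ or have changed since 2017). It swaps in a CSS selector, which matches the three classes in any order, skips listings without a propName link, and splits address from price on the assumption, taken from the output above, that they are separated by two or more spaces.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup mimicking the structure described in the answer.
SAMPLE_HTML = """
<div class="clear property left">
  <div class="propName"><a href="#">Cebu City, Philippines  1185000, PHP</a></div>
</div>
<div class="clear property left">
  <div class="propName"><a href="#">Tagaytay, Philippines  $116,000</a></div>
</div>
<div class="clear property left">
  <div class="other">listing without a propName div</div>
</div>
"""


def extract_listings(page_html):
    """Return (address, price) pairs, skipping listings without div.propName > a."""
    soup = BeautifulSoup(page_html, "html.parser")
    listings = []
    # The CSS selector matches the three classes in any order, whereas
    # class_="clear property left" needs the exact attribute string.
    for item in soup.select("div.clear.property.left"):
        prop_name = item.find("div", class_="propName")
        link = prop_name.find("a") if prop_name is not None else None
        if link is None:
            continue  # find() returned None; chaining on it would raise AttributeError
        text = link.get_text(strip=True)
        # Assumption: address and price are separated by two or more spaces.
        parts = re.split(r"\s{2,}", text, maxsplit=1)
        address = parts[0]
        price = parts[1] if len(parts) > 1 else ""
        listings.append((address, price))
    return listings


for address, price in extract_listings(SAMPLE_HTML):
    print(address, "|", price)
```

On the live page you would pass link.content from requests.get() instead of SAMPLE_HTML; the 'html.parser' backend is used here only so the sketch does not depend on lxml.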