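I'm trying to scrape data from "http://www.landwatch.com/Philippines_land_for_sale/Land"; what I need is the address and price information. My approach is to use the Beautiful Soup module in Python, but I ran into problems when inspecting the HTML page. Could someone give me a hint so I can move forward? Inspecting the page suggests the information I need is inside div class = clear property left. Here is the code: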
from lxml import html
import requests
import bs4 as bs
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'http://www.landwatch.com/Philippines_land_for_sale/Land'
# Opening up the connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
soup = bs.BeautifulSoup(page_html,'lxml')
g_data = soup.find_all("div",{"class": "clear property left"})
for item in g_data:
    print(item.contents[0])
Thanks,
Answer 0 (score: 2)
You're almost there. The address and price information is in the <a> element under <div class="propName">, which sits deeper inside each <div class="clear property left"> in g_data. You can reach it as follows:
import requests
from bs4 import BeautifulSoup
my_url = 'http://www.landwatch.com/Philippines_land_for_sale/Land'
# Fetch the page and parse it
link = requests.get(my_url)
soup = BeautifulSoup(link.content, 'lxml')
# Each listing sits in a <div class="clear property left">
g_data = soup.find_all('div', class_='clear property left')
for item in g_data:
    # The address and price text is in the <a> under <div class="propName">
    address_price_info = item.find("div", {"class": "propName"}).find('a').text
    print(address_price_info)
The output will be:
Cebu City, Philippines 1185000, PHP
Tagaytay, Philippines $116,000
Quezon City, Philippines $2,837,000
Sta Rosa Laguna, Philippines 15500, PHP
Makati, Philippines $5,947,826
Puerto Princesa City, Philippines $358,813
Carcar, Philippines 35000000, PHP
Lipa City, Philippines $57,750
Makati, Philippines 6400000, PHP
Taytay, Philippines $2,300,000
Taguig, Philippines $504,208
Taguig City, Philippines $13,760
Quezon City, Philippines 58000000, PHP
Cebu City, Philippines 7799030, PHP
Las Pinas, Philippines $468,000
Update
If you inspect the address and price information with Chrome, it shows you where that element sits in the page.
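Since the question asks for the address and the price, it may help to split each combined string from the listing above into separate fields. The following is only a sketch: `split_address_price` and `PRICE_PATTERN` are hypothetical names, and the pattern assumes the two price shapes visible in that listing ("$116,000" and "1185000, PHP"). Other formats on the site would need their own handling.

```python
import re

# Hypothetical pattern covering only the two price shapes seen in the listing:
#   "<address> $116,000"      -> dollar sign followed by digits and commas
#   "<address> 1185000, PHP"  -> plain digits, a comma, then a 3-letter code
PRICE_PATTERN = re.compile(
    r"^(?P<address>.+?)\s+"
    r"(?:\$(?P<dollars>\d[\d,]*)|(?P<amount>\d+),\s*(?P<code>[A-Z]{3}))$"
)


def split_address_price(text):
    """Return (address, price, currency) or None if the text does not match."""
    match = PRICE_PATTERN.match(text.strip())
    if match is None:
        return None
    if match.group("dollars") is not None:
        # The currency is kept as the literal "$" symbol; which dollar
        # currency it stands for is not stated in the listing.
        price = int(match.group("dollars").replace(",", ""))
        return match.group("address"), price, "$"
    return match.group("address"), int(match.group("amount")), match.group("code")


if __name__ == "__main__":
    for line in ["Cebu City, Philippines 1185000, PHP",
                 "Tagaytay, Philippines $116,000"]:
        print(split_address_price(line))
```

Inside the scraping loop you could call `split_address_price(address_price_info)` and skip any entry that returns None instead of letting the script crash on an unexpected format.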