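I'm new to Python and I'm trying to scrape information from a real estate listing website (www.realtor.ca). So far I've managed to collect the MLS numbers from the listings with the following code: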
import urllib2, sys, re, mechanize, itertools, csv
# Set the url for the online search
url = 'http://www.realtor.ca/PropertyResults.aspx?Page=1&vs=Residential&ret=300&curPage=PropertySearch.aspx&sts=0-0&beds=0-0&baths=0-0&ci=Victoria&pro=3&mp=200000-300000-0&mrt=0-0-4&trt=2&of=1&ps=10&o=A'
content = urllib2.urlopen(url).read()
text = str(content)
# finds all instances of "MLS®: " to create a list of MLS numbers
# "[0-9]+" matches all numbers (the plus means one or more) In this case it's looking for a 6-digit MLS number
findMLS = re.findall("MLS®: [0-9]+", text)
findMLS = [x.strip('MLS®: ') for x in findMLS]
# "Page 1 of " precedes the number of pages in the search result (10 listings per page)
# The capture group returns only the digits; strip('Page 1 of ') would also eat a leading "1" from the page count
num_pages = re.findall("Page 1 of ([0-9]+)", text)
pages = int(num_pages[0])
for page in range(2, pages + 1):
    # Update the url with the current search page number
    # (a regex substitution keeps working once the page number has two digits)
    url = re.sub("Page=[0-9]+", "Page=" + str(page), url, count=1)
    # Read the new url to get more MLS numbers
    content = urllib2.urlopen(url).read()
    text = str(content)
    newMLS = re.findall("MLS®: [0-9]+", text)
    newMLS = [x.strip('MLS®: ') for x in newMLS]
    # Append new MLS numbers to the list findMLS
    findMLS.extend(newMLS)
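As a side note on the parsing above: `str.strip` treats its argument as a set of characters rather than a prefix, so capture groups are a safer way to pull out the digits. Below is a minimal sketch of that idea, assuming the page text has already been decoded to a string. The helper names (`extract_mls_numbers`, `count_pages`, `page_url`) are made up for illustration and are not part of the original script.

```python
import re


def extract_mls_numbers(text):
    """Return the digits that follow each 'MLS®: ' label in the page text."""
    return re.findall(r"MLS®: ([0-9]+)", text)


def count_pages(text):
    """Return N from 'Page 1 of N', or 1 if the marker is not present."""
    match = re.search(r"Page 1 of ([0-9]+)", text)
    return int(match.group(1)) if match else 1


def page_url(url, page):
    """Return url with its Page= query parameter set to the given page number."""
    # [?&] anchors the match to a real query parameter, so 'curPage=' is left alone.
    return re.sub(r"([?&]Page=)[0-9]+", r"\g<1>%d" % page, url, count=1)
```

With these, the paging loop could call `page_url(url, page)` for each page and extend the list with `extract_mls_numbers(text)`, without relying on a fixed character index in the URL.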
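Using my list of MLS numbers (findMLS), I'd like to enter each number into the MLS# search box at the top of this page: http://www.realtor.ca/propertySearch.aspx
Using Inspect Element I can find this search box, but I don't know how to access it with Python code and Mechanize.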
<input type="text" id="txtMlsNumber" value="" style="background-color:#ebebeb;border:solid 1px #C8CACA; " onkeypress="javascript:MLSNumberSearch(event)">
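Any help is greatly appreciated.
Answer 0 (score: 0)
I haven't used Mechanize, but I've had good luck navigating with Selenium. I know it's an extra module that you may or may not want to use, but it has been very user-friendly since Selenium 2 came out, and you can navigate the site any way you like.
Edit: It would be really easy with something like this.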
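from selenium import webdriver
driver = webdriver.Firefox()  # assumes Firefox and its driver are installed
driver.get('http://www.realtor.ca/propertySearch.aspx')
# find_element_by_id is the Selenium 2/3 API; newer releases use find_element(By.ID, ...)
mls_search = driver.find_element_by_id('txtMlsNumber')
mls_search.send_keys('number that you scraped')
search = driver.find_element_by_id('lnkMlsSearch')
search.click()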
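To run this for the whole list rather than a single number, the same steps can be wrapped in a loop. The following is only a rough sketch: the function name `search_mls_numbers` and the idea of collecting `page_source` per number are my own assumptions, and the element ids are the ones mentioned above. It uses `find_element("id", ...)`, where `"id"` is the string value behind Selenium's `By.ID`, so it does not depend on the older `find_element_by_id` helper. A real page may also need an explicit wait after the click before the results are available.

```python
def search_mls_numbers(driver, mls_numbers,
                       search_url="http://www.realtor.ca/propertySearch.aspx"):
    """Submit each MLS number through the search box; return {number: page source}."""
    results = {}
    for number in mls_numbers:
        # Start from a fresh search page for every number.
        driver.get(search_url)
        box = driver.find_element("id", "txtMlsNumber")  # "id" is the value of By.ID
        box.clear()
        box.send_keys(number)
        driver.find_element("id", "lnkMlsSearch").click()
        # Assumption: the result page has loaded by the time we read it.
        results[number] = driver.page_source
    return results
```

With a real browser this would be called as something like `search_mls_numbers(webdriver.Firefox(), findMLS)`. Passing the driver in as an argument keeps the helper independent of which browser is used.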