Parsing a JavaScript href with Python

Asked: 2012-05-29 17:36:34

Tags: python parsing beautifulsoup

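I'm having a lot of trouble with this. I'm new to Python, so apologies if I simply don't know the right search terms to find the answer myself. I'm not even sure the JS is the cause, but that's my best guess.

Here is the part of the HTML I'm parsing: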

...
<div class="promotion">
    <div class="address">
        <a href="javascript:PropDetail2('57795471:MRMLS')" title="View property detail for 5203 Alhama Drive">5203 Alhama Drive</a>
    </div>
</div>
...

...and the Python I'm using (this version is the closest I've come to success):

homeFinderSoup = BeautifulSoup(open("homeFinderHTML.html"), "html5lib")
addressClass = homeFinderSoup.find_all('div', 'address')
for row in addressClass:
    print row.get('href')

...which returns

None
None
None

1 Answer:

Answer 0 (score: 0)

from bs4 import BeautifulSoup

# Create the soup from the HTML. (This assumes the file contents have already
# been read into the variable "html" as a string.)
soup = BeautifulSoup(html, "html5lib")
# Find all divs with class="address".
address_class = soup.find_all('div', {"class": "address"})
# Loop over the results.
for row in address_class:
    # The href lives on the <a> tag inside each div, not on the div itself,
    # which is why row.get('href') returned None in the question.
    link = row.find('a')
    if link is not None:
        print(link.get('href'))
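
If what you're really after is the value inside the JavaScript call rather than the whole href string, a regular expression is one way to pull it out. This is only a rough sketch: the helper name parse_js_href and the pattern are my own, and it assumes every href looks like the one in your snippet (a single function call with one quoted argument).

```python
import re

# Hypothetical pattern (an assumption, not part of any library): matches
# "javascript:SomeFunc('argument')" with single or double quotes.
JS_CALL = re.compile(r"""^javascript:\s*(\w+)\(\s*['"]([^'"]*)['"]\s*\)""")


def parse_js_href(href):
    """Return (function_name, argument), or None if href does not match."""
    if not href:
        return None
    match = JS_CALL.match(href.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


# The href value taken from the HTML in the question.
print(parse_js_href("javascript:PropDetail2('57795471:MRMLS')"))
```

For the href in the question this gives the function name PropDetail2 and the argument 57795471:MRMLS. Anything that doesn't match the pattern comes back as None, so ordinary links are simply skipped.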