How can I get the output data from this web page using Python?

Time: 2014-10-13 23:58:33

Tags: python web-scraping html-parsing web-crawler

I'm trying to use this website to get the geographic distance between two addresses: http://www.freemaptools.com/how-far-is-it-between.htm

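I'd like to be able to go to the page, enter two addresses, click "Show", then extract the "Distance as the crow flies" and "Distance by land transport" values and save them to a dictionary.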

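Is there a way to get the output data (the distances) from this page? I'm not familiar with HTML, so I'm not sure where the output ends up. I have managed to enter the input data; my code is below for reference.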

Page source, which I can't decipher:

<tr>
<td align="right">From 
    <input name="pointa" type="text" value="" size="22" onkeypress="autocompletea(this.value, event)" /></td>
<td><div align="center">to</div></td>
<td><input name="pointb" type="text" value="" size="22" onkeypress="autocompleteb(this.value, event)"/></td>
<td><p role="button" tabindex="0" class="fmtbutton" onkeypress="findaandb(document.forms['inp']['pointa'].value,document.forms['inp']['pointb'].value);" onclick="findaandb(document.forms['inp']['pointa'].value,document.forms['inp']['pointb'].value);">&nbsp;Show&nbsp;</p>
  <label></label></td>
</tr>

My code:

import re
from mechanize import Browser

text = """ web input"""

browser = Browser()
browser.open("http://www.freemaptools.com/how-far-is-it-between.htm")

browser.select_form(nr=0)
browser['pointa'] = 'San Diego, Usa'
browser['pointb'] = 'San Francisco, Usa'

response = browser.submit()

content = response.read()

result = re.findall(r'dist', content)
print result[5]

Thanks for your help.

1 answer:

Answer 0 (score: 0)

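This page relies heavily on JavaScript, and mechanize does not execute JavaScript the way a browser does. Submitting the form with mechanize therefore never triggers the distance calculation.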
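One way to see this from the markup quoted in the question is to list the inline event handlers. The sketch below uses only the standard library's html.parser; SNIPPET is an abbreviated copy of two cells from the question, and HandlerFinder is a hypothetical helper name, not part of any library. It suggests the "Show" button is a <p> element wired to a JavaScript function rather than a submit control, at least in the quoted fragment.

```python
from html.parser import HTMLParser

# Abbreviated copy of the markup quoted in the question.
SNIPPET = """
<td><input name="pointb" type="text" value="" size="22" onkeypress="autocompleteb(this.value, event)"/></td>
<td><p role="button" tabindex="0" class="fmtbutton" onclick="findaandb(document.forms['inp']['pointa'].value,document.forms['inp']['pointb'].value);">&nbsp;Show&nbsp;</p></td>
"""


class HandlerFinder(HTMLParser):
    """Collect inline JavaScript handlers and note any submit control."""

    def __init__(self):
        super().__init__()
        self.handlers = []       # (tag, attribute name, handler source)
        self.has_submit = False  # True if an <input type="submit"> is seen

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for name, value in attrs.items():
            if name.startswith("on"):  # onclick, onkeypress, ...
                self.handlers.append((tag, name, value))
        if tag == "input" and attrs.get("type") == "submit":
            self.has_submit = True


finder = HandlerFinder()
finder.feed(SNIPPET)
for tag, name, value in finder.handlers:
    # Print only the JavaScript function name each handler calls.
    print(tag, name, value.split("(")[0])
print("submit control present:", finder.has_submit)
```

Because the work happens inside findaandb in the browser, the practical options are to run a real browser or to call the same HTTP endpoints the script calls.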

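If you inspect the page source, you can see that it uses a couple of APIs plus some calculations. For example, using the main API together with a crow-flies (great-circle) calculation, you can get the distance like this: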

import math

import requests
from bs4 import BeautifulSoup  # BeautifulSoup 4 package


def distance_on_unit_sphere(lat1, long1, lat2, long2):
    """Great-circle distance in miles (spherical law of cosines).

    src: http://www.johndcook.com/python_longitude_latitude.html
    """
    degrees_to_radians = math.pi / 180.0

    # Convert latitude to polar angle (colatitude) and longitude to radians.
    phi1 = (90.0 - lat1) * degrees_to_radians
    phi2 = (90.0 - lat2) * degrees_to_radians
    theta1 = long1 * degrees_to_radians
    theta2 = long2 * degrees_to_radians

    cos_arc = (math.sin(phi1) * math.sin(phi2) * math.cos(theta1 - theta2) +
               math.cos(phi1) * math.cos(phi2))
    # Clamp against rounding error so acos never sees a value outside [-1, 1].
    arc = math.acos(max(-1.0, min(1.0, cos_arc)))
    return arc * 3960  # Earth radius in miles; multiply by 6373 instead for kilometers


def main():
    r = requests.get('http://www.freemaptools.com/ajax/getaandb.php?a=Sydney_Australia&b=Melbourne_Australia&c=1317')
    xml = BeautifulSoup(r.text, 'html.parser')

    # The response holds one <marker> element per address, with lat/lng attributes.
    markers = xml.markers.find_all('marker')
    lat1 = float(markers[0]['lat'])
    lng1 = float(markers[0]['lng'])
    lat2 = float(markers[1]['lat'])
    lng2 = float(markers[1]['lng'])

    print(distance_on_unit_sphere(lat1, lng1, lat2, lng2))


if __name__ == '__main__':
    main()
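
Since the question asks for the values in a dictionary, here is a minimal offline sketch of that last step. It makes several assumptions: SAMPLE_XML is a hypothetical response shaped like the <markers>/<marker lat lng> structure the code above reads, with approximate city-centre coordinates rather than real API output; the dictionary keys are made up for illustration; and it swaps in the haversine formula, which tends to behave better than the law of cosines for nearby points. It only covers the crow-flies distance, not the land transport one.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the response parsed above;
# the coordinates are approximate, not real API output.
SAMPLE_XML = """<markers>
  <marker lat="-33.8688" lng="151.2093"/>
  <marker lat="-37.8136" lng="144.9631"/>
</markers>"""

EARTH_RADIUS_KM = 6373.0  # same kilometre radius mentioned in the code above
KM_PER_MILE = 1.609344


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2 +
         math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # min() guards against rounding pushing the value just above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def distances_from_xml(xml_text):
    """Read the first two <marker> elements and return distances in a dict."""
    markers = ET.fromstring(xml_text).findall("marker")
    (lat1, lng1), (lat2, lng2) = [
        (float(m.get("lat")), float(m.get("lng"))) for m in markers[:2]
    ]
    km = haversine_km(lat1, lng1, lat2, lng2)
    return {"crow_flies_km": km, "crow_flies_miles": km / KM_PER_MILE}


result = distances_from_xml(SAMPLE_XML)
print(result)
```

To use it with live data, pass r.text from the request above to distances_from_xml instead of SAMPLE_XML, assuming the endpoint still returns well-formed XML in that shape.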