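I'm new to web scraping. I want to extract the coordinates from a <div> tag on pages that I access by URL. I have a list of URLs, and I want to extract the coordinates from each page and save them to a CSV file.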
<div class="single-view-data-row">
<div class="single-view-data-title">Coordinates</div>
<div class="single-view-data-get">
17.009164 N, -90.309259 E<br/><a href="http://geographiclib.sourceforge.net/cgi-bin/GeoConvert?input=17.009164+-90.309259" target="_blank">»» UTM / MGRS</a></div></div></div>
Thanks for the help!
Answer (score: 0)
To extract the link and the coordinates from this HTML, you can use the following script:
from bs4 import BeautifulSoup
txt = ''' <div class="single-view-data-row">
<div class="single-view-data-title">Coordinates</div>
<div class="single-view-data-get">
17.009164 N, -90.309259 E<br/><a href="http://geographiclib.sourceforge.net/cgi-bin/GeoConvert?input=17.009164+-90.309259" target="_blank">»» UTM / MGRS</a></div></div></div>
'''
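soup = BeautifulSoup(txt, 'html.parser')
# The <a> inside the "single-view-data-get" div holds the GeoConvert link
link = soup.select_one('.single-view-data-get a')['href']
# find_next(string=True) returns the first text node after the start of that div,
# i.e. "\n17.009164 N, -90.309259 E" (string= is the newer name for text=)
coords = soup.select_one('.single-view-data-get').find_next(string=True).split(',')
print(link)
print(coords[0].strip())  # latitude part
print(coords[1].strip())  # longitude part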
Prints:
http://geographiclib.sourceforge.net/cgi-bin/GeoConvert?input=17.009164+-90.309259
17.009164 N
-90.309259 E
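The script above parses a single HTML snippet, while the question also asks about looping over a list of URLs and saving the results to CSV. Below is a minimal sketch of that loop, assuming every page uses the same `single-view-data-get` markup as the snippet in the question. The helper names (`fetch_html`, `parse_coordinates`, `coordinate_rows`, `scrape_to_csv`) are hypothetical, and pages are downloaded with the standard library's `urllib.request` rather than any particular HTTP library. There are no retries, headers, or rate limiting, which a real site may require.

```python
import csv
import urllib.request

from bs4 import BeautifulSoup


def fetch_html(url):
    """Hypothetical helper: plain GET with the standard library, returns raw bytes."""
    # No custom headers, retries or delays; real sites may need them.
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()  # BeautifulSoup accepts bytes and detects the encoding


def parse_coordinates(html):
    """Return (latitude_text, longitude_text) from the page, or None if not found."""
    soup = BeautifulSoup(html, 'html.parser')
    cell = soup.select_one('.single-view-data-get')
    if cell is None:
        return None
    # First non-blank text inside the div only, e.g. "17.009164 N, -90.309259 E";
    # the link text "»» UTM / MGRS" comes later and is ignored.
    text = next(cell.stripped_strings, None)
    parts = text.split(',') if text else []
    if len(parts) != 2:
        return None
    # Values are kept verbatim, including the page's "N"/"E" labels.
    return parts[0].strip(), parts[1].strip()


def coordinate_rows(urls, fetch=fetch_html):
    """Yield one [url, latitude, longitude] row per URL, with blanks if not found."""
    for url in urls:
        coords = parse_coordinates(fetch(url))
        yield [url, *coords] if coords else [url, '', '']


def scrape_to_csv(urls, path, fetch=fetch_html):
    """Write a header row plus one row per URL to the CSV file at `path`."""
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['url', 'latitude', 'longitude'])
        writer.writerows(coordinate_rows(urls, fetch))
```

Usage, with placeholder URLs and file name: `scrape_to_csv(['https://example.com/page-1', 'https://example.com/page-2'], 'coordinates.csv')`. Unlike `find_next(string=True)`, `cell.stripped_strings` only looks inside the div, so a page with an empty coordinates cell yields blank columns instead of whatever text happens to follow it. Passing `fetch` as a parameter also lets you test the parsing on saved HTML without hitting the network.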