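I want to extract the railroad lines from abandonedrails.com, starting with Alabama. I am scraping the page with BeautifulSoup and then trying to decode the polyline-encoded LineStrings.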
However, it fails on the very first polyline, the one named Alabama_Tennessee_and_Northern_Railroad. For Python, I replaced the [" with ':
import requests
from bs4 import BeautifulSoup
# from PyPI: https://github.com/hicsail/polyline
import polyline

state = 'Alabama'
page = requests.get('http://www.abandonedrails.com/' + state)
soup = BeautifulSoup(page.text, 'lxml')
select = soup.find_all('section', class_="route")
for s in select:
    filename = s.attrs['data-filename']
    print(filename)
    encoded_pline = s.attrs['data-routes']
    print(encoded_pline)
    poly = polyline.decode(encoded_pline)
    print(poly)
Is this a Python problem with unescaped commas or quotes? Or is it failing for some other reason?
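Answer 0 (score: 0)
The information encoded in the data-routes attribute is in JSON format, so you first need to convert it with the Python json library.
The result is a list of segments; passing each segment individually to your library should give you what you need: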
import json

import matplotlib.pyplot as plt
import requests
from bs4 import BeautifulSoup
# from PyPI: https://github.com/hicsail/polyline
import polyline

state = 'Alabama'
page = requests.get('http://www.abandonedrails.com/' + state)
soup = BeautifulSoup(page.text, 'lxml')
select = soup.find_all('section', class_="route")
for s in select:
    print(s['data-filename'])
    # data-routes holds a JSON list of encoded polyline segments
    encoded_pline = json.loads(s['data-routes'])
    for segment in encoded_pline:
        poly = polyline.decode(segment)
        lats, longs = list(zip(*poly))
        plt.plot(longs, lats)

plt.show()
Tested with Python 3.6.7.
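To see the two steps (JSON first, then polyline decoding) without hitting the site, here is a minimal offline sketch. The attribute text is made up for illustration: it wraps the sample string from Google's encoded polyline documentation in a JSON list, on the assumption that data-routes has that shape. The decode_polyline helper is a hand-written stand-in for polyline.decode so the example needs only the standard library; it is not the library's own code.

```python
import json


def decode_polyline(encoded, precision=5):
    """Decode a Google-style encoded polyline into (lat, lon) tuples.

    Hand-written stand-in for polyline.decode, for illustration only.
    """
    coords = []
    index = lat = lng = 0
    factor = 10 ** precision
    while index < len(encoded):
        deltas = []
        for _ in range(2):  # one latitude delta, then one longitude delta
            shift = result = 0
            while True:
                byte = ord(encoded[index]) - 63
                index += 1
                result |= (byte & 0x1F) << shift
                shift += 5
                if byte < 0x20:  # no continuation bit: last chunk of this value
                    break
            # The lowest bit carries the sign; the remaining bits carry the magnitude.
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        lat += deltas[0]
        lng += deltas[1]
        coords.append((lat / factor, lng / factor))
    return coords


# Hypothetical attribute value: a JSON list with a single encoded segment.
raw_attribute = '["_p~iF~ps|U_ulLnnqC_mqNvxq`@"]'

segments = json.loads(raw_attribute)  # step 1: JSON text -> list of strings
routes = [decode_polyline(seg) for seg in segments]  # step 2: decode each segment
print(routes[0])
```

Passing raw_attribute straight to the decoder, as the question's code does, feeds it the brackets and quotes as if they were polyline characters, which would explain the failure.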
Answer 1 (score: 0)
Your polyline string is encoded for JavaScript. To use it in Python, you need to remove the escaping. Use decodePath() as the decode function:
output = str(decodePath(entry.replace('\\\\', '\\')))
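As a rough check of what that replace call does, the sketch below compares it with json.loads on a made-up attribute value containing one escaped backslash. The string is hypothetical and only illustrates the escaping; it is not a real route from the site.

```python
import json

# Hypothetical attribute text: a JSON list holding one string with an escaped backslash.
raw_attribute = r'["ab\\cd"]'

# Manual approach: strip the [" and "] wrappers, then collapse doubled backslashes.
manual = raw_attribute[2:-2].replace('\\\\', '\\')

# JSON approach: let the parser undo the escaping.
parsed = json.loads(raw_attribute)[0]

print(manual == parsed)
```

For this input both give the same string. json.loads also handles other escapes such as \" and \uXXXX, and it splits multi-segment lists for you, so it is probably the safer of the two.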