Decoding polylines (encoded GeoJSON) from abandonedrails.com

Date: 2019-04-19 13:58:43

Tags: python beautifulsoup escaping google-polyline

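I want to extract the rail lines from abandonedrails.com, starting with Alabama. I am scraping the page with BeautifulSoup and then trying to decode the polyline-encoded LineStrings (the script is shown below).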

However, decoding fails on the first polyline, named Alabama_Tennessee_and_Northern_Railroad. I have already tried replacing [" with ' in Python.

import requests
from bs4 import BeautifulSoup
#from pypi / https://github.com/hicsail/polyline
import polyline

state = 'Alabama'
page = requests.get('http://www.abandonedrails.com/'+state)
soup = BeautifulSoup(page.text, 'lxml')
select = soup.find_all('section', class_="route")

for s in select:
    filename = s.attrs['data-filename']
    print(filename)

    encoded_pline = s.attrs['data-routes']
    print(encoded_pline)

    poly = polyline.decode(encoded_pline)
    print(poly)

Is this a Python problem with unescaped commas or quotes, or is it failing for some other reason?

2 Answers:

Answer 0: (score: 0)

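The data-routes attribute holds JSON, so you first need to parse it with Python's json library.

The parsed result is a list of segments. Passing those segments to your library one at a time should give you what you need: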

import matplotlib.pyplot as plt
import requests
from bs4 import BeautifulSoup
#from pypi / https://github.com/hicsail/polyline
import polyline
import json

state = 'Alabama'
page = requests.get('http://www.abandonedrails.com/'+state)
soup = BeautifulSoup(page.text, 'lxml')
select = soup.find_all('section', class_="route")

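for s in select:
    print(s['data-filename'])
    # data-routes is JSON: parse it into a list of encoded segments
    encoded_pline = json.loads(s['data-routes'])

    for segment in encoded_pline:
        # decode one segment into a list of (lat, lon) tuples
        poly = polyline.decode(segment)
        lats, longs = list(zip(*poly))
        # longitude on the x axis, latitude on the y axis
        plt.plot(longs, lats)

plt.show()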

This plots the decoded routes with Matplotlib. (Figure: matplotlib plot of polylines)

Tested with Python 3.6.7.
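For anyone who wants to check the parse-then-decode order without hitting the site, here is a minimal offline sketch. The assumptions are mine: `decode_polyline` is a hand-written stand-in for `polyline.decode` that follows the Google encoded polyline format at precision 5, `sample_attribute` is a made-up data-routes value built from the example string in Google's documentation (not data from abandonedrails.com), and `to_geojson_linestring` is a hypothetical helper for the GeoJSON step mentioned in the title.

```python
import json


def decode_polyline(encoded, precision=5):
    """Decode one Google-style encoded polyline into (lat, lon) tuples."""
    coords = []
    index = lat = lon = 0
    factor = 10 ** precision
    while index < len(encoded):
        deltas = []
        for _ in range(2):  # first the latitude delta, then the longitude delta
            shift = result = 0
            while True:
                chunk = ord(encoded[index]) - 63
                index += 1
                result |= (chunk & 0x1F) << shift
                shift += 5
                if chunk < 0x20:  # continuation bit not set: last chunk
                    break
            # the lowest bit carries the sign; the remaining bits the magnitude
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        lat += deltas[0]
        lon += deltas[1]
        coords.append((lat / factor, lon / factor))
    return coords


def to_geojson_linestring(points):
    """Hypothetical helper: GeoJSON wants [lon, lat], not (lat, lon)."""
    return {"type": "LineString",
            "coordinates": [[lon, lat] for lat, lon in points]}


# Made-up attribute value: a JSON array holding a single encoded segment.
sample_attribute = '["_p~iF~ps|U_ulLnnqC_mqNvxq`@"]'

segments = json.loads(sample_attribute)  # step 1: parse the JSON wrapper
routes = [decode_polyline(segment) for segment in segments]  # step 2: decode
print(routes[0])
print(json.dumps(to_geojson_linestring(routes[0])))
```

The point is the order of operations: the brackets and quotes belong to the JSON wrapper, not to the polyline, so they have to be removed by the JSON parser before any segment reaches the decoder.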

Answer 1: (score: 0)

Your polyline strings are escaped for JavaScript. To use them in Python, you need to remove the escaping first. With decodePath() as the decode function:

output = str(decodePath(entry.replace('\\\\', '\\')))
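The answer does not show where decodePath() comes from, so the sketch below leaves decoding out and only illustrates the unescaping step. It assumes the escaping in question is ordinary JSON string escaping, where a literal backslash inside a polyline is written as two backslashes. The `raw_attribute` value is made up for illustration.

```python
import json

# Made-up attribute text as it would appear in the HTML source. The raw
# string below contains TWO backslash characters between "ab" and "cd".
raw_attribute = r'["ab\\cd~x"]'

# Option 1: let the JSON parser undo the escaping.
segments = json.loads(raw_attribute)

# Option 2: strip the [" and "] wrapper by hand and collapse each doubled
# backslash into one, mirroring the replace() call above.
manual = raw_attribute[2:-2].replace('\\\\', '\\')

print(segments[0] == manual)
```

Under that assumption both routes produce the same string for this input. The manual replace only handles doubled backslashes, though, so json.loads is probably the safer choice if other escapes (such as \" or \uXXXX) can appear.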