Question

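I'd like to extract the railroad lines from abandonedrails.com, starting with Alabama. I'm scraping the page with BeautifulSoup and then trying to decode the polyline-encoded LineStrings: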


But it fails on the very first polyline, named Alabama_Tennessee_and_Northern_Railroad. I have replaced Python's [" with ':

import requests
from bs4 import BeautifulSoup
#from pypi / https://github.com/hicsail/polyline
import polyline

state = 'Alabama'
page = requests.get('http://www.abandonedrails.com/'+state)
soup = BeautifulSoup(page.text, 'lxml')
select = soup.find_all('section', class_="route")

for s in select:
    filename = s.attrs['data-filename']
    print(filename)

    encoded_pline = s.attrs['data-routes']
    print(encoded_pline)

    poly = polyline.decode(encoded_pline)
    print(poly)

Is this a Python problem with unescaped commas or quotes, or is it failing for some other reason?

Answer 1

The information in the data-routes attribute is JSON-encoded, so you first need to parse it with Python's json library.
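As a minimal sketch, assuming the attribute holds a JSON array of encoded polyline strings, json.loads turns the raw attribute text into an ordinary Python list. The sample value below is made up from Google's documented example polyline; it is not copied from abandonedrails.com:

```python
import json

# Hypothetical data-routes value: a JSON array whose items are encoded polylines.
raw_attr = '["_p~iF~ps|U_ulLnnqC_mqNvxq`@", "_p~iF~ps|U"]'

# Parsing strips the brackets, quotes and commas, leaving only the encoded strings.
segments = json.loads(raw_attr)

print(type(segments).__name__, len(segments))  # list 2
print(segments[0])
```

Without this step the brackets, quotes and commas are fed to the decoder as if they were encoded characters, which is why decoding the raw attribute does not work.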

The result is a list of segments; passing each segment to your polyline library separately should give you what you want:
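To show what happens to each segment, here is a minimal pure-Python sketch of Google's Encoded Polyline Algorithm at precision 5, returning (lat, lng) tuples, which I assume matches what polyline.decode does by default. decode_polyline is a hypothetical helper written for illustration, not part of the polyline package, and the sample attribute is again made up from Google's example string:

```python
import json
from typing import List, Tuple


def decode_polyline(encoded: str, precision: int = 5) -> List[Tuple[float, float]]:
    """Hypothetical helper: decode one encoded polyline into (lat, lng) pairs."""
    factor = 10 ** precision
    coords: List[Tuple[float, float]] = []
    index = lat = lng = 0
    while index < len(encoded):
        deltas = []
        for _ in range(2):  # one latitude delta, then one longitude delta
            shift = result = 0
            while True:
                b = ord(encoded[index]) - 63  # characters are offset by 63
                index += 1
                result |= (b & 0x1F) << shift  # 5 payload bits per character
                shift += 5
                if b < 0x20:  # continuation bit not set: this value is complete
                    break
            # Undo the zig-zag sign encoding (low bit marks a negative value).
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        # Each point is stored as a difference from the previous point.
        lat += deltas[0]
        lng += deltas[1]
        coords.append((lat / factor, lng / factor))
    return coords


raw_attr = '["_p~iF~ps|U_ulLnnqC_mqNvxq`@", "_p~iF~ps|U"]'
for segment in json.loads(raw_attr):
    print(decode_polyline(segment))
```

For Google's example string this should yield approximately (38.5, -120.2), (40.7, -120.95) and (43.252, -126.453). Each segment starts its running lat/lng sums from zero, so the segments have to be decoded one by one rather than as a single string.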

import matplotlib.pyplot as plt
import requests
from bs4 import BeautifulSoup
#from pypi / https://github.com/hicsail/polyline
import polyline
import json

state = 'Alabama'
page = requests.get('http://www.abandonedrails.com/'+state)
soup = BeautifulSoup(page.text, 'lxml')
select = soup.find_all('section', class_="route")

for s in select:
    print(s['data-filename'])
    encoded_pline = json.loads(s['data-routes'])

    for segment in encoded_pline:
        poly = polyline.decode(segment)
        lats, longs = list(zip(*poly))
        plt.plot(longs, lats)

plt.show()

With Matplotlib, this plots every decoded segment of the Alabama routes.

Tested with Python 3.6.7.
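The plotting line relies on zip(*poly) to transpose the decoded points. A small sketch, assuming poly is a list of (lat, lng) tuples as returned by the decoder (the values here are illustrative):

```python
# Assumed decoder output: a list of (lat, lng) tuples.
poly = [(38.5, -120.2), (40.7, -120.95), (43.252, -126.453)]

# zip(*poly) transposes the pairs into one tuple of latitudes and one of longitudes.
lats, longs = zip(*poly)

print(lats)   # (38.5, 40.7, 43.252)
print(longs)  # (-120.2, -120.95, -126.453)
# plt.plot(longs, lats) then puts longitude on the x-axis and latitude on the y-axis.
```

The list() wrapper in the answer is not required for the unpacking; zip can be unpacked directly.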

Answer 2

Your polyline strings are escaped for JavaScript. To use them in Python, you need to remove the escaping first. With decodePath() as the decoding function:

output = str(decodePath(entry.replace('\\\\', '\\')))
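Why the replace call helps: the encoded-polyline alphabet runs from ASCII 63 to 126, so a backslash (ASCII 92) can legitimately appear in an encoded string, and inside JavaScript or JSON source it is written as a doubled backslash. The sketch below uses a made-up string to show that collapsing the doubled backslash by hand gives the same result as letting json.loads unescape it. decodePath is not defined in the answer, so it is left out here:

```python
import json

# Made-up attribute text: a JSON array holding one string with an escaped backslash.
raw_attr = r'["ab\\cd"]'

# JSON parsing turns the two-character escape into a single backslash.
from_json = json.loads(raw_attr)[0]

# The manual route: strip the [" and "] delimiters, then collapse doubled backslashes.
entry = raw_attr[2:-2]
manual = entry.replace('\\\\', '\\')

print(from_json, manual, from_json == manual)
```

Parsing the attribute with json.loads therefore already removes the JavaScript-style escaping, so the two approaches should agree.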

Decoding polylines (encoded GeoJSON) from abandonedrails.com
