I have this code that outputs the coordinates of a port:
import re
import urllib.request

a = input("What country is your port in?: ")
b = input("What is the name of the port?: ")
url = "http://ports.com/"
# Example country and port slugs (not used below)
country = ["united-kingdom", "greece"]
ports = ["port-of-eleusis", "portsmouth-continental-ferry-port", "poole-harbour"]
# Build the page URL from the two slugs entered above
totalurl = url + a + "/" + b + "/"
regex = '<strong>Coordinates:</strong>(.*?)</span>'
pattern = re.compile(regex)
# Fetch the page once and decode the response bytes to text
with urllib.request.urlopen(totalurl) as response:
    html = response.read().decode()
# Capture the text between the Coordinates label and the closing span
num = re.findall(pattern, html)
print(num)
The output is correct and readable, but I need the coordinates in the format 39°09'24.6''N 175°37'55.8''W instead of:
>>> [' 50&#176;48&#8242;41.04&#8243;N 1&#176;5&#8242;31.31&#8243;W']
Answer 0 (score: 0)
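The problem is that the HTML uses character references to display certain Unicode characters, and Python does not decode them when it reads the page as plain text. To fix this, replace print(num) with print(list(i.replace('&#176;', "°").replace('&#8242;', "′").replace('&#8243;', "″") for i in num))
This replaces &#176; with °, &#8242; with ′, and &#8243; with ″.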
>>> print(list(i.replace('&#176;', "°").replace('&#8242;', "′").replace('&#8243;', "″") for i in num))
[' 50°48′41.04″N 1°5′31.31″W']
>>>
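As an alternative to chaining replace calls, the standard library's html.unescape decodes every character reference in one pass, whether numeric or named. The sketch below is only an illustration: the helper names decode_coordinates and to_ascii_marks are hypothetical, and the sample string is assumed to resemble what the regex captures from the page. The second helper swaps the prime marks for plain apostrophes, in case the 39°09'24.6''N style from the question is wanted.

```python
import html


def decode_coordinates(raw_matches):
    """Decode HTML character references and strip surrounding whitespace."""
    return [html.unescape(item).strip() for item in raw_matches]


def to_ascii_marks(text):
    """Replace prime and double-prime marks with one or two apostrophes."""
    return text.replace("\u2032", "'").replace("\u2033", "''")


# Assumed sample of what re.findall returns for one port page.
sample = [" 50&#176;48&#8242;41.04&#8243;N 1&#176;5&#8242;31.31&#8243;W"]

decoded = decode_coordinates(sample)
print(decoded)
print([to_ascii_marks(item) for item in decoded])
```

With this approach, print(num) in the question becomes print(decode_coordinates(num)), and no per-character replacements are needed if the page later uses other references.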