Question

I am scraping this page:

https://en.wikipedia.org/wiki/Water_Tower_Place

I need the coordinates shown there, i.e. the latitude and longitude.

I tried:

scrapy shell https://en.wikipedia.org/wiki/Water_Tower_Place


response.xpath('//*[@id="coordinates"]/span/span/a/span[1]/span/span[1]')

but I get an empty list back.

I can get it with a regular expression:

re.findall('latitude([^<]+)',str(response.body))

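but the result contains special characters, and I assume there is a simpler way to get the value directly without having to deal with them: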

['">41\xc2\xb053\xe2\x80\xb252.5\xe2\x80\xb3N']

Edit:

My mistake: when I print it, I do get the latitude:

41°53′52.5″N

Either way, I would like to know how to get the value without a regular expression.

Answer 1

I would rely on the specific latitude and longitude classes:

$ scrapy shell https://en.wikipedia.org/wiki/Water_Tower_Place
>>> print(response.css(".geo-dms .latitude::text").extract_first())
41°53′52.5″N
>>> print(response.css(".geo-dms .longitude::text").extract_first())
87°37′20.5″W
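
If you need plain numbers rather than the degree/minute/second strings, the extracted text can be converted afterwards. Below is a minimal sketch that uses no regular expression; `dms_to_decimal` is a hypothetical helper name (not part of Scrapy), and it assumes the string always looks like the output above: degrees, optional minutes and seconds, then a single hemisphere letter.

```python
def dms_to_decimal(dms):
    """Convert a string such as '41°53′52.5″N' to signed decimal degrees.

    Hypothetical helper: assumes the degree, prime and double-prime symbols
    used on the page, and a trailing hemisphere letter (N, S, E or W).
    """
    dms = dms.strip()
    hemisphere = dms[-1].upper()
    if hemisphere not in "NSEW":
        raise ValueError("missing hemisphere letter in %r" % dms)

    # Turn the unit symbols into spaces so the numbers can simply be split.
    numbers = dms[:-1]
    for symbol in ("°", "′", "″"):
        numbers = numbers.replace(symbol, " ")

    # Pad with zeros so that minutes and seconds may be omitted.
    values = [float(part) for part in numbers.split()] + [0.0, 0.0]
    degrees, minutes, seconds = values[:3]

    decimal = degrees + minutes / 60 + seconds / 3600
    # South and west are negative by convention.
    return -decimal if hemisphere in "SW" else decimal


if __name__ == "__main__":
    print(dms_to_decimal("41°53′52.5″N"))
    print(dms_to_decimal("87°37′20.5″W"))
```

In the Scrapy shell you would pass it the result of `extract_first()` from the selectors above.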

Answer 2

If you want to use XPath, you can use:

response.xpath('//span[@class="latitude"]/text()').extract()[0]

and

response.xpath('//span[@class="longitude"]/text()').extract()[0]

to get the values from the Wikipedia page.
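
To try the class-based expression without hitting the network, here is a small sketch using only the standard library's `xml.etree.ElementTree` instead of Scrapy (a swapped-in tool, chosen so the example runs anywhere). The `SNIPPET` markup is a simplified assumption of how the coordinate spans are nested, not a copy of the real page source.

```python
import xml.etree.ElementTree as ET

# Simplified, assumed markup: an outer span holding one span per coordinate.
SNIPPET = (
    '<span class="geo-dms">'
    '<span class="latitude">41°53′52.5″N</span> '
    '<span class="longitude">87°37′20.5″W</span>'
    '</span>'
)

root = ET.fromstring(SNIPPET)

# Same idea as the XPath above: select the span by its class attribute
# rather than by its position in the tree.
latitude = root.find(".//span[@class='latitude']").text
longitude = root.find(".//span[@class='longitude']").text

print(latitude, longitude)
```

Note that `@class="latitude"` is an exact match on the whole attribute value, so it would stop matching if the element ever carried more than one class.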
