Question

I'm a beginner with scrapy, but I'm learning. I've been parsing this page and trying to scrape the address shown on it.

I did this in the scrapy shell, so I started with:

scrapy shell https://www.marksandspencer.com/MSStoreDetailsView?storeId=10151&langId=-24&SAPStoreId=6952

That works fine. Then I tried to parse the address:

response.xpath('//li[@class="address"]/text()').extract()

But my output is as follows:

['\n\t\t', '\n\t\t\n\t\t\t']

Why can't I see the address that is displayed on the page:

BELFAST ABBEY CENTER, 1 Old Glenmount Road Newtonabbey, Newton Abbey, BT36 7DN

How can I get this address? I appreciate anyone who takes the time to reply.

Answer 1

There are a couple of mistakes in how you are approaching this:

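When using scrapy shell, you must wrap the URL in double quotes (""). Otherwise the terminal can interpret the command as multiple processes, because the URL contains the & character: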
```
scrapy shell "https://www.marksandspencer.com/MSStoreDetailsView?storeId=10151&langId=-24&SAPStoreId=6952"
```
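If you ever build that command from a script instead of typing it, the same quoting concern applies. Here is a minimal sketch using Python's standard `shlex` module; the variable names are only illustrative, and it assumes a POSIX-style shell:

```python
import shlex

url = "https://www.marksandspencer.com/MSStoreDetailsView?storeId=10151&langId=-24&SAPStoreId=6952"

# shlex.quote wraps the string in single quotes because it contains
# characters such as & and ? that a POSIX shell treats specially.
quoted = shlex.quote(url)
command = "scrapy shell " + quoted
print(command)

# Parsing the command back shows the URL survives as a single argument.
args = shlex.split(command)
print(args)
```

Single quotes work just as well as double quotes here; either way the shell no longer sees a bare &.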
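Your xpath is incorrect: with /text() you only get the text that sits directly inside that particular tag, and the li itself doesn't actually contain the information you want. The text is inside child tags of the li, so you can use: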
```
response.xpath('//li[@class="address"]//text()').extract()
```
or
```
response.xpath('//li[@class="address"]/p/text()').extract()
```
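To see why the first attempt returned only whitespace, here is a rough sketch using Python's standard `xml.etree.ElementTree` rather than Scrapy itself. The markup in `SNIPPET` is hypothetical (I am assuming the address lines sit in p tags under the li, and I shortened the address); the real page may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the real page's HTML is an assumption here.
SNIPPET = (
    '<ul>\n'
    '\t<li class="address">\n'
    '\t\t<p>BELFAST ABBEY CENTER</p>\n'
    '\t\t<p>BT36 7DN</p>\n'
    '\t</li>\n'
    '</ul>'
)

root = ET.fromstring(SNIPPET)
li = root.find(".//li[@class='address']")

# Rough equivalent of li/text(): text nodes that are direct children of <li>,
# i.e. the text before the first child plus the text after each child.
candidates = [li.text] + [child.tail for child in li]
direct = [t for t in candidates if t is not None]

# Rough equivalent of li//text(): every text node under <li>, children included.
descendants = list(li.itertext())

# Drop whitespace-only nodes and join what is left.
address = ", ".join(t.strip() for t in descendants if t.strip())

print(direct)
print(address)
```

In this sketch `direct` holds nothing but newlines and tabs, which matches what you saw in the shell, while `descendants` also includes the text inside the p tags. Note that //text() in Scrapy will likewise return the whitespace-only nodes along with the address, so you will probably want to strip and filter the extracted list in a similar way.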