Question

I am using Scrapy to extract listings from a website. If I use the following code:

response.xpath('//*[@id="mainframe"]/div/div[1]/div[1]/span[2]/text()').extract()

it returns:

[u'A sample String with dot dot dot in the end...',
u'And some other string ...',
u'Another similar string with dots in the end...',
u'Can some one help with preventing my string from being trun...']

However, when I leave out "/text()", as in the following code:

response.xpath('//*[@id="mainframe"]/div/div[1]/div[1]/span[2]').extract()

I get the following output:

[u'<span title= A sample String with dot dot dot in the end and plus something> some_text </span>',
 u'<span title=And some other string and plus something> some_text </span>',
 u'<span title=Another similar string with dots in the end and plus something> some_text </span>',
 u'<span title=Can some one help with preventing my string from being truncated and plus something> some_text </span>']

How can I get the complete strings without these trailing dots?

Answer 1

You should try the following code:

item['name'] = site.xpath('a[1]/text()').re(r'(\w+)')
