Question

import scrapy


class ScrapeMovies(scrapy.Spider):
    name = 'conference-papers'
    start_urls = [
        'http://archive.bridgesmathart.org/2015/index.html'
    ]

    def parse(self, response):
        # Note: if this XPath matches only one container (the whole content
        # column), each field below returns a list covering every paper on
        # the page, which produces the output described below.
        for entry in response.xpath('//div[@class="col-md-9"]'):
            yield {
                'type': entry.xpath('.//div[@class="h4 alert alert-info"]/text()').extract(),
                'title': entry.xpath('.//span[@class="title"]/text()').extract(),
                'authors': entry.xpath('.//span[@class="authors"]/text()').extract()
            }

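With the following code I want to scrape the type, title, and authors of each listed publication. However, when I run it, I get all the types on one line, the titles separated by line breaks, and finally all the authors on one line.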

How can I group these three values together for each publication? What is the best way to solve this?
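The spider above yields a single item whose fields are parallel lists. A minimal sketch of the naive fix, assuming titles and authors come back in the same order with exactly one entry per paper, is to pair them positionally; the helper name `pair_up` is hypothetical. The length check matters: if any paper lacks an authors span, positional pairing silently shifts every later author onto the wrong title, and the type list (one header per section, not one per paper) cannot be paired this way at all.

```python
def pair_up(titles, authors):
    """Pair parallel title/author lists into one dict per paper.

    Assumes both lists are in document order with exactly one entry per
    paper; that is an assumption, not something the page guarantees.
    """
    if len(titles) != len(authors):
        # Positional pairing would misalign records, so refuse instead.
        raise ValueError(
            f"got {len(titles)} titles but {len(authors)} author entries"
        )
    return [{"title": t, "authors": a} for t, a in zip(titles, authors)]


# Usage with made-up values:
print(pair_up(["Paper A", "Paper B"], ["Alice", "Bob"]))
```

Because of those fragilities, selecting one node per paper and reading its fields relative to that node is usually more robust than zipping page-wide lists.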

Here is an excerpt of the HTML I want to scrape:

BTW: if you downvote, please explain why. I'm curious.

Answer 1

You need to join these values: https://stackoverflow.com/a/19418858/6668185
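In Scrapy, `.extract()` returns a list of every matching text node, so a field whose text is split across several nodes (or padded with whitespace) arrives as a list of fragments. A minimal sketch of the joining step as a plain helper applied to that list; the name `join_text` is hypothetical, and this is not necessarily what the linked answer does:

```python
def join_text(parts, sep=" "):
    """Collapse a list of extracted text fragments into one clean string.

    Whitespace inside each fragment is collapsed, and fragments that are
    empty after stripping (e.g. bare newlines between tags) are dropped.
    """
    cleaned = (" ".join(p.split()) for p in parts)
    return sep.join(p for p in cleaned if p)


# Usage with made-up fragments, as .extract() might return them:
print(join_text(["\n  Paper ", " A\n", "  "]))
```

Inside `parse`, that would look like `join_text(entry.xpath('.//span[@class="title"]/text()').extract())`. XPath's `normalize-space()` is an alternative that collapses whitespace inside the selector itself, though in XPath 1.0 it only uses the string value of the first matching node.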

Then, for each entry, you need to get the preceding div and read its value, like this: https://stackoverflow.com/a/9857809/6668185
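The reason for taking the preceding div is that the type is a section header, not part of each paper, so every paper has to inherit it from the nearest header before it. A sketch of that idea using only the standard library, against hypothetical simplified markup (the `paper` wrapper class and all sample values are assumptions, since the question's HTML excerpt is not available): it walks the column's children in document order and carries the most recent header forward, which should give the same result as evaluating `./preceding-sibling::div[@class="h4 alert alert-info"][1]` for each paper node.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for the real page.
SAMPLE = """
<div class="col-md-9">
  <div class="h4 alert alert-info">Regular Papers</div>
  <div class="paper">
    <span class="title">Paper A</span>
    <span class="authors">Alice, Bob</span>
  </div>
  <div class="paper">
    <span class="title">Paper B</span>
    <span class="authors">Carol</span>
  </div>
  <div class="h4 alert alert-info">Short Papers</div>
  <div class="paper">
    <span class="title">Paper C</span>
    <span class="authors">Dave</span>
  </div>
</div>
"""

HEADER_CLASS = "h4 alert alert-info"
PAPER_CLASS = "paper"  # hypothetical wrapper class (an assumption)


def text_of(elem):
    """Return all text under elem with whitespace collapsed ('' if missing)."""
    if elem is None:
        return ""
    return " ".join("".join(elem.itertext()).split())


def scrape(markup):
    """Return a list of dicts, one per paper, tagged with the nearest preceding header."""
    column = ET.fromstring(markup.strip())
    records = []
    current_type = None  # most recent section header seen so far
    for child in column:  # direct children, in document order
        cls = child.get("class", "")
        if cls == HEADER_CLASS:
            current_type = text_of(child)
        elif cls == PAPER_CLASS:
            records.append({
                "type": current_type,
                "title": text_of(child.find(".//span[@class='title']")),
                "authors": text_of(child.find(".//span[@class='authors']")),
            })
    return records


for record in scrape(SAMPLE):
    print(record)
```

A single forward pass avoids re-querying the preceding axis for every paper. In a Scrapy spider, the selector-based equivalent would loop over each paper node and read its header with the relative `preceding-sibling` XPath above, yielding one item per paper.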

I'll refine this answer in a moment and provide an exact solution.

UPDATE / IMPROVEMENT

试试这个：


I haven't tested it, but I think it should work.

Scraping with Scrapy - merging fields
