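I have managed to extract some data with the program below. I tried exporting it to a .csv file with the command scrapy crawl jp -t csv -o extract_jp.csv --loglevel=INFO.
However, if I do not include .replace('\n','').replace('\r','') in the program, the content, i.e. 'question_content', is not written into a single cell but is split across several cells.
How can I keep the line breaks in the content while still having it stay in one cell?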
import scrapy


class JPItem(scrapy.Item):
    best_answer = scrapy.Field()
    question_content = scrapy.Field()
    question_title = scrapy.Field()


class JPSpider(scrapy.Spider):
    name = "jp"
    allowed_domains = ['detail.chiebukuro.yahoo.co.jp']
    start_urls = [
        'https://detail.chiebukuro.yahoo.co.jp/qa/question_detail/q' + str(x)
        for x in range(12174460000, 12174470000)
    ]

    def parse(self, response):
        item = JPItem()
        # The title is a single text node; content and best answer may span several.
        item['question_title'] = response.css("div.mdPstd.mdPstdQstn.sttsRslvd.clrfx div.ttl h1::text").extract_first()
        item['question_content'] = ''.join(response.css("div.mdPstd.mdPstdQstn.sttsRslvd.clrfx div.ptsQes p::text").extract())
        item['best_answer'] = ''.join(response.css("div.mdPstd.mdPstdBA.othrAns.clrfx div.ptsQes p.queTxt::text").extract())
        yield item
Edit 1
[Screenshot: LibreOffice display after amending the code]
Answer 0 (score: 0)
This may be related to the software you use to read the CSV.
I was able to extract the data and display it in LibreOffice just fine.
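To illustrate why the reader software is the likely suspect, here is a minimal sketch using only Python's standard csv module. It is not the Scrapy exporter itself, and the row contents are made up for illustration. It shows that a field containing line breaks is written inside double quotes, and that a CSV-aware parser reads it back as one cell.

```python
import csv
import io

# Hypothetical row resembling what the spider yields; the text is invented.
row = {
    "question_title": "Sample title",
    "question_content": "first line\nsecond line\nthird line",
}

# Write the row as CSV. Fields that contain line breaks are wrapped
# in double quotes by the default quoting mode.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
raw_csv = buffer.getvalue()

# Read the text back with a CSV-aware parser: the multi-line field
# comes back as a single value in a single record.
records = list(csv.DictReader(io.StringIO(raw_csv, newline="")))
print(len(records))
print(repr(records[0]["question_content"]))
```

A tool that splits the file on every physical line without honoring the quotes would show the same data spread over several rows, which matches the symptom in the question.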
This is the spider I used (adapted from yours):
import scrapy


class JPItem(scrapy.Item):
    question_title = scrapy.Field()
    question_content = scrapy.Field()
    best_answer = scrapy.Field()


class JPSpider(scrapy.Spider):
    name = "jp"
    allowed_domains = ['detail.chiebukuro.yahoo.co.jp']
    start_urls = [
        'https://detail.chiebukuro.yahoo.co.jp/qa/question_detail/q12174467757',
        'https://detail.chiebukuro.yahoo.co.jp/qa/question_detail/q10174455955',
        'https://detail.chiebukuro.yahoo.co.jp/qa/question_detail/q14174286904',
    ]

    def parse(self, response):
        item = JPItem()
        # Line breaks in the extracted text are kept as-is (no .replace calls).
        item['question_title'] = response.css("div.mdPstd.mdPstdQstn.sttsRslvd.clrfx div.ttl h1::text").extract_first()
        item['question_content'] = ''.join(response.css("div.mdPstd.mdPstdQstn.sttsRslvd.clrfx div.ptsQes p::text").extract())
        item['best_answer'] = ''.join(response.css("div.mdPstd.mdPstdBA.othrAns.clrfx div.ptsQes p.queTxt::text").extract())
        yield item
I ran it with $ scrapy runspider myspider.py -o test.csv
This is what LibreOffice displays: http://imgur.com/a/6q205
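If your CSV reader still splits the cell, one thing worth trying is normalizing the line endings instead of deleting them. The sketch below assumes, without confirmation from the page source, that stray carriage returns are what confuse the reader. The helper name normalize_newlines is hypothetical and not part of Scrapy.

```python
import re


def normalize_newlines(text):
    # Convert CRLF pairs and bare CR characters to a single LF,
    # leaving every other character untouched.
    return re.sub(r"\r\n?", "\n", text)


# Made-up sample mixing the three line-ending styles.
sample = "line one\r\nline two\rline three\nline four"
cleaned = normalize_newlines(sample)
print(repr(cleaned))
```

In the spider, the joined text for item['question_content'] could be passed through this helper before the item is yielded, so the line breaks survive but use one consistent character.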