I'm trying to scrape some data from a website, but nothing is being saved to the CSV file.
for url in urls:
    yield scrapy.Request(url=url, callback=self.parse)

def parse(self, response):
    name = shoes.css('a.store-name::text').extract()
    review_count = shoes.css('a.rating-info rtl-mode::text').extract()
    price = shoes.css('span.price-current::text').extract()
    image_link = shoes.css('.place-container img::attr(src)').extract()
    with open('urls.csv', 'w') as f:
        for u in name| review_count| price| image_link:
            f.write(u + "\n")
Answer 0 (score: 1)
As Win Hermanans said, you should definitely use Feed Exports.
It's as simple as:

def parse(self, response):
    # `shoes` is assumed to be a selector defined earlier, as in the question.
    name = shoes.css('a.store-name::text').extract()
    review_count = shoes.css('a.rating-info rtl-mode::text').extract()
    price = shoes.css('span.price-current::text').extract()
    image_link = shoes.css('.place-container img::attr(src)').extract()
    # extract() returns a list, so loop to yield each element as a separate row.
    for i in range(len(name)):
        yield {
            'name': name[i],
            'review_count': review_count[i],
            'price': price[i],
            'image_link': image_link[i],
        }
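As a side note, `zip` achieves the same pairing of the parallel lists without index bookkeeping. A minimal standalone sketch with made-up sample data (the list contents are illustrative, not from the site):

```python
# zip pairs up parallel lists, so each dict gets one row's fields.
name = ["Shoe A", "Shoe B"]
review_count = ["10", "3"]
price = ["$20", "$35"]
image_link = ["a.jpg", "b.jpg"]

rows = [
    {"name": n, "review_count": r, "price": p, "image_link": img}
    for n, r, p, img in zip(name, review_count, price, image_link)
]
print(rows[0])
# → {'name': 'Shoe A', 'review_count': '10', 'price': '$20', 'image_link': 'a.jpg'}
```

In a spider you would `yield` each dict instead of collecting them in a list.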
Then you can simply pass -o myData.csv when running the crawler.
scrapy crawl mycrawler -o myData.csv
You can even get JSON and XML output.
scrapy crawl mycrawler -o myData.json
scrapy crawl mycrawler -o myData.xml
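If you'd rather not pass -o on every run, newer Scrapy versions (2.1+) also let you configure feed exports in settings.py via the FEEDS setting. A sketch (the output filename is just an example):

```python
# In settings.py: export scraped items to myData.csv automatically on every crawl.
FEEDS = {
    "myData.csv": {"format": "csv"},
}
```

With this in place, `scrapy crawl mycrawler` writes the CSV without any command-line flags.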
Now you should see a myData.csv with all your data in the project folder.
As for why no data shows up in the CSV file in your code above: you are overwriting the file on every pass. When you create the urls.csv file, you open it in write mode, so everything written before is wiped out. You can try the append mode instead.
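To see the difference outside of Scrapy, here is a minimal standalone sketch showing that mode 'w' truncates the file on every open, while 'a' appends:

```python
import os

path = "demo.txt"

# Open in 'w' twice: the second open truncates, so only "second" survives.
for line in ["first", "second"]:
    with open(path, "w") as f:
        f.write(line + "\n")
print(open(path).read())  # → second

os.remove(path)

# Open in 'a' twice: both lines survive.
for line in ["first", "second"]:
    with open(path, "a") as f:
        f.write(line + "\n")
print(open(path).read())  # → first, then second
os.remove(path)
```

This is exactly what happens in the question's parse(): each call reopens urls.csv in 'w' mode and erases the previous rows.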
import csv

def parse(self, response):
    # `shoes` is assumed to be a selector defined earlier, as in the question.
    name = shoes.css('a.store-name::text').extract()
    review_count = shoes.css('a.rating-info rtl-mode::text').extract()
    price = shoes.css('span.price-current::text').extract()
    image_link = shoes.css('.place-container img::attr(src)').extract()
    # extract() returns a list, so loop to write each element as a separate row.
    for i in range(len(name)):
        # 'a' appends instead of overwriting; `filename` is assumed defined earlier.
        with open(filename, 'a', newline='') as csvf:
            csv_writer = csv.writer(csvf)
            csv_writer.writerow([name[i], review_count[i], price[i], image_link[i]])
Answer 1 (score: 0)
Instead of writing the results to a file inside parse, try the following approach. Create an item (in items.py):
import scrapy

class ShoeItem(scrapy.Item):
    # Field must be qualified with the scrapy module (or imported explicitly).
    name = scrapy.Field()
    review_count = scrapy.Field()
    price = scrapy.Field()
    image_link = scrapy.Field()
Then, in your spider:

from ..items import ShoeItem

def parse(self, response):
    name = shoes.css('a.store-name::text').extract()
    review_count = shoes.css('a.rating-info rtl-mode::text').extract()
    price = shoes.css('span.price-current::text').extract()
    image_link = shoes.css('.place-container img::attr(src)').extract()
    item = ShoeItem()
    item['name'] = name
    item['review_count'] = review_count
    item['price'] = price
    item['image_link'] = image_link
    yield item
Finally, run the spider with a feed export:

scrapy crawl (spidername) -o urls.csv