This is my main spider, bathuni.py:
from scrapy.spider import Spider
from scrapy.selector import Selector
from bathUni.items import BathuniItem

class bathuni(Spider):
    name = "bathU"
    allowed_domains = ["http://international.southwales.ac.uk/"]
    start_urls = ["http://international.southwales.ac.uk/country/argentina/en/",
                  "http://international.southwales.ac.uk/country/france/en/",
                  "http://international.southwales.ac.uk/country/australia/en/"]

    def parse(self, response):
        sel = Selector(response)
        sites = sel.xpath('//div[@class="factsheet"]/ul/li')
        for site in sites:
            country = site.xpath('text()').extract()
            return country
My items.py is as follows:
from scrapy.item import Item, Field

class BathuniItem(Item):
    Country = Field()
The command I use to write the output to a CSV file is:
scrapy crawl bathU -o countries.csv -t csv
My output file is always empty. Any help would be greatly appreciated. Thanks.
Answer 0 (score: 1)
Change return country
to yield BathuniItem(Country=country)
This solves two problems:
ERROR: Spider must return Request, BaseItem or None, got 'unicode'
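To see why the change matters, here is a rough plain-Python sketch that does not use Scrapy at all. The `collect_items` function is a hypothetical stand-in for the check Scrapy applies to whatever a callback produces, plain dicts stand in for `BathuniItem`, and the `extracted` values are made-up placeholders for what `site.xpath('text()').extract()` might return. It assumes the `return` sits inside the loop; either way, the callback hands back a list of strings rather than items.

```python
# Hypothetical stand-in for Scrapy's output check: each object a callback
# produces must be an item (a dict in this sketch), not a bare string.
def collect_items(callback_result):
    items = []
    for obj in callback_result or []:
        if not isinstance(obj, dict):
            raise TypeError(
                "Spider must return Request, BaseItem or None, got %r"
                % type(obj).__name__
            )
        items.append(obj)
    return items


# Made-up values standing in for site.xpath('text()').extract() per <li>.
extracted = [["Argentina"], ["France"]]


def parse_with_return(rows):
    for country in rows:
        # Leaves the function on the first pass and hands back a list of
        # strings, so the caller ends up iterating over bare strings.
        return country


def parse_with_yield(rows):
    for country in rows:
        # Produces one item per row and keeps looping.
        yield {"Country": country}


try:
    collect_items(parse_with_return(extracted))
    return_error = None
except TypeError as exc:
    return_error = str(exc)

items = collect_items(parse_with_yield(extracted))
```

In the real spider the same idea applies: each loop iteration should yield a `BathuniItem`, which is what the feed exporter can write as a CSV row.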