I'm new to Python and Scrapy. I'm trying to extract data from multiple XML documents. I found the XML here; the documents range from <strong>CourseId=1</strong> all the way to <strong>CourseID=4500</strong>:
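The documents differ only in the `CourseId` query parameter, so the full list of URLs can be generated instead of written by hand. This is a minimal sketch that assumes every other parameter stays as in the URL used in the spider below, and that `CourseId` and `CourseID` in the text refer to the same parameter. Some IDs in the range may return an empty list; that has not been checked.

```python
# Base URL copied from the spider's start_urls; only CourseId changes.
BASE_URL = ("http://www.myskills.gov.au/DesktopModules/Services/api/"
            "RegisteredTrainers/GetOfferedTrainers"
            "?LocationID=0&Distance=25&IsExplicit=false&CourseId={}")


def course_urls(first=1, last=4500):
    """Return one URL per CourseId from first to last, inclusive."""
    return [BASE_URL.format(course_id) for course_id in range(first, last + 1)]


urls = course_urls()
print(len(urls))  # 4500
print(urls[0])
```

In a spider, the returned list could be assigned to `start_urls` so that `parse_start_url` runs once per course.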
Example 1:
My Scrapy code is below. When I run it, I get a TypeError: unbound method body_as_unicode() must be called with XmlResponse instance as first argument.
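The message means `body_as_unicode()` was looked up on the `XmlResponse` class itself rather than on a response object, so there was no instance to act as `self`. The sketch below reproduces the same mistake in plain Python 3 with a hypothetical stand-in class (`FakeXmlResponse` and `make_selector` are illustrative names, not Scrapy code). In Python 3 the error reads "missing 1 required positional argument: 'self'" instead of Python 2's "unbound method" wording.

```python
class FakeXmlResponse:
    """Hypothetical stand-in for a response; only the method name is borrowed."""

    def __init__(self, body):
        self.body = body

    def body_as_unicode(self):
        return self.body


def make_selector(response):
    # Like a selector, this needs the response text, so it calls a method on it.
    return response.body_as_unicode()


# Wrong: passing the class, so there is no instance to bind to 'self'.
try:
    make_selector(FakeXmlResponse)
except TypeError as exc:
    print("TypeError:", exc)

# Right: passing an instance (in a spider, the `response` argument).
print(make_selector(FakeXmlResponse("<root/>")))  # <root/>
```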
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import XmlXPathSelector
from myProject.items import RTOData
from scrapy.http import Request
from scrapy.http import XmlResponse


class myProjectSpider(CrawlSpider):
    name = 'myProject'
    allowed_domains = ['myskills.gov.au']
    start_urls = ['http://www.myskills.gov.au/DesktopModules/Services/api/RegisteredTrainers/GetOfferedTrainers?LocationID=0&Distance=25&IsExplicit=false&CourseId=1']

    def parse_start_url(self, response):
        # Wrap the response object that Scrapy passes in. Passing the
        # XmlResponse class here is what raised the "unbound method" TypeError.
        x = XmlXPathSelector(response)
        Latitude = x.select('//ArrayOfRegisteredTrainerLocationOfferedItem/RegisteredTrainerLocationOfferedItem/Latitude').extract()
        Longitude = x.select('//ArrayOfRegisteredTrainerLocationOfferedItem/RegisteredTrainerLocationOfferedItem/Longitude').extract()
        RTOCode = x.select('//ArrayOfRegisteredTrainerLocationOfferedItem/RegisteredTrainerLocationOfferedItem/RTOCode').extract()
        SiteName = x.select('//ArrayOfRegisteredTrainerLocationOfferedItem/RegisteredTrainerLocationOfferedItem/SiteName').extract()
        # The four lists are extracted but not yet stored in an RTOData item or returned.
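Four separate queries return four parallel lists that only line up if every item has every field. Selecting each `RegisteredTrainerLocationOfferedItem` first and reading its children keeps one site's values together. The sketch below shows that per-item approach with the standard library's `xml.etree.ElementTree` rather than Scrapy. The sample XML is hypothetical: the element names come from the XPath expressions above, the values are invented placeholders, and no XML namespace is assumed. If the real documents declare a default namespace, these paths would need namespace handling.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: element names from the question's XPaths, values invented.
SAMPLE_XML = """\
<ArrayOfRegisteredTrainerLocationOfferedItem>
  <RegisteredTrainerLocationOfferedItem>
    <Latitude>-33.87</Latitude>
    <Longitude>151.21</Longitude>
    <RTOCode>00001</RTOCode>
    <SiteName>Example Site A</SiteName>
  </RegisteredTrainerLocationOfferedItem>
  <RegisteredTrainerLocationOfferedItem>
    <Latitude>-37.81</Latitude>
    <Longitude>144.96</Longitude>
    <RTOCode>00002</RTOCode>
    <SiteName>Example Site B</SiteName>
  </RegisteredTrainerLocationOfferedItem>
</ArrayOfRegisteredTrainerLocationOfferedItem>
"""

FIELDS = ("Latitude", "Longitude", "RTOCode", "SiteName")


def parse_trainers(xml_text):
    """Return one dict per RegisteredTrainerLocationOfferedItem."""
    root = ET.fromstring(xml_text)
    records = []
    for item in root.findall("RegisteredTrainerLocationOfferedItem"):
        # findtext returns the child's text, or None if the child is missing.
        records.append({name: item.findtext(name) for name in FIELDS})
    return records


for record in parse_trainers(SAMPLE_XML):
    print(record)
```

The same idea should carry over to Scrapy selectors: select the item nodes first, then run relative XPaths such as `Latitude/text()` on each one. Adding `/text()` also returns just the text instead of the element markup that `extract()` gives for a bare element path.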
Can someone tell me whether I'm on the right track?