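How do I call writeXML once my parser has finished crawling the data? At the moment I can see the data being scraped, but no output file appears. I tried adding a print statement inside writeXML, and nothing was printed.
Here is my code: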
class FriendSpider(BaseSpider):
    # Identifies the spider
    name = "friend"
    count = 0
    allowed_domains = ["example.com.us"]
    start_urls = [
        "http://example.com.us/biz/friendlist/"
    ]

    def start_requests(self):
        for i in range(0, 1722, 40):
            yield self.make_requests_from_url("http://example.com.us/biz/friendlist/?start=%d" % i)

    def parse(self, response):
        response = response.replace(body=response.body.replace('<br />', '\n'))
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//ul/li')
        items = []
        for site in sites:
            item = Item()
            self.count += 1
            item['id'] = str(self.count)
            item['name'] = site.select('.//div/div/h4/text()').extract()
            item['address'] = site.select('h4/span/text()').extract()
            item['review'] = ''.join(site.select('.//div[@class="review"]/p/text()').extract())
            item['birthdate'] = site.select('.//div/div/h5/text()').extract()
            items.append(item)
        return items

    def writeXML(self, items):
        root = ET.Element("Test")
        for item in items:
            # Use a separate name for the XML node so it does not shadow the scraped item
            node = ET.SubElement(root, 'item')
            node.set('id', item['id'])
            address = ET.SubElement(node, 'address')
            address.text = item['address']
            user = ET.SubElement(node, 'user')
            user.text = item['user']
            birthdate = ET.SubElement(node, 'birthdate')
            birthdate.text = item['birthdate']
            review = ET.SubElement(node, 'review')
            review.text = item['review']
        # wrap it in an ElementTree instance, and save as XML
        file = open("out.xml", 'w')
        tree = ET.ElementTree(root)
        tree.write(file, xml_declaration=True, encoding='utf-8', method="xml")
        file.close()
Answer 0 (score: 2)
To get output from the built-in XML exporter, try the following command:
scrapy crawl friend -o items.xml -t xml
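To check what the command produced, the file can be read back with the standard library. The sketch below assumes the exporter's usual layout, as far as I know: an `<items>` root, one `<item>` per scraped item, and list-valued fields (such as the results of `extract()`) written as `<value>` children. The `SAMPLE` string and the `load_items` helper are made up for illustration and do not come from a real crawl.

```python
import xml.etree.ElementTree as ET

# Assumed sample of the exporter's output, not taken from a real crawl.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<items>
<item><id>1</id><name><value>Alice</value></name><review>Great</review></item>
</items>"""


def load_items(xml_text):
    """Return one dict per <item>; fields with <value> children become lists."""
    rows = []
    for node in ET.fromstring(xml_text.encode("utf-8")):
        row = {}
        for field in node:
            values = [v.text for v in field.findall("value")]
            # Fall back to the element's own text when there are no <value> children.
            row[field.tag] = values if values else field.text
        rows.append(row)
    return rows
```

For a real run, pass the contents of items.xml instead of `SAMPLE`.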
If the output is not to your liking, you can try writing your own exporter, using Scrapy's XML exporter class (XmlItemExporter) as a base.
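If you would rather keep a hand-written writer like the one in the question, the serialization step can be sketched without Scrapy at all. This is only a minimal sketch: `write_items_xml` is a hypothetical helper, the field names are taken from the question's `parse` method, and joining list values with a space is my assumption about the desired output.

```python
import xml.etree.ElementTree as ET


def write_items_xml(items, path):
    """Serialize dict-like items to an XML file at `path`."""
    root = ET.Element("Test")
    for item in items:
        node = ET.SubElement(root, "item")
        node.set("id", str(item["id"]))
        for field in ("name", "address", "birthdate", "review"):
            value = item.get(field, "")
            # extract() returns a list of strings, so join it into one string.
            if isinstance(value, (list, tuple)):
                value = " ".join(value)
            ET.SubElement(node, field).text = value
    # ElementTree emits bytes when an encoding is given, so open in binary mode.
    with open(path, "wb") as fh:
        ET.ElementTree(root).write(fh, encoding="utf-8", xml_declaration=True)
```

Scrapy only calls the callbacks it is given (here `parse`), which is why `writeXML` never runs. One option is to collect the items in a list attribute on the spider and call a helper like this from the spider's `closed(reason)` method, which recent Scrapy versions invoke when the crawl ends; whether that hook exists in the version used in the question is an assumption.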