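I need to join the parent name onto a child row only when the parent exists; otherwise the row should be returned without a parent name.
The result should contain child rows both with and without a parent name.
How can I achieve this?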
SELECT t.*, cat.name AS cat_name
FROM products AS t
INNER JOIN category AS cat
ON category_id=cat.id
WHERE t.is_public!=2
Answer 0 (score: 2)
Just replace INNER JOIN with LEFT JOIN. All products will then be included in the result set, and cat_name will be NULL for products that have no parent/category.
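To illustrate the difference, here is a minimal sketch of the question's query rewritten with LEFT JOIN. The table definitions, the products.name column, and the sample rows are assumptions made up for illustration; only products.category_id, products.is_public, category.id, and category.name come from the original query.

```sql
-- Hypothetical schema and sample data, for illustration only.
CREATE TABLE category (
    id   INTEGER PRIMARY KEY,
    name VARCHAR(50)
);

CREATE TABLE products (
    id          INTEGER PRIMARY KEY,
    name        VARCHAR(50),   -- assumed column, not in the original query
    category_id INTEGER,       -- NULL when the product has no parent/category
    is_public   INTEGER
);

INSERT INTO category (id, name) VALUES (1, 'Tools');

INSERT INTO products (id, name, category_id, is_public) VALUES (1, 'Hammer', 1, 1);
INSERT INTO products (id, name, category_id, is_public) VALUES (2, 'Orphan widget', NULL, 1);
INSERT INTO products (id, name, category_id, is_public) VALUES (3, 'Hidden item', 1, 2);

-- LEFT JOIN keeps every row from products, matched or not.
-- Where no category matches, cat.name (and so cat_name) is NULL.
SELECT t.*, cat.name AS cat_name
FROM products AS t
LEFT JOIN category AS cat
    ON t.category_id = cat.id
WHERE t.is_public != 2;
```

With this sample data the query returns 'Hammer' with cat_name 'Tools' and 'Orphan widget' with cat_name NULL. 'Hidden item' is filtered out by the WHERE clause. With INNER JOIN, 'Orphan widget' would be dropped as well, because it has no matching category row.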