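I'm using Scrapy's CSV export, but sometimes the content I'm scraping contains quotes and commas that I don't want.
How can I replace these characters with nothing ('') before they are written to the CSV?
Here's my CSV, with the unwanted characters in the strTitle column: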
strTitle,strLink,strPrice,strPicture
"TOYWATCH 'Metallic Stones' Bracelet Watch, 35mm",http://shop.nordstrom.com/s/toywatch-metallic-stones-bracelet-watch-35mm/3662824?origin=category,0,http://g.nordstromimage.com/imagegallery/store/product/Medium/11/_8412991.jpg
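The double quotes wrapping the first field are not part of the scraped text: the title contains a comma, so the CSV writer quotes the field to keep it in one column. The sketch below uses Python's standard `csv` module with its default minimal quoting, on the assumption that Scrapy's CSV exporter behaves the same way. It suggests that only the single quotes and the comma come from the page itself:

```python
import csv
import io

# The title as scraped, including the single quotes and comma to be removed.
title = "TOYWATCH 'Metallic Stones' Bracelet Watch, 35mm"

buf = io.StringIO()
writer = csv.writer(buf)  # defaults: QUOTE_MINIMAL, quotechar '"', lineterminator '\r\n'
writer.writerow([title, "0"])

# Only the field containing the delimiter is wrapped in double quotes;
# single quotes are ordinary characters to the csv module.
line = buf.getvalue()
print(repr(line))

# Parsing the line back yields the original text, without the wrapping quotes.
row = next(csv.reader(io.StringIO(line)))
print(row)
```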
Here's my code. What's wrong with the replace line?
def parse(self, response):
    hxs = Selector(response)
    titles = hxs.xpath("//div[@class='fashion-item']")
    items = []
    for titles in titles[:1]:
        item = watch2Item()
        # extract() returns a list of strings, not a single string
        item["strTitle"] = titles.xpath(".//a[@class='title']/text()").extract()
        # the replace line in question
        item["strTitle"] = item["strTitle"].replace("'", '').replace(",", '')
        item["strLink"] = urlparse.urljoin(response.url, titles.xpath("div[2]/a[1]/@href").extract()[0])
        item["strPrice"] = "0"
        item["strPicture"] = titles.xpath(".//img/@data-original").extract()
        items.append(item)
    return items
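The problem with the replace line can be reproduced without Scrapy. In this minimal sketch a plain one-element list stands in for the `extract()` result; that the XPath matched exactly one text node here is an assumption:

```python
# Stand-in for titles.xpath(".//a[@class='title']/text()").extract():
# extract() gives a list of strings, even when only one node matches.
extracted = ["TOYWATCH 'Metallic Stones' Bracelet Watch, 35mm"]

try:
    extracted.replace("'", '').replace(",", '')
    error = None
except AttributeError as exc:
    # Lists have no .replace(), so Python raises AttributeError here.
    error = str(exc)

print(error)
```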
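Answer 0 (score: 1)
EDIT:
Try adding this line before the replace. extract() returns a list of strings, and a list has no .replace() method, so join it into a single string first:
item["strTitle"] = ''.join(item["strTitle"])
Once the value is a plain string, the chained replace works as expected:
strTitle = "TOYWATCH 'Metallic Stones' Bracelet Watch, 35mm"
strTitle = strTitle.replace("'", '').replace(",", '')
# strTitle is now "TOYWATCH Metallic Stones Bracelet Watch 35mm"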
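Joining the list and then stripping the characters can be wrapped in one small helper. In this sketch `clean_title` and its `unwanted` parameter are hypothetical names, not Scrapy API. It also swaps the chained `.replace()` calls for `str.translate`, which deletes every listed character in a single pass:

```python
def clean_title(extracted, unwanted="',"):
    """Join text fragments (as returned by extract()) and delete unwanted characters.

    Hypothetical helper for illustration; not part of Scrapy.
    """
    text = ''.join(extracted)
    # str.maketrans('', '', chars) maps each character in chars to None, i.e. deletes it.
    return text.translate(str.maketrans('', '', unwanted))


print(clean_title(["TOYWATCH 'Metallic Stones' Bracelet Watch, 35mm"]))
```

Note that `''.join` glues multiple text nodes together with no separator. If a title is split across several nodes, `' '.join` may be the better choice.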
Answer 1 (score: 1)
The final solution was:
item["strTitle"] = [titles.xpath(".//a[@class='title']/text()").extract()[0].replace("'", '').replace(",",'')]
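The final solution indexes `extract()[0]`, which raises IndexError on any listing where the XPath matches nothing. The sketch below keeps the same chained replace but falls back to a default when the list is empty. `first_clean` is a hypothetical helper name. Newer Scrapy/parsel versions also offer `extract_first()` (alias `get()`) on selector lists, which returns `None` instead of raising.

```python
def first_clean(extracted, default=''):
    """Return the first extracted string with ' and , removed, or default if none.

    Hypothetical helper mirroring extract()[0].replace(...) from the final solution.
    """
    if not extracted:
        return default
    return extracted[0].replace("'", '').replace(",", '')


titles_found = ["TOYWATCH 'Metallic Stones' Bracelet Watch, 35mm"]
# Wrapped in a list to match the shape the final solution stores in item["strTitle"].
str_title = [first_clean(titles_found)]
print(str_title)
print(repr(first_clean([])))
```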