Question

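I am trying to build a small scraper that classifies some news topics, as a hobby project. I am not a professional developer or a technical person, I am a beginner in OOP and Python, and I have some experience with PHP and Arduino programming. I have managed to get through Scrapy and as far as an incomplete MySQL pipeline. If I replace item['titlu'] and item['articol'] with plain strings, the database gets populated. I have searched and read a lot, but I cannot solve the problem. My guess is that item['titlu'] and item['articol'] are some kind of array, or some other type that MySQL does not like. I am posting the code and the errors in the hope of getting help. The commented-out lines in the code are some of my attempts to fix the problem. The MySQL table is: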

CREATE TABLE `ziare_com` (
  `id` int(11) NOT NULL,
  `titlu` text CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `articol` varchar(20000) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL
);

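I also tried switching titlu and articol between the text and varchar types. I deliberately left the table like this (one text field and one varchar field) so you can see which settings I tried.

Thanks.

Spider: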

    # methods of the spider class (class definition not shown)
    def parse(self, response):
        #pass
        for link in response.xpath('//h2[@class="titlu_sec"]/a/@href').extract():
            yield response.follow(link, callback=self.parse_detail)

    def parse_detail(self, response):
        item = RezultScrap()
        #for quote in response.css('div.quote')
        item['titlu'] = response.css(".titlu_stire::text").extract()
        item['articol'] = response.css(".descriere_main::text").extract()
        return item

        #item['titlu'] = response.xpath('//div[contains(@id, "interior_left")]/h1/text()').extract_first()
        #item['articol'] = response.xpath('//div[contains(@id, "content_font_resizable")]//text()').extract()
        #titlu = response.css(".titlu_stire::text").extract()
        #articol = response.css(".descriere_main::text").extract()
        #yield item
        #titul1 = re.sub(r"['\\]","", titlu)
        #articol1 = re.sub(r"['\\]","", articol)

        # yield {
        #     'titlu': titlu,
        #     'articol': articol
        #     #titlu,
        #     #articol
        # }
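The guess about the type is right: .extract() returns a Python list of strings (one per matched text node), even when only one element matches, so item['titlu'] and item['articol'] hold lists, not text. The sketch below uses hypothetical sample values instead of a live response to show that shape and one way to reduce each field to a single string. In the spider itself the same idea would be .extract_first() for the title (as in one of the commented-out lines) and " ".join(...) around the .extract() call for the article body.

```python
# What .extract() hands back for a CSS "::text" query: always a list of
# strings, one per matched text node (hypothetical sample values).
extracted_titlu = ["\n  Titlu stire  "]
extracted_articol = ["Primul paragraf.", "\n", " Al doilea paragraf. "]

print(type(extracted_titlu).__name__)  # list

# One string per column: take the single title, join the paragraphs and
# drop the whitespace-only text nodes.
titlu = extracted_titlu[0].strip() if extracted_titlu else ""
articol = " ".join(p.strip() for p in extracted_articol if p.strip())

print(repr(titlu))    # 'Titlu stire'
print(repr(articol))  # 'Primul paragraf. Al doilea paragraf.'
```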

items.py:

import scrapy


class FirstItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    pass


class RezultScrap(scrapy.Item):
    titlu = scrapy.Field()
    articol = scrapy.Field()

pipelines.py:

import pymysql
#from scrapy.exceptions import DropItem
#pymysql.escape_string("'")
from first.items import RezultScrap


class Mysql(object):
    def __init__(self):
        self.connection = pymysql.connect("localhost","xxxxxx","xxxx","ziare")
        self.cursor = self.connection.cursor()

    def process_item(self, item, spider):
        #titlu1 = [pymysql.escape_string(item['titlu'])]
        #articol1 = [pymysql.escape_string(item['articol'])]
        #self.cursor.execute
        #query ="INSERT INTO ziare_com (titlu, articol) VALUES (%s, %s)"
        query ="INSERT INTO ziare_com (titlu, articol) VALUES (%s, %s) % (item['titlu'], item['articol'])"
        self.cursor.execute(query)
        #self.cursor.executemany(query)
        self.connection.commit()
        #return item

    def close_spider(self, spider):
        self.cursor.close()
        self.connection.close()

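These are the errors.

When I use self.cursor.executemany(query):

TypeError: executemany() missing 1 required positional argument: 'args'

When I use self.cursor.execute(query), I get:

pymysql.err.ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '%s, %s) % (item['titlu'], item['articol'])' at line 1")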
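The first error comes from the DB-API signature: executemany(query, args) needs a second argument, a sequence with one parameter tuple per row, and it is meant for inserting several rows in one call. The sketch below uses Python's built-in sqlite3 module purely so it runs without a MySQL server; its cursor has the same execute/executemany shape as PyMySQL's, but its placeholder is ? where PyMySQL uses %s. The table layout and sample rows are illustrative.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL "ziare" database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE ziare_com (titlu TEXT NOT NULL, articol TEXT NOT NULL)")

# Same statement as in the question, with sqlite3's ? placeholders.
query = "INSERT INTO ziare_com (titlu, articol) VALUES (?, ?)"

# executemany(query, rows): one tuple of plain strings per row.
# Quotes inside the values need no manual escaping (no re.sub), because
# the driver sends them as data, not as part of the SQL text.
rows = [
    ("Primul titlu", "Un articol cu 'ghilimele' simple"),
    ("Al doilea titlu", 'Un articol cu "ghilimele" duble'),
]
cur.executemany(query, rows)

# For a single item, as in a Scrapy pipeline, execute() with one tuple is enough.
cur.execute(query, ("Al treilea titlu", "Al treilea articol"))
conn.commit()

count = cur.execute("SELECT COUNT(*) FROM ziare_com").fetchone()[0]
articles = [row[0] for row in cur.execute("SELECT articol FROM ziare_com ORDER BY rowid")]
print(count)  # 3
conn.close()
```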

Answer 1

The correct code in process_item is:

def process_item(self, item, spider):

    query ="INSERT INTO ziare_com (titlu, articol) VALUES (%s, %s)"
    self.cursor.execute(query, [ item['titlu'], item['articol'] ])
    self.connection.commit()
    return item

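You only need to write %s as the placeholder for each value and then pass the values to the execute method as a list (array); PyMySQL quotes and escapes them for you.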

Also, you should read up on Python string formatting.

What you are doing is:

"INSERT INTO ziare_com (titlu, articol) VALUES (%s, %s) % (item['titlu'], item['articol'])"

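The whole thing is a single string literal: the % (item['titlu'], item['articol']) part is inside the quotes, so the values are never substituted into the string, and MySQL receives that text verbatim, which is exactly what the 1064 error quotes.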
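Evaluating the question's line in Python with a hypothetical item makes this visible; the second half shows what happens once the % operator is moved outside the quotes and the values are still lists.

```python
# Hypothetical item, shaped like the spider's output (lists from .extract()).
item = {"titlu": ["Titlu"], "articol": ["Text"]}

# The question's line: the "% (...)" is inside the quotes, so it is plain text
# and the %s placeholders are never filled.
query = "INSERT INTO ziare_com (titlu, articol) VALUES (%s, %s) % (item['titlu'], item['articol'])"
print(query[query.index("VALUES ") + len("VALUES "):])
# (%s, %s) % (item['titlu'], item['articol'])

# With the % operator outside the quotes, Python does substitute the values,
# but each list is rendered with its brackets, which MySQL cannot parse.
formatted = "INSERT INTO ziare_com (titlu, articol) VALUES (%s, %s)" % (item["titlu"], item["articol"])
print(formatted)
# INSERT INTO ziare_com (titlu, articol) VALUES (['Titlu'], ['Text'])
```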

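This is the corrected string-formatting statement:

"INSERT INTO ziare_com (titlu, articol) VALUES (%s, %s) " % ((item['titlu'], item['articol']))

This only fixes the Python side, though: the values are pasted into the SQL without quotes or escaping, so for text values the query is still not valid SQL and would be open to SQL injection. For the database, pass the values to execute() as in the code above.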

Passing Scrapy results to a MySQL database
