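I'm working on a Scrapy crawler, and this problem has had me stuck for several days.
With the SQLite database, the "?" placeholder (rather than "%s") works fine. But when I keep using "?" after switching the database to MySQL, it shows:
" TypeError: not all arguments converted during string formatting "
Even after I put a lot of effort into modifying the code and replacing the "?" placeholder, it still shows:
" query = query % self._escape_args(args, conn) ValueError: unsupported format character ',' "
More specifically: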
Traceback (most recent call last):
  File "/usr/lib64/python3.4/dist-packages/twisted/internet/defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/ec2-user/lulu_testing/get_download_file/hello_scrapy/hello/hello/pipelines.py", line 42, in process_item
    self.cur.execute(insert_query, insert_values)
  File "/usr/lib/python3.4/dist-packages/pymysql/cursors.py", line 163, in execute
    query = self.mogrify(query, args)
  File "/usr/lib/python3.4/dist-packages/pymysql/cursors.py", line 142, in mogrify
    query = query % self._escape_args(args, conn)
ValueError: unsupported format character ',' (0x2c) at index 94
MySQL version of the pipeline
import pymysql
import scrapy
from hello.items import HelloItem

class HelloPipeline(object):
    def __init__(self):
        self.conn = pymysql.connect(host="localhost", port=3306, user="root", passwd="lulu", db="test", charset="utf8", use_unicode=True)
        self.cur = self.conn.cursor()
        self.cur.execute("drop table IF EXISTS test;")
        self.conn.commit()
        self.cur.execute("create table if not EXISTS table_test_4(test0 text, test1 text, test2 text, test3 text,test4 text, test5 text, test6 text, test7 text, test8 text, test9 text);")
        self.conn.commit()

    def process_item(self, item, spider):
        col = ",".join(item.keys())
        placeholders = ",".join(len(item) * "%s")
        insert_query = "INSERT INTO test_table_4({0}) VALUES({1});".format(col,placeholders)
        insert_values = tuple(item.values())
        self.cur.execute(insert_query, insert_values)
        return item

    def close_spider(self, spider):
        self.cur.close()
        self.conn.close()
SQLite version (what I was using before)
import sqlite3
import scrapy
from hello.items import HelloItem

class HelloPipeline(object):
    def open_spider(self, spider):
        self.conn = sqlite3.connect("test_database_ver_2018_03_31.sqlite")
        self.cur = self.conn.cursor()
        self.cur.execute("create table if not exists test_table(test0 text, test1 text, test2 text, test3 text,test4 text, test5 text, test6 text, test7 text, test8 text, test9 text);")

    def close_spider(self, spider):
        self.conn.commit()
        self.conn.close()

    def process_item(self, item, spider):
        col = ",".join(item.keys())
        placeholders = ",".join(len(item) * "?")
        sql = "insert into test_table({}) values({})"
        self.cur.execute(sql.format(col, placeholders), tuple(item.values()))
        return item
How the item is populated in the main Scrapy spider
testitem = HelloItem()
testitem["test0"] = house_detail.select(".houseInfoTitle")[0].text
testitem["test1"] = house_detail.select(".pageView")[0].text
testitem["test2"] = house_detail.select(".detailInfo")[0].text
testitem["test3"] = house_detail.select(".houseIntro")[0].text
testitem["test4"] = house_detail.select(".lifeBox")[0].text
testitem["test5"] = house_detail.select(".labelList")[0].text
testitem["test6"] = house_detail.select(".facility")[0].text
testitem["test7"] = str(house_detail.select(".userInfo"))
testitem["test8"] = str(house_detail.select(".banner"))
testitem["test9"] = str(house_detail.select("#show"))
return testitem
Item definition
import scrapy

class HelloItem(scrapy.Item):
    test0 = scrapy.Field()
    test1 = scrapy.Field()
    test2 = scrapy.Field()
    test3 = scrapy.Field()
    test4 = scrapy.Field()
    test5 = scrapy.Field()
    test6 = scrapy.Field()
    test7 = scrapy.Field()
    test8 = scrapy.Field()
    test9 = scrapy.Field()
Answer 0 (score: 0)
The problem is that this line:
placeholders = ",".join(len(item) * "%s")
doesn't do what you expect.
>>> item = {'a': 1, 'b': 2, 'c': 3}
>>> placeholders = ",".join(len(item) * "%s")
>>> print(placeholders)
%,s,%,s,%,s
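The expression ",".join(len(item) * "%s") performs two operations: it first computes len(item) * "%s", then joins the result with ','.
The result of len(item) * '%s' is the single string '%s%s%s', and a string is itself an iterable of its characters. str.join(iterable) returns a string containing all the elements of the iterable, separated by the string on which the method was called. So the result of calling ','.join('%s%s%s') is '%,s,%,s,%,s', not '%s,%s,%s'.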
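To connect this to the two errors in the question, here is a minimal reproduction that needs no database. It assumes, as the traceback's `query % self._escape_args(args, conn)` line suggests, that the driver fills in the query with plain %-formatting; the helper name `try_format` is made up for this sketch, and real pymysql also escapes and quotes the values first.

```python
# Reproduce both errors with plain %-formatting (a simplification of
# what the driver does; pymysql additionally escapes the arguments).

def try_format(query, args):
    """Return the formatted query, or the error type and message raised."""
    try:
        return query % args
    except (TypeError, ValueError) as exc:
        return "{}: {}".format(type(exc).__name__, exc)

args = ("a", "b", "c")

bad = ",".join(len(args) * "%s")     # joins the characters of '%s%s%s'
good = ",".join(len(args) * ["%s"])  # joins three separate '%s' elements
qmark = ",".join(len(args) * "?")    # sqlite3-style placeholders

# '%,' is not a valid format specifier, hence the ValueError.
print(try_format("VALUES({})".format(bad), args))
# Three '%s' specifiers consume the three arguments.
print(try_format("VALUES({})".format(good), args))
# '?' is not a format specifier at all, so no argument is consumed,
# hence "not all arguments converted".
print(try_format("VALUES({})".format(qmark), args))
```

This is why "?" works with sqlite3 but fails under MySQL here: each driver expects its own placeholder style.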
What you want is
>>> ",".join(len(item) * ["%s"])
'%s,%s,%s'
or
>>> ",".join('%s' for _ in item)
'%s,%s,%s'
so that str.join operates on an iterable of '%s' strings, rather than on the characters of a single string like '%s%s%s'.
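Applied to the pipeline, the fix could be wrapped in a small helper along these lines. This is only a sketch: `build_insert` and its `placeholder` parameter are hypothetical names, not part of Scrapy or pymysql, and it assumes the item behaves like a dict.

```python
def build_insert(table, item, placeholder="%s"):
    """Build a parameterized INSERT statement for a dict-like item.

    `placeholder` would be "%s" for pymysql and "?" for sqlite3.
    Table and column names are interpolated directly into the SQL, so
    they must come from trusted code (here, the item's field names).
    """
    columns = list(item.keys())
    # A list of placeholders, so join sees one element per column.
    placeholders = ",".join([placeholder] * len(columns))
    query = "INSERT INTO {0}({1}) VALUES({2})".format(
        table, ",".join(columns), placeholders
    )
    # Read the values in the same order as the column names.
    values = tuple(item[c] for c in columns)
    return query, values


query, values = build_insert("table_test_4", {"test0": "a", "test1": "b"})
print(query)
print(values)
```

In process_item this would be used as `self.cur.execute(query, values)`. Two further points about the question's MySQL pipeline: it creates table_test_4 but inserts into test_table_4, so the names need to match, and since pymysql does not autocommit by default, a `self.conn.commit()` is likely needed before the connection is closed.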