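I'm having the hardest time trying to work out the cause of an error message I'm getting.
I'm writing a web scraper that uses Python and BeautifulSoup for scraping and Peewee for database interaction, to pull data from ProCyclingStats into a MySQL database. The scraper itself works fine, but I'm running into trouble when inserting the data into the MySQL tables.
First, I created the tables in an empty database using peewee's create_tables() function. Below I've pasted the code for my Peewee models, which lives in a file I call peewee_lib.py.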
from peewee import *
from mysql_login_info import *

results_database = MySQLDatabase(mysql_db_name, user=mysql_uname, password=mysql_pw, host='localhost')

class BaseModel(Model):
    class Meta:
        database = results_database

class Rider(BaseModel):
    pcsid = IntegerField()
    name = CharField()

class Race(BaseModel):
    name = CharField()

class Result(BaseModel):
    name = CharField()
    year = IntegerField()
    date = DateField()
    position = IntegerField()
    points_pcs = IntegerField()
    race = ForeignKeyField(Race, backref='results')
    rider = ForeignKeyField(Rider, backref='results')
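Next, I use the file scrape_to_peewee.py to create classes that "bind" together the class definitions from my scraping library, scraper_lib.py, and the peewee library mentioned above, peewee_lib.py.
This is the code in scrape_to_peewee.py: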
import scraper_lib as pylib
import peewee_lib as pw

class Sheet_bind:
    def __init__(self, rider_obj, sheet):
        self.year = sheet.year
        self.rider = sheet.rider
        self.rows = []
        for row in sheet.rows:
            if row.row_type == "tour_header":
                pass
            else:
                temp_query = pw.Race.select().where(pw.Race.name == row.race)
                if not temp_query.exists():
                    temp_query = pw.Race(name=row.race)
                    temp_query.save()
                else:
                    pass
                temp_res = pw.Result(name=row.name,\
                                     year=sheet.year,\
                                     position=row.result,\
                                     points_pcs=row.points_pcs)
                if row.row_type in ["stage", "classification"]:
                    temp_res.name = row.race + ' ' + row.name
                temp_res.race=temp_query
                temp_res.rider=rider_obj
                temp_res.save()
                temp_query = None
                temp_res = None

class Rider_bind:
    def __init__(self, rider_id):
        self.rider_py = pylib.Rider(rider_id)
        self.rider_pw = pw.Rider(pcsid=self.rider_py.url_id, name=self.rider_py.name)
        self.rider_pw.save()

    def load_sheets(self, start_year, end_year):
        for year in xrange(start_year, end_year + 1):
            if year not in self.rider_py.sheets:
                self.rider_py.load_sheets(year, year)
            loaded_sheet = Sheet_bind(self.rider_pw, self.rider_py.sheets[year])
            loaded_sheet.save()

def main():
    pw.results_database.connect()

main()
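After loading that final file into the interpreter, I tried loading a sample rider into the database. Initializing the Rider_bind class works fine, and I double-checked that the row was actually written to the rider table in MySQL. However, when I try to load the results into the database with Rider_bind.load_sheets(), I get the following error: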
$ python
Python 2.7.15rc1 (default, Nov 12 2018, 14:31:15)
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from scrape_to_peewee import *
>>> olly = Rider_bind("oliver-naesen")
>>> olly.load_sheets(2018, 2018)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "scrape_to_peewee.py", line 55, in load_sheets
    loaded_sheet = Sheet_bind(self.rider_pw, self.rider_py.sheets[year])
  File "scrape_to_peewee.py", line 33, in __init__
    temp_res.race=temp_query
  File "/home/trenza/.local/lib/python2.7/site-packages/peewee.py", line 3848, in __set__
    if obj != fk_value and self.name in instance.__rel__:
  File "/home/trenza/.local/lib/python2.7/site-packages/peewee.py", line 726, in __ne__
    return not (self == other)
  File "/home/trenza/.local/lib/python2.7/site-packages/peewee.py", line 723, in __eq__
    return self._hash == other._hash
AttributeError: 'NoneType' object has no attribute '_hash'
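The problem seems to be related to assigning one of the peewee model instances to a foreign key field. When I reversed the order of the assignments so that temp_res.rider = rider_obj came first, I got the same error, with the traceback pointing at that assignment instead.
From the peewee documentation, it looks like setting a ForeignKeyField should be as simple as assigning another peewee model instance to it. Does anyone know what I'm doing wrong here? Any help would be greatly appreciated.
Thanks!
Edit:
This is not a duplicate of this question because, as far as I can tell, it has nothing to do with the return value of a select call (which was the issue in that question).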
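Answer 0 (score: 1)
You need to resolve "temp_query" to an object before assigning it to the attribute. When the race already exists, the else branch leaves temp_query as the select query rather than a Race instance, so call get() on it: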
if not temp_query.exists():
    temp_query = pw.Race(name=row.race)
    temp_query.save()
else:
    temp_query = temp_query.get() # fixed.