I ran into this problem while trying out some web scraping code. I defined a MongoCache class to cache HTML pages:
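from datetime import timedelta
from pymongo import MongoClient

class MongoCache:
    def __init__(self, client=None, expires=timedelta(days=30)):
        # Connect to a local MongoDB server unless a client is supplied
        self.client = MongoClient('localhost', 27017) if client is None else client
        self.db = self.client.cache
        # TTL index on the 'timestamp' field that __setitem__ writes
        self.db.webpage.create_index('timestamp', expireAfterSeconds=expires.total_seconds())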
When I construct the object:
cache = MongoCache()
I get the following error:
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "F:\pythoncode\webscraping\mongo_cache.py", line 20, in __init__
  File "D:\python27\lib\site-packages\pymongo\collection.py", line 1958, in create_index
    self.__create_index(keys, kwargs, session, **cmd_options)
  File "D:\python27\lib\site-packages\pymongo\collection.py", line 1860, in __create_index
    session=session)
  File "D:\python27\lib\site-packages\pymongo\collection.py", line 244, in _command
    retryable_write=retryable_write)
  File "D:\python27\lib\site-packages\pymongo\pool.py", line 579, in command
    unacknowledged=unacknowledged)
  File "D:\python27\lib\site-packages\pymongo\network.py", line 150, in command
    parse_write_concern_error=parse_write_concern_error)
  File "D:\python27\lib\site-packages\pymongo\helpers.py", line 155, in _check_command_response
    raise OperationFailure(msg % errmsg, code, response)
OperationFailure: Index with name: timestamp_1 already exists with different options
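I tried several solutions from Stack Overflow, but they did not work with pymongo; I could not even get the drop_index() method to work. I am on Windows 10 with Python 2.7 in PyCharm, and the MongoDB server version is 4.0.3. I spent two days on this problem and then gave up.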
Answer 0 (score: 0)
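I have now looked at the problem again and found that it probably comes from the expiry value (expires) used for the index. When I create the object with no arguments, everything works: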
cache = MongoCache()
But as soon as I pass an expiry value, the error appears again:
cache = MongoCache(expires=timedelta())
The method that stores the result for a URL is:
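import pickle
import zlib
from datetime import datetime
from bson.binary import Binary

# Method of MongoCache
def __setitem__(self, url, result):
    record = {
        # Pickle and compress the result so it can be stored as binary data
        'result': Binary(zlib.compress(pickle.dumps(result))),
        'timestamp': datetime.utcnow()}
    # Upsert: insert the record if the URL is new, otherwise overwrite it
    self.db.webpage.update_one({'_id': url}, {'$set': record}, upsert=True)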
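
The behaviour described in this answer is consistent with how MongoDB treats index options. The timestamp_1 index was presumably created on an earlier run with the default 30-day expiry, so a later create_index call with a different expireAfterSeconds asks for the same index name with different options, which the server rejects. Below is a minimal sketch of one way around this, assuming that dropping and recreating the index is acceptable. ensure_ttl_index is a hypothetical helper name, not part of pymongo; it relies only on the Collection methods index_information(), drop_index() and create_index(), and it assumes the index has the default name that pymongo generates for a single ascending key.

```python
def ensure_ttl_index(collection, field, expire_seconds):
    """Create a TTL index on `field`, replacing one whose expiry differs.

    `collection` is expected to be a pymongo Collection. Returns True if an
    index was created, False if a matching index was already in place.
    """
    # Default name pymongo generates for a single ascending key,
    # e.g. 'timestamp' -> 'timestamp_1'
    index_name = field + '_1'

    existing = collection.index_information().get(index_name)
    if existing is not None:
        if existing.get('expireAfterSeconds') == expire_seconds:
            # Same options as requested: nothing to do
            return False
        # Options differ: create_index would fail, so drop the old index first
        collection.drop_index(index_name)

    collection.create_index(field, expireAfterSeconds=expire_seconds)
    return True
```

In MongoCache.__init__ the direct create_index call could then be replaced with ensure_ttl_index(self.db.webpage, 'timestamp', expires.total_seconds()). If dropping the index is not desirable, MongoDB's collMod command can also change expireAfterSeconds on an existing TTL index in place. Note that timedelta() is zero seconds, so with that value cached pages become eligible for removal almost as soon as they are written.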