Question

I have been working on a scraper that fetches information from a real estate API. It is a class with the following key functions (renamed):

myclass.execute_search
myclass.clean_data
myclass.upload_to_database
myclass.wrapper

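myclass.wrapper is executed in the constructor and runs execute_search, clean_data and upload_to_database in that order. execute_search uses requests to send data to the API and retrieve the results. Several member variables are also initialised and then used to perform the search.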
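The structure described above can be sketched roughly as follows. This is only an illustration: the class name ScraperSketch, the calls list and the stubbed method bodies are hypothetical stand-ins, since the real class talks to a private API and is not shown here.

```python
class ScraperSketch:
    """Hypothetical stand-in for myclass; the API call is replaced by a stub."""

    def __init__(self, postcode):
        self.postcode = postcode
        self.calls = []   # records the order in which the steps run
        self.data = []    # per-instance results, created fresh for each object
        self.wrapper()    # the constructor kicks off the whole pipeline

    def execute_search(self):
        # Stub: the real method would send a request to the API.
        self.calls.append("execute_search")
        self.data.append({"postcode": self.postcode, "id": 1})

    def clean_data(self):
        # Stub: the real method would tidy up self.data.
        self.calls.append("clean_data")

    def upload_to_database(self):
        # Stub: the real method would write self.data to the database.
        self.calls.append("upload_to_database")

    def wrapper(self):
        # Run the three steps in the order described in the question.
        self.execute_search()
        self.clean_data()
        self.upload_to_database()


run = ScraperSketch("2000")
print(run.calls)
```

Note that in this sketch every piece of state is assigned on self inside __init__, so each new object starts empty.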

All of this works very well when I run the class as a single instance, in this form:

myclass(postcode)

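However, if I put the code in a loop, data from the previous search leaks into the next one. As a result, on every subsequent iteration the data uploaded to the database includes the data from all previous iterations. By the time I have gone through all 500 postcodes, the last postcode search dumps over 10,000 results onto the server, even though there are really only about 40 results.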

for postcode in postcodes:
    print(postcode)
    mydata = myclass(postcode)
    gc.collect()
    del mydata
    time.sleep(6)

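I tried to get rid of this leak by adding the gc.collect() and del mydata statements, but that did not fix it. I have double-checked that there are no global variables in the file at all. I also explicitly defined a destructor, to no avail.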

def __del__(self):
    self.data = []
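One common way to get exactly this symptom in Python is a mutable object that is shared between instances. The sketch below assumes (the question does not show this) that data is declared as a list at class level instead of being assigned in __init__; the class names LeakySearch and IsolatedSearch are hypothetical. A mutable default argument such as def f(results=[]) behaves in a similar way.

```python
class LeakySearch:
    data = []  # class attribute: ONE list shared by every instance (assumed cause)

    def __init__(self, postcode):
        # No assignment to self.data, so this mutates the shared class-level list.
        self.data.append(postcode)


class IsolatedSearch:
    def __init__(self, postcode):
        self.data = []  # fresh list per instance
        self.data.append(postcode)


leaky_counts = []
isolated_counts = []
for postcode in ["2000", "2010", "2020"]:
    leaky = LeakySearch(postcode)
    leaky_counts.append(len(leaky.data))
    del leaky  # only removes the name; LeakySearch.data is untouched
    isolated_counts.append(len(IsolatedSearch(postcode).data))

print(leaky_counts)     # [1, 2, 3]
print(isolated_counts)  # [1, 1, 1]
```

If this is what is happening, del and gc.collect() cannot help, because the list belongs to the class object, which stays alive for the whole run. Assigning self.data = [] in __del__ does not help either: it only creates a new instance attribute on the object being destroyed and leaves the class-level list as it was.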

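I am relatively new to Python (coming from C++) and I am not sure whether I am doing something wrong in the way I use classes. I can work around the problem in the SQL database by adding DISTINCT on the key of the retrieved records, but that is obviously less than ideal.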

Unfortunately, because the API is private, I cannot paste the whole class on this forum. Here is the constructor:

def __init__(self, postcode, bedrooms_min=None, bedrooms_max=None, carspaces_min=None, carspaces_max=None,
             price_min=None, price_max=None, search_mode='buy', recursive=True, logging=True):
    self.q_postcode = postcode
    self.q_bedrooms_max = bedrooms_max
    self.q_bedrooms_min = bedrooms_min
    self.q_carspaces_min = carspaces_min
    self.q_carspaces_max = carspaces_max
    self.q_price_min = price_min
    self.q_price_max = price_max
    self.q_search_mode = search_mode
    self.q_recursive = recursive
    self.logging = logging
    self.requesttime = datetime.datetime.now()
    # ensure id is unique to database
    self.check_uniqid()
    # execute function after initialising variables
    try:
        self.execute_search()
    except Exception:
        # keep the full traceback text so it can be written to the log
        self.error = traceback.format_exc()
    if self.logging:
        try:
            self.cleandata()
            self.upload_toserver()
        except Exception:
            # keep the full traceback text so it can be written to the log
            self.error = traceback.format_exc()
        self.update_main_log()

Python memory leak in a class?

0 answers: