I am trying to implement an algorithm that processes temporal data.
import gc
import pandas as pd

def calculate_frequency(T, Wth):
    k = len(T)
    df = pd.concat(T).sort_values('time')
    frequency = 0
    # some operations commented out that calculate the frequency from T and Wth
    del df
    gc.collect()
    return frequency
T is a dictionary of time series. Each time series is represented as a pandas DataFrame with the columns 'alert' and 'time'.
I call this function repeatedly for different T. To my surprise, the program takes up more and more memory. Any idea how to deal with this?
What I have tried so far: deleting df and calling the garbage collector.
The operations themselves are commented out, so they cannot be what affects memory usage.
import os
import psutil
import numpy as np
import pandas as pd
import gc
process = psutil.Process(os.getpid())
n_data = 10000
n_alert = 40
alert = np.random.randint(0,n_alert,size=n_data).tolist()
time = np.random.rand(n_data).tolist()
df = pd.DataFrame(dict(alert=alert,time=time))
t_all = {}
for a in range(n_alert):
    t_all[a] = df[df['alert'] == a]
def calculate_frequency(T):
    df = pd.concat(T).sort_values('time')
    frequency = 0
    # some operations commented out that calculate the frequency from T
    del df
    gc.collect()
    return frequency
for a in range(n_alert):
    for a2 in range(a):
        T = [t_all[a], t_all[a2]]
        calculate_frequency(T)
        # resident set size of this process after each call
        print(process.memory_info().rss)
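
One way I could narrow this down is to check whether the growth comes from Python objects that are still referenced, or only from memory that has not been handed back to the operating system. The following is a minimal sketch using the standard-library tracemalloc module. The helper name traced_growth and the stand-in functions leaky and clean are hypothetical and only illustrate the idea; in practice calculate_frequency would be passed in their place. Note that tracemalloc only sees allocations made through Python's memory allocators.

```python
import gc
import tracemalloc


def traced_growth(func, n_calls):
    """Return the Python-level allocated bytes (per tracemalloc) after each call."""
    tracemalloc.start()
    try:
        readings = []
        for _ in range(n_calls):
            func()
            gc.collect()  # drop unreachable reference cycles before measuring
            current, _peak = tracemalloc.get_traced_memory()
            readings.append(current)
        return readings
    finally:
        tracemalloc.stop()


# Stand-ins for illustration only (assumptions, not the real calculate_frequency).
_kept = []


def leaky():
    # The reference survives the call, so traced memory keeps growing.
    _kept.append(bytearray(100_000))


def clean():
    # The buffer is released before the function returns.
    tmp = bytearray(100_000)
    del tmp


if __name__ == "__main__":
    print("leaky:", traced_growth(leaky, 5))
    print("clean:", traced_growth(clean, 5))
```

If the readings stay flat for the real function while the RSS printed above still climbs, the growth is probably not caused by live Python objects, and del plus gc.collect() would not be expected to help.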