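I have several folders of images that I read into a DataFrame, where each folder becomes one row holding its image data. The folders take 350 MB on disk in total, but once loaded into the DataFrame the total size grows to 24 GB. Any idea why this happens?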
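```python
import os

import cv2
import numpy as np
import pandas as pd

# `pth` (root directory of the image folders) and `folders` (list of folder
# names) are assumed to be defined earlier; they are not shown in the question.
videos = pd.DataFrame()
filepath = 'C:/Users/sarmad/Documents/data/labels_metadata.csv'
metadf = pd.read_csv(filepath)
metadf.index = metadf.Instance_name

for folder in folders:
    pth_upd = pth + folder + '/'
    metacsv = pd.read_csv('C:/Users/sarmad/Documents/dev/' + format(folder) + '.csv')
    x = format(folder)
    meta = metadf.loc[format(folder)]
    meta = pd.DataFrame([meta.values], index=[folder], columns=metadf.columns)
    df = pd.DataFrame(index=[folder])
    df = df.join(meta)
    allfiles = os.listdir(pth_upd)
    files = []
    columns = ['data']
    for file in allfiles:
        if file.endswith('.jpg'):
            files.append(file)
    # np.empty has no dtype here, so it is float64 and its first frame is
    # uninitialized; np.append below upcasts every float32 frame to float64.
    samples = np.empty((1, 227, 227))
    for file in files:
        img = cv2.imread(os.path.join(pth_upd, file), 0)  # flag 0: grayscale, uint8
        img = img.reshape(1, 227, 227)
        img = img.astype(np.float32)
        samples = np.append(samples, img, axis=0)  # copies the whole stack each time
    result = pd.DataFrame([[samples]], index=[folder], columns=['videos'])
    print(samples.shape)
    videos = pd.concat([videos, result])  # DataFrame.append was removed in pandas 2.0

videos.info(memory_usage='deep')
```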
```
<class 'pandas.core.frame.DataFrame'>
Index: 60 entries, dev_001 to dev_060
Data columns (total 1 columns):
videos    60 non-null object
dtypes: object(1)
memory usage: 24GB
```
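Two effects likely compound here. The JPEGs on disk are compressed, but `cv2.imread` returns a full pixel array, so every decoded pixel costs one element of the array's dtype. On top of that, `np.empty((1,227,227))` is created without a dtype and is therefore float64, so `np.append` upcasts the float32 frames and each pixel ends up taking 8 bytes instead of 4. The back-of-the-envelope sketch below assumes the 227x227 grayscale frames from the question; the helper `frame_bytes` is hypothetical.

```python
import numpy as np

H, W = 227, 227  # frame size used in the question


def frame_bytes(dtype, shape=(H, W)):
    """Bytes needed to store one decoded frame with the given dtype."""
    return int(np.prod(shape)) * np.dtype(dtype).itemsize


# uint8 -> 51,529 bytes, float32 -> 206,116 bytes, float64 -> 412,232 bytes
for dt in (np.uint8, np.float32, np.float64):
    print(np.dtype(dt).name, frame_bytes(dt))

# Reproduce the question's accumulation pattern with a single frame.
samples = np.empty((1, H, W))                # no dtype given, so float64
img = np.zeros((1, H, W), dtype=np.float32)  # stands in for a converted frame
samples = np.append(samples, img, axis=0)    # result keeps the wider float64
print(samples.dtype, samples.shape)          # float64 (2, 227, 227)
```

So the stack the code actually builds is twice the size the `astype(np.float32)` call suggests, and each folder also carries one uninitialized extra frame from `np.empty`.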
Answer (score: 1)
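Would it help to use `np.uint8` instead of `np.float32` when converting the images? Pixel values lie in the range 0 to 255, so they fit in an 8-bit unsigned integer (`np.int8` only covers -128 to 127). In theory, however, this only shrinks the memory footprint by a factor of 4, so the data would still take about 6 GB.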
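A minimal sketch of this suggestion, assuming the frames are already decoded with `cv2.imread(path, 0)` (which returns `uint8` for 8-bit JPEGs): keep them as `np.uint8` and fill one preallocated array instead of calling `np.append` in a loop, which copies the entire stack on every iteration. The function name `stack_frames` and the random frames are hypothetical stand-ins, not part of the question's code.

```python
import numpy as np


def stack_frames(frames, shape=(227, 227), dtype=np.uint8):
    """Copy 2-D frames into one preallocated (n, H, W) array.

    `frames` is assumed to hold already-decoded grayscale images, for example
    [cv2.imread(p, 0) for p in jpg_paths]. Allocating once avoids the
    full-array copy that np.append performs on every loop iteration.
    """
    frames = list(frames)
    out = np.empty((len(frames), *shape), dtype=dtype)
    for i, frame in enumerate(frames):
        out[i] = frame  # frame must match (or broadcast to) `shape`
    return out


# Hypothetical data: random uint8 frames standing in for decoded JPEGs.
rng = np.random.default_rng(0)
fake = [rng.integers(0, 256, size=(227, 227), dtype=np.uint8) for _ in range(10)]

video_u8 = stack_frames(fake)
video_f64 = stack_frames(fake, dtype=np.float64)  # what the posted loop ends up with
print(video_u8.nbytes, video_f64.nbytes)  # 515290 4122320
```

Because the posted loop actually produces float64, switching to `uint8` would be an 8x reduction relative to what is allocated today rather than 4x; any conversion to floating point can be deferred to the subset of frames being processed at a given time.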