Question

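I have a problem with numpy:

I need numpy to make my module more efficient.
However, loading targets.csv into an ndarray raises a MemoryError when the file is too big (over 150 MB, and I have 12 GB of RAM...).
Is there a way to rewrite this so it can handle a large targets.csv file while still keeping numpy's amazing speed?

Thanks!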

import csv
import numpy as np
import scipy.spatial
import cv2

"""loading files"""

# load the R,G,B colors and X,Y,Z coordinates of 'points' into a np.array
points = np.array([(int(R), int(G), int(B), float(X), float(Y), float(Z))
                   for R, G, B, X, Y, Z in csv.reader(open('colorlist.csv'))])
print("colorlist loaded")

# load the XYZ target values into a np.array
targets = np.array([(float(X), float(Y), float(Z))
                    for X, Y, Z in csv.reader(open('targets.csv'))])
print("targets loaded")

# load the tif image and its dimensions
img = cv2.imread("MAP.tif", -1)
height, width = img.shape
total = height * width
print("MAP loaded")


"""doing geometry"""

tri = scipy.spatial.Delaunay(points[:,[3,4,5]], furthest_site=False) # True makes an almost BW picture
# Delaunay triangulation

indices = tri.simplices
# indices of vertices

vertices = points[indices]
# the vertices for each tetrahedron

tet = tri.find_simplex(targets)
# find which tetrahedron each target belongs to

U = tri.transform[tet,:3]
V = targets - tri.transform[tet,3]  
b = np.einsum('ijk,ik->ij', U, V)
bcoords = np.c_[b, 1 - b.sum(axis=1)]
# find the barycentric coordinates of each point

Answer 1

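Have you tried a memory map? It is available as numpy.memmap, and it is meant for files that are too large to load into RAM.

Here is the description, copied from the docstring:

Create a memory-map to an array stored in a binary file on disk.

Memory-mapped files are used for accessing small segments of large files on disk, without reading the entire file into memory. NumPy's memmap's are array-like objects. This differs from Python's mmap module, which uses file-like objects.

This subclass of ndarray has some unpleasant interactions with some operations, because it doesn't quite fit properly as a subclass. An alternative to using this subclass is to create the mmap object yourself, then create an ndarray with ndarray.__new__ directly, passing the object created in its 'buffer=' parameter.

This class may at some point be turned into a factory function which returns a view into an mmap buffer.

It is fairly simple to use. See the documentation for more examples.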
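One caveat when applying this to the question: numpy.memmap maps a raw binary file, not CSV text, so targets.csv would first have to be converted. The following is only a minimal sketch, assuming Python 3 and three numeric columns per row. The helper names csv_to_memmap and iter_chunks are made up for illustration and are not part of numpy.

```python
import csv

import numpy as np


def count_rows(csv_path):
    """Count non-empty rows without keeping them in memory."""
    with open(csv_path, newline="") as f:
        return sum(1 for row in csv.reader(f) if row)


def csv_to_memmap(csv_path, bin_path, n_cols=3, dtype=np.float64):
    """Stream a numeric CSV into a disk-backed array, one row at a time.

    Hypothetical helper: assumes every non-empty row has exactly
    n_cols numeric fields and that the file has at least one row.
    """
    n_rows = count_rows(csv_path)
    out = np.memmap(bin_path, dtype=dtype, mode="w+", shape=(n_rows, n_cols))
    with open(csv_path, newline="") as f:
        i = 0
        for row in csv.reader(f):
            if not row:
                continue
            out[i] = [float(v) for v in row]
            i += 1
    out.flush()  # push pending writes to the file on disk
    return out


def iter_chunks(arr, chunk_size):
    """Yield consecutive row blocks of arr as ordinary in-memory ndarrays."""
    for start in range(0, arr.shape[0], chunk_size):
        # np.array copies the slice, so each block is a plain ndarray
        yield np.array(arr[start:start + chunk_size])
```

With something like this, each block from iter_chunks(targets, chunk_size) could be passed to tri.find_simplex and the barycentric step from the question, so that only one block of targets and its intermediate arrays is held in RAM at a time. The chunk size is a tuning choice that depends on available memory.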

Answer 2

I found that it only runs well when using the x64 (64-bit) versions of Python 2.7, scipy, OpenCV and numpy.
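This fits the symptoms: a 32-bit process can only address a few gigabytes no matter how much RAM is installed, so a MemoryError can appear far below 12 GB. A small sketch for checking which build is running (the function name python_bits is made up for illustration):

```python
import struct
import sys


def python_bits():
    """Return the pointer width of the running interpreter in bits."""
    # "P" is the struct format code for a C pointer (void *)
    return struct.calcsize("P") * 8


bits = python_bits()
print("Python %d-bit on %s" % (bits, sys.platform))
if bits == 32:
    print("32-bit build: address space is limited regardless of installed RAM")
```

The same check can be applied after installing a different build to confirm that the 64-bit interpreter is the one actually being used.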

Python ndarray from a large file, MemoryError
