How to speed up creating a regular grid in chunks and writing it to a file

Time: 2018-06-08 01:10:46

Tags: python numpy interpolation python-multiprocessing gdal

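I am trying to interpolate scattered points onto a regular grid. For a small domain, i.e. a small number of rows and columns, np.meshgrid works fine. When the number of rows and columns is large, it throws a MemoryError. So I tried processing the whole domain in smaller chunks, applying the interpolation function to each chunk and writing the result to a GeoTIFF file with gdal.
The code is below; I have explained it in the comments.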

import numpy as np
from osgeo import gdal
import csv
import scipy.spatial as spatial

## loading lat, lon and values from csv file
lat = []
lon = []
ele = []

## The csv file contains the lat/lon/ele 
with open('data.csv', 'r') as data:
    for row in data:
        row = row.strip().split()
        lat.append(float(row[2]))
        lon.append(float(row[1]))
        ele.append(float(row[3]))
## creating a numpy array to feed into KDTree for spatial indexing      
xycoord = np.c_[lon,lat]
ele_arr = np.array(ele)

## Generating KDTree for nearest neighbour search
point_tree = spatial.cKDTree(xycoord, leafsize=15)

## Getting domain extents
## 23.204 , 146.447, -61.509, 25.073
xmin, xmax, ymin, ymax = min(lon),max(lon), min(lat), max(lat)  

res = 0.003 ## Grid spacing ~330 meters

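## float16 cannot resolve 0.003-degree steps at these magnitudes, so use float64
## (these are 1-D arrays, so the extra memory is negligible)
x = np.arange(xmin, xmax, res, dtype=np.float64)
## y runs from north to south so that row 0 matches the ymax origin
## and the -res row step used in SetGeoTransform below
y = np.arange(ymax, ymin, -res, dtype=np.float64)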

nx = x.shape[0]
ny = y.shape[0]
print (nx, ny) # ~ (41081 28861)

## Creating of geotiff file using gdal
outFile = "test.tif"
format_ = "GTiff"
driver = gdal.GetDriverByName(format_)
outRas = driver.Create(outFile, nx, ny, 1, gdal.GDT_Float32, options=['COMPRESS=DEFLATE'])
outRas.SetGeoTransform((xmin, res, 0, ymax, 0, -res))

## No of rows and columns in each chunk
step = 2000 

## starting and ending indices for row and column for each chunk
xstart = []
xend = []
ystart = []
yend = []

for i in range(0,nx,step):
    for j in range(0,ny,step):
        xstart.append(i)
        xend.append(i+step)
        ystart.append(j)
        yend.append(j+step)

## Actual loop      
for i in range(len(xstart)):
    t = np.meshgrid(x[xstart[i]:xend[i]],y[ystart[i]:yend[i]]) ## Creating a meshgrid 
    ## Actual intended flow
    ## xy = np.stack(np.meshgrid(x[xstart[i]:xend[i]],y[ystart[i]:yend[i]]), axis = -1)
    ## distances, points_idx = point_tree.query(xy, k=3, eps=0)
    ## z = interpFn(distances, ele_arr[points_idx])

    ##To test the speed, not using above code and using a simple fn which 
    ##takes our input matrix and return matrix with same dimensions. even np.ones() will do

    z = fn(t) ## this could be any function
    outRas.GetRasterBand(1).WriteArray(z,xstart[i],ystart[i]) ## Writing to geotiff file 

outRas = None

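The MemoryError is now resolved, but for large matrices the process is very slow: even writing a simple matrix, before applying any real function, takes a lot of time. Please suggest ways to speed this up.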
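
The commented-out "intended flow" in the loop calls an `interpFn(distances, ele_arr[points_idx])` that the question never defines. As a rough illustration only, the sketch below assumes it is an inverse-distance-weighted (IDW) average over the k nearest neighbours; the name `idw_interp` and the `power` and `tiny` parameters are hypothetical and not part of the original code. The toy arrays stand in for what `point_tree.query(xy, k=3)` would return for a 2 x 2 chunk.

```python
import numpy as np

def idw_interp(distances, values, power=2.0, tiny=1e-12):
    """Inverse-distance-weighted mean over the last axis (the k neighbours).

    Hypothetical stand-in for the undefined interpFn in the question.
    """
    distances = np.asarray(distances, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    # Clamp distances so a grid node lying exactly on a sample
    # does not cause a division by zero.
    weights = 1.0 / np.maximum(distances, tiny) ** power
    return (weights * values).sum(axis=-1) / weights.sum(axis=-1)

# Toy input with shape (rows, cols, k) = (2, 2, 3), mimicking a query result.
distances = np.array([[[1.0, 1.0, 1.0], [0.0, 2.0, 3.0]],
                      [[1.0, 2.0, 2.0], [2.0, 2.0, 4.0]]])
values = np.array([[[10.0, 20.0, 30.0], [5.0, 50.0, 80.0]],
                   [[9.0, 0.0, 0.0], [1.0, 1.0, 1.0]]])

z = idw_interp(distances, values)  # one interpolated value per grid node
```

Because the reduction runs over the last axis, the result has the same (rows, cols) shape as the chunk and could be passed to WriteArray directly.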

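1 Answer:

Answer 0: (score: 0)

Regarding your initial problem (and depending on what your data looks like), setting the sparse parameter to True may save you a lot of memory: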

np.meshgrid(data, data, sparse=True)
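
To make the memory saving concrete, here is a small sketch with made-up axis sizes (the 4 x 3 grid and the expression `xx**2 + yy` are purely illustrative). With sparse=True, np.meshgrid returns one row vector and one column vector instead of two full 2-D arrays, and NumPy broadcasting expands them only when an expression is evaluated.

```python
import numpy as np

x = np.arange(0.0, 4.0)  # 4 columns
y = np.arange(0.0, 3.0)  # 3 rows

# Dense grids: two full (3, 4) arrays.
xx_dense, yy_dense = np.meshgrid(x, y)

# Sparse grids: xx has shape (1, 4), yy has shape (3, 1).
xx, yy = np.meshgrid(x, y, sparse=True)

# Any broadcasting-friendly expression gives the same result either way.
z_dense = xx_dense**2 + yy_dense
z_sparse = xx**2 + yy

print(xx.shape, yy.shape, z_sparse.shape)
```

Note that this only helps when the function applied to the grid broadcasts. The commented-out `np.stack(np.meshgrid(...), axis=-1)` in the question needs arrays of equal shape, so that step would still require dense coordinates for each chunk.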