Consider this example:

    import numpy as np
    a = np.array(1)
    np.save("a.npy", a)
    a = np.load("a.npy", mmap_mode='r')
    print(type(a))
    b = a + 2
    print(type(b))

Output:

    <class 'numpy.core.memmap.memmap'>
    <class 'numpy.int32'>
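So it seems that `b` is no longer a memmap, and I assume this forces numpy to read the whole of `a.npy`, which defeats the purpose of the memmap. Hence my question: can operations on memmaps be deferred until access time?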
I believe subclassing `ndarray` or `memmap` could work, but I am not confident enough in my Python skills to try it.

Here is an extended example that shows my problem:

    import numpy as np

    # create 8 GB file
    # np.save("memmap.npy", np.empty([1000000000]))

    # I want to print the first value using f and memmaps
    def f(value):
        print(value[0])

    # this is fast: f receives a memmap
    a = np.load("memmap.npy", mmap_mode='r')
    print("a = ")
    f(a)

    # this is slow: b has to be read completely; converted into an array
    b = np.load("memmap.npy", mmap_mode='r')
    print("b + 1 = ")
    f(b + 1)

Answer 0 (score: 1)
This is just how Python works. By default, numpy operations return a new array, so `b` never exists as a memmap: it is created when `+` is called on `a`.
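Because the operation only materializes an array the size of its operand, one way to limit the cost is to index the memmap first and operate on the slice. The following is a minimal sketch of that idea, not part of the original answer; the temporary file and the slice bounds are illustrative assumptions.

```python
import os
import tempfile

import numpy as np

# Illustrative small file standing in for the large one (assumption).
path = os.path.join(tempfile.mkdtemp(), "a.npy")
np.save(path, np.arange(1000, dtype=np.float64))

a = np.load(path, mmap_mode="r")

# Slicing the memmap does not copy the file contents into memory.
window = a[10:13]

# The addition creates a new in-memory array, but only for 3 elements.
shifted = window + 1
print(shifted)
```

Only the pages of the file that back `window` need to be read to compute `shifted`.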
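There are two ways to work around this. The simplest is to do all operations in place:

    a += 1

This requires loading the memory-mapped array for both reading and writing:

    a = np.load("a.npy", mmap_mode='r+')

Of course, this is no good if you don't want to overwrite your original array. In that case, you need to specify that `b` should be memory-mapped:

    b = np.memmap("b.npy", mode='w+', dtype=a.dtype, shape=a.shape)

The assignment can then be done with the `out` keyword provided by numpy ufuncs, for example `np.add(a, 1, out=b)`.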
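The `out` approach can be sketched end to end as follows. This is a minimal sketch, assuming small temporary files in place of the real data; the file names are illustrative. Note that `np.memmap` writes raw bytes without a `.npy` header, so the result has to be reopened with `np.memmap` and an explicit dtype and shape rather than with `np.load`.

```python
import os
import tempfile

import numpy as np

# Illustrative paths in a temporary directory (assumption).
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "a.npy")
dst = os.path.join(tmpdir, "b.dat")

np.save(src, np.arange(10, dtype=np.float64))

# Read-only memmap of the input.
a = np.load(src, mmap_mode="r")

# Writable memmap for the result, same dtype and shape as the input.
b = np.memmap(dst, mode="w+", dtype=a.dtype, shape=a.shape)

# The ufunc writes straight into the mapped output instead of
# allocating a new in-memory array for the result.
np.add(a, 1, out=b)
b.flush()

# Reopen the raw file to check what was written.
check = np.memmap(dst, mode="r", dtype=np.float64, shape=(10,))
print(check[:3])
```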
Answer 1 (score: 1)
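Here is a simple example of an `ndarray` subclass that defers operations applied to it until a specific element is requested by indexing.

I'm including it to show that this can be done, but it will almost certainly fail in novel and unexpected ways, and it would need substantial work to become usable. For a very specific case, it may be easier than redesigning your code to solve the problem in a better way.

I recommend reading the examples on subclassing `ndarray` in the numpy documentation to help understand how it works.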

    import numpy as np

    class Defered(np.ndarray):
        """
        An array class that defers calculations applied to it, only
        calculating them when an index is requested
        """
        def __new__(cls, arr):
            arr = np.asanyarray(arr).view(cls)
            arr.toApply = []
            return arr

        def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
            ## Convert all arguments to ndarray, otherwise arguments
            # of type Defered will cause infinite recursion
            # also store self as None, to be replaced later on
            newinputs = []
            for i in inputs:
                if i is self:
                    newinputs.append(None)
                elif isinstance(i, np.ndarray):
                    newinputs.append(i.view(np.ndarray))
                else:
                    newinputs.append(i)

            ## Store function to apply and necessary arguments
            self.toApply.append((ufunc, method, newinputs, kwargs))
            return self

        def __getitem__(self, idx):
            ## Get index and convert to regular array
            sub = self.view(np.ndarray).__getitem__(idx)

            ## Apply stored actions
            for ufunc, method, inputs, kwargs in self.toApply:
                inputs = [i if i is not None else sub for i in inputs]
                sub = super().__array_ufunc__(ufunc, method, *inputs, **kwargs)

            return sub

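This will fail if the array is modified by anything other than numpy's universal functions. For example, `percentile` and `median` are not based on ufuncs and will end up loading the entire array. Likewise, if you pass it to a function that iterates over the array, or if you index a substantial portion of it, the entire array will be loaded.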
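As a variation on the same idea, here is a minimal sketch that wraps the array instead of subclassing it, and returns a new wrapper from each operation so that the original object is left untouched. The class name `LazyArray` and its attributes are hypothetical. It relies on numpy's `__array_ufunc__` protocol and on `numpy.lib.mixins.NDArrayOperatorsMixin`, and it deliberately handles only ufunc calls that combine the wrapper with scalars; the same caveats about non-ufunc functions apply.

```python
import numpy as np
from numpy.lib.mixins import NDArrayOperatorsMixin


class LazyArray(NDArrayOperatorsMixin):
    """Hypothetical wrapper that records ufunc calls and applies them on indexing."""

    def __init__(self, base, ops=()):
        self._base = base        # e.g. a np.memmap; never modified here
        self._ops = tuple(ops)   # recorded (ufunc, argument template, kwargs)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only plain calls such as np.add(x, 1) without out= are supported.
        if method != "__call__" or kwargs.get("out") is not None:
            return NotImplemented
        # Other operands must be scalars, so no extra indexing is needed later.
        if not all(x is self or np.isscalar(x) for x in inputs):
            return NotImplemented
        # Remember where this wrapper appeared among the arguments.
        template = tuple(None if x is self else x for x in inputs)
        return LazyArray(self._base, self._ops + ((ufunc, template, kwargs),))

    def __getitem__(self, idx):
        # Read only the requested part, then replay the recorded operations.
        sub = self._base[idx]
        for ufunc, template, kwargs in self._ops:
            args = [sub if x is None else x for x in template]
            sub = ufunc(*args, **kwargs)
        return sub


base = np.arange(10.0)            # stands in for a memmap in this sketch
lazy = (LazyArray(base) + 1) * 2  # nothing is computed yet

print(lazy[3])    # 8.0
print(lazy[2:4])  # [6. 8.]
```

Returning a fresh wrapper means that `b + 1` does not change what `b[0]` evaluates to later, at the cost of supporting far fewer operations than a real array.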