Question

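I have a 4-D array, which is a time series of 3-D arrays. I would like to shuffle each point in the 3-D array along the time axis. Below is the code I wrote using nested for loops. Can this be done with fancy numpy indexing? Speed is a factor. Thank you.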

import numpy as np

timepoints = 2
x = 4
y = 4
z = 3

vol_1 = np.zeros((x, y, z))
vol_2 = np.ones((x, y, z))
timeseries = np.array((vol_1, vol_2))

timeseries.shape  # (2, 4, 4, 3)

# One voxel over time.
timeseries[:, 0, 0, 0]

for xx in range(x):
    for yy in range(y):
        for zz in range(z):
            np.random.shuffle(timeseries[:, xx, yy, zz])

Answer 1

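We can generate all the shuffled indices along the first axis and then simply use advanced indexing to get the randomized version. To get all the shuffled indices, we generate a random array of the same shape as the input array and take the argsort indices along the first axis. This trick has been explored before.

Thus, we would have a vectorized implementation like so: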

m,n,r,p = a.shape # a is the input array
idx = np.random.rand(*a.shape).argsort(0)
out = a[idx, np.arange(n)[:,None,None], np.arange(r)[:,None], np.arange(p)]
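
The gather step above can also be written with `np.take_along_axis`, which was not used in the original answer and is offered here only as a sketch, assuming NumPy 1.15 or later. The small `arange` input and the seeded `RandomState` are stand-ins chosen for repeatability, not part of the original post.

```python
import numpy as np

# Seeded only so that the sketch is repeatable (an assumption, not in the answer).
rng_state = np.random.RandomState(0)

# Small stand-in for the 4D input array.
a = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
m, n, r, p = a.shape

# An independent random permutation of 0..m-1 for every (n, r, p) position.
idx = rng_state.rand(*a.shape).argsort(0)

# Explicit advanced indexing, as in the answer.
out_explicit = a[idx, np.arange(n)[:, None, None], np.arange(r)[:, None], np.arange(p)]

# The same gather expressed with take_along_axis.
out_take = np.take_along_axis(a, idx, axis=0)

same = np.array_equal(out_explicit, out_take)
# Each position keeps its own set of values; only the order along axis 0 changes.
values_preserved = np.array_equal(np.sort(out_take, axis=0), np.sort(a, axis=0))
print(same, values_preserved)  # True True
```

Both forms read `a[idx[i, j, k, l], j, k, l]` for every output position; `take_along_axis` just builds the index arrays for the other axes itself.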

Just to explain to readers what exactly is going on, here is a sample run:

1) Input 4D array:

In [711]: a
Out[711]: 
array([[[[60, 22, 34],
         [29, 18, 79]],

        [[11, 69, 41],
         [75, 30, 30]]],


       [[[63, 61, 42],
         [70, 56, 57]],

        [[70, 98, 71],
         [29, 93, 96]]]])

2) Random indices generated with the proposed method, for indexing along the first axis:

In [712]: idx
Out[712]: 
array([[[[1, 0, 1],
         [0, 1, 1]],

        [[0, 0, 1],
         [1, 0, 1]]],


       [[[0, 1, 0],
         [1, 0, 0]],

        [[1, 1, 0],
         [0, 1, 0]]]])

3) Finally, index into the input array to get the shuffled output:

In [713]: out
Out[713]: 
array([[[[63, 22, 42],
         [29, 56, 57]],

        [[11, 69, 71],
         [29, 30, 96]]],


       [[[60, 61, 34],
         [70, 18, 79]],

        [[70, 98, 41],
         [75, 93, 30]]]])
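
The sample run above can be reproduced by hand. The following sketch copies the printed `a` and `idx` arrays, applies the same indexing expression, and compares the result with the printed `out`. The name `expected` is introduced here purely for illustration.

```python
import numpy as np

# Input array and index array copied from the sample run.
a = np.array([[[[60, 22, 34], [29, 18, 79]],
               [[11, 69, 41], [75, 30, 30]]],
              [[[63, 61, 42], [70, 56, 57]],
               [[70, 98, 71], [29, 93, 96]]]])
idx = np.array([[[[1, 0, 1], [0, 1, 1]],
                 [[0, 0, 1], [1, 0, 1]]],
                [[[0, 1, 0], [1, 0, 0]],
                 [[1, 1, 0], [0, 1, 0]]]])

# Output array as printed in the sample run.
expected = np.array([[[[63, 22, 42], [29, 56, 57]],
                      [[11, 69, 71], [29, 30, 96]]],
                     [[[60, 61, 34], [70, 18, 79]],
                      [[70, 98, 41], [75, 93, 30]]]])

m, n, r, p = a.shape
# out[i, j, k, l] == a[idx[i, j, k, l], j, k, l]
out = a[idx, np.arange(n)[:, None, None], np.arange(r)[:, None], np.arange(p)]

print(np.array_equal(out, expected))  # True
```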

Looking closely, 60 at a[0,0,0,0] and 63 at a[1,0,0,0] are swapped in the output, because the idx values at those positions are 1 and 0 respectively. Next, 22 and 61 stay in place, since the corresponding idx values are 0 and 1, and so on.

Runtime test

In [726]: timeseries = np.random.rand(10,10,10,10)

In [727]: %timeit org_app(timeseries)
100 loops, best of 3: 5.24 ms per loop

In [728]: %timeit proposed_app(timeseries)
1000 loops, best of 3: 289 µs per loop

In [729]: timeseries = np.random.rand(50,50,50,50)

In [730]: %timeit org_app(timeseries)
1 loop, best of 3: 720 ms per loop

In [731]: %timeit proposed_app(timeseries)
1 loop, best of 3: 426 ms per loop

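At large sizes, the cost of creating the random array proves to be the bottleneck of the proposed method, but it still shows a good speedup over the original loopy version.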
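
On newer NumPy versions the random-float array and the argsort can be skipped altogether. The sketch below is not from the original answer: it assumes NumPy 1.20 or later, where `Generator.permuted` shuffles every 1-D slice along the given axis independently. No timings were measured for it here, so whether it beats the argsort approach is left open. The array contents and the seed are illustrative only.

```python
import numpy as np

# Seeded generator, only so that the sketch is repeatable.
rng = np.random.default_rng(0)

# Small stand-in for the (time, x, y, z) array from the question.
timeseries = np.arange(5 * 4 * 4 * 3, dtype=float).reshape(5, 4, 4, 3)
original = timeseries.copy()

# Shuffle each voxel's time course independently; returns a new array.
shuffled = rng.permuted(timeseries, axis=0)

# Each voxel still holds the same values, only reordered along the time axis.
values_preserved = np.array_equal(np.sort(shuffled, axis=0), np.sort(original, axis=0))
# The input is left untouched because no `out` argument was given.
input_untouched = np.array_equal(timeseries, original)
print(values_preserved, input_untouched)  # True True
```

Passing `out=timeseries` to `permuted` would shuffle in place instead, which is closer to what the loop in the question does.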

Answer 2

I am adding this as an answer because it would not fit in a comment; it is just a small addition on top of @Divakar's excellent answer:

def divakar(a):
    m,n,r,p = a.shape # a is the input array
    idx = np.random.rand(*a.shape).argsort(0)
    return a[idx, np.arange(n)[:,None,None], np.arange(r)[:,None], np.arange(p)]

a = np.random.rand(50,50,50,50)
%timeit divakar(a)
560 ms ± 2.62 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

I observed some speedup by using reshape several times instead of broadcasting, e.g.:

def norok2(a):
    shape = a.shape
    idx = np.random.rand(*a.shape).argsort(0).reshape(shape[0], -1)
    return a.reshape(shape[0], -1)[idx, np.arange(shape[1] * shape[2] * shape[3])].reshape(shape)

a = np.random.rand(50,50,50,50)
%timeit norok2(a)
495 ms ± 1.61 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
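
As a sanity check that the reshape variant performs the same shuffle as the broadcasting variant, both functions can be run from the same seed so that they draw identical random arrays. This is only a sketch: the seed value and the small array shape are arbitrary choices made here.

```python
import numpy as np

def divakar(a):
    m, n, r, p = a.shape
    idx = np.random.rand(*a.shape).argsort(0)
    return a[idx, np.arange(n)[:, None, None], np.arange(r)[:, None], np.arange(p)]

def norok2(a):
    shape = a.shape
    # Flatten all non-shuffled axes into one, so a single arange indexes them.
    idx = np.random.rand(*a.shape).argsort(0).reshape(shape[0], -1)
    flat = a.reshape(shape[0], -1)
    return flat[idx, np.arange(shape[1] * shape[2] * shape[3])].reshape(shape)

a = np.random.rand(3, 4, 5, 6)

# Same seed before each call, so both draw the same random array.
np.random.seed(42)
out_broadcast = divakar(a)
np.random.seed(42)
out_reshape = norok2(a)

print(np.array_equal(out_broadcast, out_reshape))  # True
```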

Compared with the OP's proposal:

def jakub(a):
    t, x, y, z = a.shape
    for xx in range(x):
        for yy in range(y):
            for zz in range(z):
                np.random.shuffle(a[:, xx, yy, zz])


%timeit jakub(a)
2 s ± 30.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Incidentally, the modification I propose is easier to extend to n-dimensional arrays and an arbitrary shuffling axis, e.g.:

import numpy as np
import functools

def shuffle_axis(arr, axis=0):
    arr = np.swapaxes(arr, 0, axis)
    shape = arr.shape
    i = np.random.rand(*shape).argsort(0).reshape(shape[0], -1)
    return arr.reshape(shape[0], -1)[i, np.arange(functools.reduce(lambda x, y: x * y, shape[1:]))].reshape(shape).swapaxes(axis, 0)

with similar speed:

a = np.random.rand(50,50,50,50)
%timeit shuffle_axis(a)
499 ms ± 2.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
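
A quick usage sketch for a non-zero axis follows. The function body is the one given above. The input array and the choice of `axis=2` are illustrative assumptions.

```python
import functools

import numpy as np

def shuffle_axis(arr, axis=0):
    # Bring the axis to shuffle to the front, shuffle, then swap it back.
    arr = np.swapaxes(arr, 0, axis)
    shape = arr.shape
    i = np.random.rand(*shape).argsort(0).reshape(shape[0], -1)
    n_cols = functools.reduce(lambda x, y: x * y, shape[1:])
    return arr.reshape(shape[0], -1)[i, np.arange(n_cols)].reshape(shape).swapaxes(axis, 0)

a = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
out = shuffle_axis(a, axis=2)

# Every 1-D slice along axis 2 holds the same values as before, possibly reordered.
values_preserved = np.array_equal(np.sort(out, axis=2), np.sort(a, axis=2))
print(out.shape, values_preserved)  # (2, 3, 4, 5) True
```

Note that, unlike the loop in the question, this returns a new array and leaves the input unchanged.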

EDIT (revisited)

...and the timings are not much worse than randomizing everything:

a = np.random.rand(50,50,50,50)
%timeit np.random.shuffle(a.ravel())
310 ms ± 1.84 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

This should be some sort of lower bound on the performance of any solution to this problem (but it does not solve the OP's problem).

Shuffling a 4-D time series by voxel
