Question

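The Metal Performance Shaders framework provides support for building your own convolutional neural networks. When creating, for example, an MPSCNNConvolution, it requires a 4D weight tensor as an init parameter, represented as a 1D float pointer.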

init(device: MTLDevice,
  convolutionDescriptor: MPSCNNConvolutionDescriptor,
  kernelWeights: UnsafePointer<Float>,
  biasTerms: UnsafePointer<Float>?,
  flags: MPSCNNConvolutionFlags)

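The documentation has this to say about the 4D tensor:

The layout of filter weight is so that it can be reinterpreted as 4D tensor (array) weight[outputChannels][kernelHeight][kernelWidth][inputChannels / groups]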

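Unfortunately, this information doesn't really tell me how to arrange a 4D array into a one-dimensional Float pointer.

I tried ordering the weights the way the BNNS counterpart requires, but without luck.

How do I properly represent the 4D tensor (array) as a 1D Float pointer (array)?

PS: I tried arranging it like a C array and getting a pointer to the flat array, but it didn't work.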

UPDATE

@RhythmicFistman: This is how I stored it in a plain array, which I can convert to an UnsafePointer<Float> (but it doesn't work):

var output = Array<Float>(repeating: 0, count: weights.count)

for o in 0..<outputChannels {
    for ky in 0..<kernelHeight {
        for kx in 0..<kernelWidth {
            for i in 0..<inputChannels {
                let offset = ((o * kernelHeight + ky) * kernelWidth + kx) * inputChannels + i
                output[offset] = ...
            }
        }
    }
}

Answer 1

OK, I figured it out. Here are the 2 Python functions I use to reshape my convolution and fully connected matrices:

import numpy as np

# shape required for MPSCNN [oC kH kW iC]
# tensorflow order is [kH kW iC oC]
def convshape(a):
    a = np.swapaxes(a, 2, 3)
    a = np.swapaxes(a, 1, 2)
    a = np.swapaxes(a, 0, 1)
    return a

# fully connected only requires a x/y swap
def fullshape(a):
    a = np.swapaxes(a, 0, 1)
    return a

Answer 2

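This is something I recently had to do for Caffe weights, so I can provide the Swift implementation I use for reordering them. The following function takes a Float array of Caffe weights for a convolution (in [c_o][c_i][h][w] order) and reorders them into the order Metal expects ([c_o][h][w][c_i]):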

public func convertCaffeWeightsToMPS(_ weights:[Float], kernelSize:(width:Int, height:Int), inputChannels:Int, outputChannels:Int, groups:Int) -> [Float] {

    var weightArray:[Float] = Array(repeating:0.0, count:weights.count)
    var outputIndex = 0

    let groupedInputChannels = inputChannels / groups
    let outputChannelWidth = groupedInputChannels * kernelSize.width * kernelSize.height

    // MPS ordering: [c_o][h][w][c_i]
    for outputChannel in 0..<outputChannels {
        for heightInKernel in 0..<kernelSize.height {
            for widthInKernel in 0..<kernelSize.width {
                for inputChannel in 0..<groupedInputChannels {
                    // Caffe ordering: [c_o][c_i][h][w]
                    let calculatedIndex = outputChannel * outputChannelWidth + inputChannel * kernelSize.width * kernelSize.height + heightInKernel * kernelSize.width + widthInKernel
                    weightArray[outputIndex] = weights[calculatedIndex]
                    outputIndex += 1
                }
            }
        }
    }

    return weightArray
}

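Based on my layer visualizations, this appears to produce correct convolution results (matching those generated by Caffe). I believe it also properly accounts for grouping, but I still need to verify that.

TensorFlow's ordering differs from Caffe's, but you should be able to change the math inside the loop to account for that.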
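
As a rough illustration of that change, here is a minimal pure-Python sketch of the same reordering loop, with the source index computed for TensorFlow's [h][w][c_i][c_o] layout (the order given in Answer 1). The function name `tf_weights_to_mps` is hypothetical, and the sketch assumes groups == 1; it is not a verified drop-in for grouped convolutions.

```python
def tf_weights_to_mps(weights, kernel_height, kernel_width,
                      input_channels, output_channels):
    """Reorder a flat list from TensorFlow order [h][w][c_i][c_o]
    to the order MPS expects, [c_o][h][w][c_i].

    Hypothetical helper for illustration; assumes groups == 1.
    """
    expected = kernel_height * kernel_width * input_channels * output_channels
    if len(weights) != expected:
        raise ValueError("weights has the wrong number of elements")

    reordered = []
    # Walk the destination in MPS order: [c_o][h][w][c_i]
    for o in range(output_channels):
        for h in range(kernel_height):
            for w in range(kernel_width):
                for i in range(input_channels):
                    # Source offset in TensorFlow order: [h][w][c_i][c_o]
                    src = ((h * kernel_width + w) * input_channels + i) * output_channels + o
                    reordered.append(weights[src])
    return reordered


# 1x1 kernel, 2 input channels, 3 output channels
print(tf_weights_to_mps([0, 1, 2, 3, 4, 5], 1, 1, 2, 3))  # [0, 3, 1, 4, 2, 5]
```

Only the source-index line differs from the Caffe version; the destination is still filled sequentially in [c_o][h][w][c_i] order.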

Answer 3

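The documentation here assumes some expertise in C. In that context, when x, y and z are constants known at compile time, a[x][y][z] is typically collapsed into a 1-D array. When this happens, the z component varies most quickly, followed by y, followed by x; in other words, the outermost index varies slowest.

If we have a[2][2][2], it is collapsed into 1D as: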

{ a[0][0][0], a[0][0][1], a[0][1][0], a[0][1][1], 
  a[1][0][0], a[1][0][1], a[1][1][0], a[1][1][1] }
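
The same rule can be written as a small offset calculation. The sketch below is pure Python; `flat_offset` is a hypothetical helper name, and the MPS shape sizes are made up for illustration. It computes the row-major (C-order) position of an index and checks that enumerating a[2][2][2] with the last index varying fastest yields offsets 0 through 7.

```python
from itertools import product


def flat_offset(index, shape):
    """Row-major (C-order) offset of a multi-dimensional index."""
    offset = 0
    for position, extent in zip(index, shape):
        if not 0 <= position < extent:
            raise IndexError("index out of range for shape")
        offset = offset * extent + position
    return offset


shape = (2, 2, 2)
# product() enumerates indices with the last component varying fastest.
offsets = [flat_offset(idx, shape) for idx in product(*(range(n) for n in shape))]
print(offsets)  # [0, 1, 2, 3, 4, 5, 6, 7]

# The same rule applied to the MPS layout [c_o][h][w][c_i]
mps_shape = (4, 3, 3, 2)  # illustrative sizes, not from the original post
print(flat_offset((1, 2, 0, 1), mps_shape))  # 31
```

For a four-dimensional shape this reduces to ((o * kernelHeight + ky) * kernelWidth + kx) * inputChannels + i, the same expression used in the question's update.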

Answer 4

I think TensorFlow already has a convenient method for this kind of task:

tf.transpose(aWeightTensor, perm=[3, 0, 1, 2])

Full documentation: https://www.tensorflow.org/api_docs/python/tf/transpose

MPSCNN weight ordering