Question

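Is it possible in a modern Fortran compiler such as Intel Fortran to determine array strides at runtime? For example, I may want to perform a Fast Fourier Transform (FFT) on an array section: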

program main

    complex(8),allocatable::array(:,:)

    allocate(array(17, 17))
    array = 1.0d0

    call fft(array(1:16,1:16))

contains

    subroutine fft(a)  
        use mkl_dfti

        implicit none

        complex(8),intent(inout)::a(:,:)

        type(dfti_descriptor),pointer::desc
        integer::stat

        stat = DftiCreateDescriptor(desc, DFTI_DOUBLE, DFTI_COMPLEX, 2, shape(a) )
        stat = DftiCommitDescriptor(desc)
        stat = DftiComputeForward(desc, a(:,1))
        stat = DftiFreeDescriptor(desc)

    end subroutine  

end program

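However, the MKL Dfti* routines need to be told the array strides explicitly. Looking through the reference manuals, I have not found any intrinsic function that returns stride information. Two interesting resources are here and here; they discuss whether array sections are copied and how Intel Fortran handles arrays internally. I would prefer not to tie myself to the way Intel currently uses its array descriptors.

How can I find out the stride information? Note that, in general, I would like the fft routine (or any similar routine) not to require any additional information about the array being passed in.

Edit:

I have verified that an array temporary is not created in this case. Here is a simpler piece of code, which I checked with Intel(R) Visual Fortran Compiler XE 14.0.2.176 [Intel(R) 64], with optimizations disabled and heap arrays set to 0.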

program main
    implicit none

    real(8),allocatable::a(:,:)

    pause

    allocate(a(8192,8192))

    pause

    call random_number(a)

    pause

    call foo(a(:4096,:4096))

    pause

    contains

    subroutine foo(a)
        implicit none

        real(8)::a(:,:)

        open(unit=16, file='a_sum.txt')

        write(16, *) sum(a)

        close(16)

    end subroutine

end program

Monitoring the memory usage makes it clear that an array temporary is never created.

Edit 2:

module m_foo
    implicit none

contains

    subroutine foo(a)
        implicit none

        real(8),contiguous::a(:,:)

        integer::i, j

        open(unit=16, file='a_sum.txt')

        write(16, *) sum(a)

        close(16)        

        call nointerface(a)

    end subroutine

end module

subroutine nointerface(a)
    implicit none

    real(8)::a(*)

end subroutine

program main
    use m_foo

    implicit none

    integer,parameter::N = 8192
    real(8),allocatable::a(:,:)

    integer::i, j
    real(8)::count

    pause

    allocate(a(N, N))

    pause

    call random_number(a)

    pause

    call foo(a(:N/2,:N/2))

    pause

end program

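Edit 3:

This example illustrates what I am trying to achieve. There is a 16x16 contiguous array, but I only want to transform the upper 4x4 array. The first call simply passes in the array section, but it does not return a single one in the upper-left corner of the array. The second call sets the appropriate strides, and a subsequently contains the correct upper 4x4 array. The stride of the upper 4x4 array with respect to the full 16x16 array is not one.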

program main
    implicit none

    complex(8),allocatable::a(:,:)

    allocate(a(16,16))

    a = 0.0d0
    a(1:4,1:4) = 1.0d0

    call fft(a(1:4,1:4))

    write(*,*) a(1:4,1:4)

    pause

    a = 0.0d0
    a(1:4,1:4) = 1.0d0

    call fft_stride(a(1:4,1:4), 1, 16)

    write(*,*) a(1:4,1:4)

    pause

    contains

    subroutine fft(a)  !{{{
        use mkl_dfti

        implicit none

        complex(8),intent(inout)::a(:,:)

        type(dfti_descriptor),pointer::desc
        integer::stat

        stat = DftiCreateDescriptor(desc, DFTI_DOUBLE, DFTI_COMPLEX, 2, shape(a) )
        stat = DftiCommitDescriptor(desc)
        stat = DftiComputeForward(desc, a(:,1))
        stat = DftiFreeDescriptor(desc)

    end subroutine  !}}}

    subroutine fft_stride(a, s1, s2)  !{{{
        use mkl_dfti

        implicit none


        complex(8),intent(inout)::a(:,:)
        integer::s1, s2

        type(dfti_descriptor),pointer::desc
        integer::stat

        integer::strides(3)

        strides = [0, s1, s2]

        stat = DftiCreateDescriptor(desc, DFTI_DOUBLE, DFTI_COMPLEX, 2, shape(a) )
        stat = DftiSetValue(desc, DFTI_INPUT_STRIDES, strides)
        stat = DftiCommitDescriptor(desc)
        stat = DftiComputeForward(desc, a(:,1))
        stat = DftiFreeDescriptor(desc)

    end subroutine  !}}}  

end program
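To make the role of the stride triple in Edit 3 concrete, here is a minimal sketch that needs no MKL. It assumes, as I read the MKL documentation, that with strides = [s0, s1, s2] the element with zero-based indices (k1, k2) is taken from offset s0 + k1*s1 + k2*s2, counted in elements from the address that is passed in. The program name stride_offsets and the helper array full are made up for illustration.

```fortran
program stride_offsets
    implicit none

    integer, parameter :: lda = 16, m = 4, n = 4
    integer :: full(lda, lda)
    integer :: strides(3), i, k1, k2, offset
    logical :: ok

    ! Store in every element its own zero-based position in memory
    ! (Fortran arrays are stored column by column).
    full = reshape([(i, i = 0, lda*lda - 1)], [lda, lda])

    ! Same triple as in fft_stride(a(1:4,1:4), 1, 16).
    strides = [0, 1, lda]

    ! Check that the assumed offset formula lands on the elements
    ! of the upper-left m x n section of the full array.
    ok = .true.
    do k2 = 0, n - 1
        do k1 = 0, m - 1
            offset = strides(1) + k1*strides(2) + k2*strides(3)
            if (full(k1 + 1, k2 + 1) /= offset) ok = .false.
        end do
    end do

    print *, 'offset formula matches the section: ', ok

end program stride_offsets
```

Under that assumption, consecutive columns of the 4x4 section sit 16 elements apart in the parent array, which is why the second stride has to be 16 rather than 4.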

Answer 1

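The routine DftiComputeForward accepts assumed-size arrays. If you pass something complicated and non-contiguous, a copy has to be made when it is passed. The compiler can check at runtime whether the copy is really necessary. In any case, your stride is always 1, because that is the stride the MKL routine will see.

In your case you pass A(:,something), which is a contiguous section provided that A is contiguous. If A is not contiguous, a copy has to be made. The stride is always 1.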
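The runtime contiguity check mentioned above can also be done by hand with the Fortran 2008 intrinsic is_contiguous, provided the compiler supports it. The following is only a sketch; the program name contiguity_demo and the helper report are hypothetical and not part of the question's code.

```fortran
program contiguity_demo
    implicit none

    integer, parameter :: dp = kind(1.0d0)
    complex(dp), allocatable :: a(:,:)

    allocate(a(16,16))
    a = (0.0_dp, 0.0_dp)

    call report('whole array a ', a)
    call report('a(:,1:4)      ', a(:,1:4))      ! whole columns
    call report('a(1:4,1:4)    ', a(1:4,1:4))    ! partial columns

contains

    subroutine report(label, x)
        character(*), intent(in) :: label
        complex(dp), intent(in) :: x(:,:)

        ! is_contiguous inspects the actual argument associated with
        ! the assumed-shape dummy at runtime.
        print *, label, 'contiguous: ', is_contiguous(x)
    end subroutine report

end program contiguity_demo
```

The section a(1:4,1:4) skips 12 elements between columns, so it is not contiguous. A routine could use such a check to decide whether passing the section on to an assumed-size dummy would trigger a copy.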

Answer 2

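I guess you are confused because you worked around the explicit interface of the MKL function DftiComputeForward by giving it a(:,1). That section is contiguous and does not need an array temporary. It is wrong, however: the low-level routine gets the whole array, which is why you see it working once you specify the strides. Since DftiComputeForward declares the array as complex(kind), intent inout :: a(*), you can instead pass it through an external subroutine.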

program ...
call fft(4,4,a(1:4,1:4))
end program

subroutine fft(m,n,a)  !{{{
use mkl_dfti

implicit none

complex(8),intent(inout)::a(*)
integer :: m, n

type(dfti_descriptor),pointer::desc
integer::stat

stat = DftiCreateDescriptor(desc, DFTI_DOUBLE, DFTI_COMPLEX, 2, (/m,n/) )
stat = DftiCommitDescriptor(desc)
stat = DftiComputeForward(desc, a)
stat = DftiFreeDescriptor(desc)

end subroutine !}}}

This creates an array temporary on entry to the subroutine. A more efficient solution is indeed to use strides:

program ...
call fft_strided(4,4,a,16)
end program

subroutine fft_strided(m,n,a,lda)  !{{{
use mkl_dfti

implicit none

complex(8),intent(inout)::a(*)
integer :: m, n, lda

type(dfti_descriptor),pointer::desc
integer::stat

integer::strides(3)

strides = [0, 1, lda]

stat = DftiCreateDescriptor(desc, DFTI_DOUBLE, DFTI_COMPLEX, 2, (/m,n/) )
stat = DftiSetValue(desc, DFTI_INPUT_STRIDES, strides)
stat = DftiCommitDescriptor(desc)
stat = DftiComputeForward(desc, a)
stat = DftiFreeDescriptor(desc)

end subroutine !}}}

Answer 3

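Some of the answers here do not understand the difference between Fortran strides and memory strides (although the two are related).

To answer your question for future readers, beyond the specific case you have here: there does not appear to be a way to find an array stride in Fortran alone, but it can be done through C using the interoperability features of newer compilers.

You can do this in C: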

#include <stddef.h>

/* Return the distance in bytes between the two addresses. */
size_t c_compute_stride(int * x, int * y)
{
    size_t px = (size_t) x;
    size_t py = (size_t) y;
    size_t d = py-px;
    return d;
}

Then call this function from Fortran on the first two elements of an array, for example:

program main
    use iso_c_binding
    implicit none

    interface
        function c_compute_stride(x, y) bind(C, name="c_compute_stride")
            use iso_c_binding
            integer :: x, y
            integer(c_size_t) :: c_compute_stride
        end function
    end interface

    integer, dimension(10) :: a
    integer, dimension(10,10) :: b

    write(*,*) find_stride(a)
    write(*,*) find_stride(b(:,1))
    write(*,*) find_stride(b(1,:))

    contains

    function find_stride(x)
        integer, dimension(:) :: x
        integer(c_size_t) :: find_stride
        find_stride = c_compute_stride(x(1), x(2))
    end function

end program

This prints (the strides are in bytes):

                4
                4
               40
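For comparison, the same address difference can be sketched without a separate C file by combining c_loc from iso_c_binding with transfer. This is an assumption-laden sketch rather than a guaranteed method: it assumes that a type(c_ptr) can be reinterpreted as an integer(c_intptr_t) via transfer, that the section has at least two elements, and that the compiler passes the section by descriptor without making a copy. The names stride_demo and byte_stride are made up for illustration.

```fortran
program stride_demo
    use iso_c_binding, only: c_loc, c_intptr_t
    implicit none

    integer, target :: a(10)
    integer, target :: b(10,10)

    a = 0
    b = 0

    print *, byte_stride(a)        ! contiguous rank-1 array
    print *, byte_stride(b(:,1))   ! a column: neighbours in memory
    print *, byte_stride(b(1,:))   ! a row: one full column apart

contains

    function byte_stride(x) result(d)
        integer, target, intent(in) :: x(:)
        integer(c_intptr_t) :: d

        ! Reinterpret the two C addresses as integers and subtract
        ! them; the result is a distance in bytes, not in elements.
        d = transfer(c_loc(x(2)), d) - transfer(c_loc(x(1)), d)
    end function byte_stride

end program stride_demo
```

If those assumptions hold, this should report the same byte distances as the C helper. Dividing by the storage size of one element would turn them into element strides.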

Answer 4

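In short: assumed-shape arrays always have stride 1.

A bit longer: when you pass a section of an array to a subroutine that takes an assumed-shape array, as you do here, the subroutine knows nothing about the original size of the array. If you look at the lower and upper bounds of the dummy argument inside the subroutine, you will see that they are always 1 and the size of the array section.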

integer, dimension(10:20) :: array
integer :: i

array = [ (i, i=10,20) ]
call foo(array(10:20:2))

subroutine foo(a)
    integer, dimension(:) :: a
    integer :: i

    print*, lbound(a), ubound(a)
    do i=lbound(a,1), ubound(a,1)
        print*, a(i)
    end do

end subroutine foo

This gives the output:

1 6
10 12 14 16 18 20

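So even though your array indices start at 10, when you pass the array (or a section of it), the subroutine thinks the indices start at 1. Likewise, it thinks the stride is 1. You can give the dummy argument a lower bound: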

integer, dimension(10:) :: a

This makes lbound(a) 10 and ubound(a) 15. But it is not possible to give an assumed-shape array a stride.
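The lower-bound remark can be tried with a small self-contained program. This is just a sketch that reuses the section from this answer; the program name bounds_demo is made up for illustration.

```fortran
program bounds_demo
    implicit none

    integer :: array(10:20)
    integer :: i

    array = [(i, i = 10, 20)]

    ! Pass every second element: 10, 12, 14, 16, 18, 20.
    call foo(array(10:20:2))

contains

    subroutine foo(a)
        ! The dummy fixes only its lower bound; the extent (6) still
        ! comes from the actual argument, so the upper bound is 15.
        integer, intent(in) :: a(10:)

        print *, lbound(a, 1), ubound(a, 1)
        print *, a
    end subroutine foo

end program bounds_demo
```

Inside foo the six elements are indexed 10 through 15 with unit step, regardless of the step of 2 used in the caller's section.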

Determining assumed-shape array strides at runtime