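[Update] The code and some sentences have been changed to reflect the implementation I explained in my second comment. The code should compile with the command below; however, I have an older gfortran and may not be seeing some of the errors you might run into.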
gfortran BLU_implementation_copy.f90 -o BLU_implementation_copy.x
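I am getting incredibly inconsistent segmentation faults at run time with Fortran 90.
The overall goal of my code is to decompose a randomly generated, complex, symmetric, diagonally dominant matrix into blocks that can easily be parallelized. A stripped-down version (keeping only the relevant functions) can be found at the end.
I hit the error when breaking the original matrix into blocks (tiles) in the function mat2tiles. In particular, this line keeps failing: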
o(i,j)=cell(x(a:min(i*Nblock,M), b:min(j*Nblock, N)))
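This assigns a block of the input matrix x to the element of o at index i,j. It can do so because each element of o is a derived type holding an Nblock x Nblock matrix: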
type cell
  complex*16 :: block(Nblock,Nblock)
end type cell
type(cell) :: o(M/Nblock+rem,N/Nblock+rem)
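For some matrix sizes and tile sizes, everything works perfectly. For larger sizes, the errors start to appear at tile sizes close to the matrix size and eventually reach down to half the matrix size. For example, the smallest instance of this I have found is a 20x20 matrix with a tile size of 18. But even this is not consistent. If I work up to that size by first running smaller matrices that all run fine, it runs. If I first run a larger size that segfaults and then run 20x20 with 18, it segfaults. The smallest parameters I have found that fail consistently are 25x25 with a tile size of 23. But even at this size, if I print the block it fails on right after the o(i,j) assignment line: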
print*, o(2,1)%block
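it suddenly runs through the whole thing without a problem. Taking the print statement out brings the segfault back.
Finally, the most annoying one [fixed - see my second comment]: if I make a 2000x2000 matrix with a tile size of 1000 or larger, I get a segfault immediately upon entering the function (which means it happens during variable allocation). This leads me to believe the problem may stem from the way the matrices are allocated, especially since I am using derived types. But when I try to diagnose the sizes and contents of the matrices, everything looks normal.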
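A common cause of a segfault right on entry to a procedure is that large automatic arrays and array-valued function results are placed on the stack, which is usually much smaller than the heap. Whether that was the cause here is an assumption, since the fix referred to in the second comment is not shown. A minimal sketch of the heap-based alternative with allocatable arrays follows; the program name heap_tiles and the kind parameter dp are hypothetical and not part of the original code.

```fortran
program heap_tiles
  implicit none
  integer, parameter :: dp = kind(1.0d0)          ! hypothetical kind name
  integer, parameter :: M = 2000, N = 2000, Nblock = 1000

  type cell
    complex(dp) :: block(Nblock, Nblock)
  end type cell

  ! allocatable arrays live on the heap instead of the stack
  complex(dp), allocatable :: A(:,:)
  type(cell),  allocatable :: C(:,:)
  integer :: rem, ierr

  ! same rule as the original listing: one extra tile unless Nblock divides M
  rem = merge(0, 1, modulo(M, Nblock) == 0)

  allocate(A(M, N), stat=ierr)
  if (ierr /= 0) stop 'could not allocate A'
  allocate(C(M/Nblock + rem, N/Nblock + rem), stat=ierr)
  if (ierr /= 0) stop 'could not allocate C'

  A = (0.0_dp, 0.0_dp)
  print *, 'A holds', size(A), 'elements; C holds', size(C), 'tiles'

  deallocate(C, A)
end program heap_tiles
```

Checking stat= after each allocate turns an out-of-memory condition into a readable message rather than a crash. If the arrays must stay automatic, raising the shell's stack limit is another option, but that only moves the threshold.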
program BLU_implementation
  implicit none
  integer :: rem
  integer, parameter :: M=20, N=20, Nblock=19
  real*8 :: start, finish
  complex*16 :: A(M,N)
  type cell
    complex*16 :: block(Nblock,Nblock)
  end type cell
  !determines if cell matrix doesn't need to be bigger
  rem=1
  if (modulo(M,Nblock)==0) then
    rem=0
  endif
  call cpu_time(start)
  call functionCalling(A, M, N, Nblock, rem)
  call cpu_time(finish)
  print*, 'overall time:', finish-start, 'seconds'
contains
!==================================================================================================================
  subroutine functionCalling(A, M, N, Nblock, rem)
    implicit none
    integer :: IPIV, INFO, M, N, Nblock, rem
    real*8 :: start, finish
    complex*16 :: A(M,N)
    type(cell) :: C(M/Nblock+rem,N/Nblock+rem)
    call cpu_time(start)
    A= CSPDmatrixFill(A,M,N)
    call cpu_time(finish)
    print*, 'matrix fill time:', finish-start, 'seconds'
    call cpu_time(start)
    C= mat2tiles(A,M,N,Nblock,rem)
    call cpu_time(finish)
    print*, 'tiling time:', finish-start, 'seconds'
  end subroutine
!===================================================================================================================
! generates a complex, symmetric, positive-definite matrix (based off of another's code)
  function CSPDmatrixFill(A, M, N) result (Matrix)
    ! initialization
    implicit none
    integer :: i, j
    integer :: M, N
    real*8 :: x, xi
    complex*16 :: A(M, N), Matrix(M, N), EYE(M, N), MT(N, M)
    EYE=0
    forall(j=1:M) EYE(j,j)=1
    ! execution
    call random_seed
    do i=1, M
      do j=1, N
        call random_number (x)
        call random_number(xi)
        Matrix(i,j) = cmplx(x,xi)
      end do
    end do
    ! construct a symmetric matrix ( O(n^2) )
    call Mtranspose(Matrix, M, N)
    Matrix = Matrix+MT
    ! make positive definite (diagonally dominant)
    Matrix = Matrix + N*EYE
  end function CSPDmatrixFill
!======================================================================================================
  subroutine Mtranspose(A, i, j)
    ! takes a matrix and the two parameters used to make the matrix: A(i,j)
    ! returns a matrix with switched indices: A(j,i)
    implicit none
    integer :: i, j
    complex*16 :: A(i,j), MT(j,i)
    MT=A(j,i)
    return
  end subroutine Mtranspose
!=======================================================================================================
!MAT2TILES - breaks up an array into a cell array of adjacent sub-arrays of equal sizes
!
! O=mat2tiles(X,M,N,Nblock)
!
!will produce a cell array o containing adjacent chunks of the array X(M,N)
!with each chunk of dimensions NblockxNblock. If Nblock does
!not divide evenly into size(X,i), then the chunks at the upper boundary of
!X along dimension i will have bogus values that do not affect the factorization
!in the places where the matrix doesn't occupy. (according to older versions. Might have changed with some edits)
!
  function mat2tiles(X,M,N,Nblock,rem) result(o)
    ! initialization
    implicit none
    integer :: a, b, i, j, M, N, Nblock, rem
    complex*16 :: X(M,N)
    type(cell) :: o(M/Nblock+rem,N/Nblock+rem)
    ! diagnostic print statements
    print*,size(o(1,1)%block), size(o(1,2)%block), size(o(2,1)%block), size(o(2,2)%block)
    print*, 'got to start'
    ! turn matrix x into cell matrix o
    do j=1, N/Nblock+rem
      if (j==1) then
        b=j
      else
        b=b+Nblock
      endif
      do i=1, M/Nblock+rem
        if (i==1) then
          a=i
        else
          a=a+Nblock
        endif
        ! diagnostic print statement
        print*, 'writing to o: i:', i, 'j:', j, 'i*Nblock:', i*Nblock, 'j*Nblock:', j*Nblock, 'min of i:', min(i*Nblock, M), &
                'min of j:', min(j*Nblock, N), 'a:', a, 'b:', b
        o(i,j)=cell(x(a:min(i*Nblock,M), b:min(j*Nblock, N)))
      enddo
    enddo
    ! diagnostic print statement
    print*, 'got to end'
    return
  end function mat2tiles
!==================================================================================================
end program
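A side note on the listing above: Mtranspose fills a local MT that is never passed back, so the MT added to Matrix in CSPDmatrixFill is a separate, unset array. This is unrelated to the segfault being asked about. A minimal sketch of the symmetrization step using the intrinsic transpose instead follows; the program name symmetric_fill, the helper arrays re and im, and the kind parameter dp are hypothetical.

```fortran
program symmetric_fill
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! hypothetical kind name
  integer, parameter :: N = 4
  complex(dp) :: A(N, N)
  real(dp)    :: re(N, N), im(N, N)
  integer     :: j

  ! random real and imaginary parts, filled a whole array at a time
  call random_seed()
  call random_number(re)
  call random_number(im)
  A = cmplx(re, im, kind=dp)   ! kind=dp keeps double precision

  ! symmetrize with the plain (non-conjugate) transpose
  A = A + transpose(A)

  ! shift the diagonal by N, as the listing does with N*EYE
  do j = 1, N
    A(j, j) = A(j, j) + N
  end do

  print *, 'symmetric:', all(A == transpose(A))
end program symmetric_fill
```

Without the kind argument, cmplx returns a default (single-precision) complex value, so the original Matrix(i,j) = cmplx(x,xi) silently drops precision before storing into a complex*16 array.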
Answer 0 (score: 0)
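After narrowing the problem down by comparing literal numbers against variables holding the same numbers, I found that Fortran does not like assigning a matrix to a matrix of different dimensions, even if it fits inside. This is strange, because smaller values of M, N, and Nblock got around the problem just fine.
The solution was simply to write o(i,j)%block(1:Nblock,1:Nblock)=x(dim1:dim2,etc) instead of o(i,j)=cell(x(dim1:dim2,etc) for i=1 and j=1, and to modify that assignment slightly for each case of i and j:
if (i==M/Nblock+rem .AND. j==N/Nblock+rem .AND. rem==1) then
  o(i,j)%block(1:M-(i-1)*Nblock,1:N-(j-1)*Nblock)=x(a:min(i*Nblock,M), b:min(j*Nblock, N))
else if (i==M/Nblock+rem .AND. j/=N/Nblock+rem .AND. rem==1) then
  o(i,j)%block(1:M-(i-1)*Nblock,1:Nblock)=x(a:min(i*Nblock,M), b:min(j*Nblock, N))
else if (i/=M/Nblock+rem .AND. j==N/Nblock+rem .AND. rem==1) then
  o(i,j)%block(1:Nblock,1:N-(j-1)*Nblock)=x(a:min(i*Nblock,M), b:min(j*Nblock, N))
else
  o(i,j)%block(1:Nblock,1:Nblock)=x(a:min(i*Nblock,M), b:min(j*Nblock, N))
end if
The correct code (which works for all matrix sizes and tile sizes) is as follows:
{{1}}
With this correction, my code runs on both older and newer versions of gfortran. Interestingly, though, the Mac OS X version of gfortran never ran into this problem; only the Linux version did.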
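The four-branch assignment in the answer can probably be collapsed by computing each tile's actual extent once and using it on both sides, so the shapes always conform. The following is a sketch under assumptions, not the answerer's code: the names tile_copy, Mt, Nt, ni, nj, and dp are hypothetical, the tile counts use ceiling division per dimension instead of the original rem, and the padding is zero-filled only to make it deterministic.

```fortran
program tile_copy
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! hypothetical kind name
  integer, parameter :: M = 5, N = 5, Nblock = 3

  type cell
    complex(dp) :: block(Nblock, Nblock)
  end type cell

  ! assumption: ceiling division per dimension replaces M/Nblock+rem
  integer, parameter :: Mt = (M + Nblock - 1)/Nblock   ! tiles along rows
  integer, parameter :: Nt = (N + Nblock - 1)/Nblock   ! tiles along columns
  complex(dp) :: x(M, N)
  type(cell)  :: o(Mt, Nt)
  integer     :: i, j, a, b, ni, nj

  x = (1.0_dp, 0.0_dp)

  do j = 1, Nt
    b  = (j - 1)*Nblock + 1          ! first column of this tile in x
    nj = min(j*Nblock, N) - b + 1    ! columns actually available
    do i = 1, Mt
      a  = (i - 1)*Nblock + 1        ! first row of this tile in x
      ni = min(i*Nblock, M) - a + 1  ! rows actually available
      o(i, j)%block = (0.0_dp, 0.0_dp)                  ! zero the padding
      o(i, j)%block(1:ni, 1:nj) = x(a:a+ni-1, b:b+nj-1) ! shapes match
    end do
  end do

  print *, 'non-zero entries in corner tile:', count(o(Mt, Nt)%block /= 0)
end program tile_copy
```

Because both sides of the assignment are ni by nj sections, interior and edge tiles go through the same statement. Compiling with gfortran's -fcheck=all should also help catch out-of-range subscripts in code like this at run time.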