Question

I'm trying to align data of the following type in memory:

type foo
   real, allocatable, dimension(:) :: bar1, bar2
   !dir$ attributes align:64 :: bar1
   !dir$ attributes align:64 :: bar2
end type foo

type(foo), allocatable, dimension(:) :: my_foo
allocate(my_foo(1))
allocate(my_foo(1)%bar1(100))
allocate(my_foo(1)%bar2(100))

! somewhere here I need to tell the compiler that data is aligned
!    for a simple array with name `bar` I would just do:
!dir$ assume_aligned bar1: 64
!dir$ assume_aligned bar2: 64
!    but what do I do for the data type I have, something like this?
!dir$ assume_aligned my_foo(1)%bar1: 64
!dir$ assume_aligned my_foo(1)%bar2: 64

do i = 1, 100
   my_foo(1)%bar1(i) = 10.
   my_foo(1)%bar2(i) = 10.
end do

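As you can see, it's an array of structures of type foo, each with two large arrays, bar1 and bar2, as components, which need to be aligned to cache-line boundaries in memory.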

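I more or less know how to do this for simple arrays (link), but I have no idea how to do it for this kind of complex data structure. And what if my_foo has size 100 instead of 1? Do I loop over its elements?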

Answer 1

OK, case half-closed. The solution turned out to be quite simple: just use pointers and apply assume_aligned to them. That should take care of it.

type foo
   real, allocatable, dimension(:) :: bar1, bar2
   !dir$ attributes align:64 :: bar1
   !dir$ attributes align:64 :: bar2
end type foo

type(foo), target, allocatable, dimension(:) :: my_foo
real, pointer, contiguous :: pt_bar1(:)
real, pointer, contiguous :: pt_bar2(:)
allocate(my_foo(1))
allocate(my_foo(1)%bar1(100))
allocate(my_foo(1)%bar2(100))

! associate the pointers with the components (pointer assignment "=>", not "=")
pt_bar1 => my_foo(1)%bar1
pt_bar2 => my_foo(1)%bar2
!dir$ assume_aligned pt_bar1:64, pt_bar2:64

pt_bar1 = 10.
pt_bar2 = 10.

An explicit do loop still doesn't get vectorized, though, smh. That is, if I write the same thing like this:

do i = 1, 100
   pt_bar1(i) = 10.
   pt_bar2(i) = 10.
end do

it doesn't get vectorized.

UPD. OK, this does the job (you also need to pass the -qopenmp-simd flag to the compiler):

!$omp simd
!dir$ vector aligned
do i = 1, 100
   pt_bar1(i) = 10.
   pt_bar2(i) = 10.
end do

Also, if you are looping over my_foo(j)%..., make sure to release the pointers after each iteration with pt_bar1 => null() and so on.
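
For the case from the question where my_foo has 100 elements, the pieces above could be combined roughly as follows. This is only a sketch: the program name align_loop and the sizes n_foo and n are made up for illustration, and I have not checked that the compiler actually vectorizes the inner loop in this form, so look at the optimization report to confirm.

```fortran
program align_loop
   implicit none

   type foo
      real, allocatable, dimension(:) :: bar1, bar2
      !dir$ attributes align:64 :: bar1
      !dir$ attributes align:64 :: bar2
   end type foo

   ! n_foo and n are illustrative sizes, not values from the original post
   integer, parameter :: n_foo = 100, n = 100
   type(foo), target, allocatable, dimension(:) :: my_foo
   real, pointer, contiguous :: pt_bar1(:), pt_bar2(:)
   integer :: i, j

   allocate(my_foo(n_foo))
   do j = 1, n_foo
      allocate(my_foo(j)%bar1(n))
      allocate(my_foo(j)%bar2(n))
   end do

   do j = 1, n_foo
      ! re-point the pointers at the components of the current element
      pt_bar1 => my_foo(j)%bar1
      pt_bar2 => my_foo(j)%bar2
      !dir$ assume_aligned pt_bar1:64, pt_bar2:64

      !$omp simd
      !dir$ vector aligned
      do i = 1, n
         pt_bar1(i) = 10.
         pt_bar2(i) = 10.
      end do

      ! release the pointers before moving on to the next element
      pt_bar1 => null()
      pt_bar2 => null()
   end do

   print *, my_foo(n_foo)%bar1(n), my_foo(n_foo)%bar2(n)
end program align_loop
```

The directives are Intel-specific. Other compilers treat the !dir$ lines as plain comments, so the program should still build and run there, just without the alignment hints.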

PS. Thanks to BW from our department for the help. :) Sometimes personal communication > stackoverflow (not always, just sometimes).

Data alignment inside a structure in Intel Fortran
