Question

我有一个ASCII文件，看起来像：

____________________________________________
Header1 ...
Header2 ...
Header3 ...
block(1)data1 block(1)data2 block(1)data3
block(1)data4 block(1)data5 block(1)data6
block(2)data1 block(2)data2 block(2)data3
block(2)data4 block(2)data5 block(2)data6
...
block(n)data1 block(n)data2 block(n)data3
block(n)data4 block(n)data5 block(n)data6
____________________________________________

我想将其转换为如下所示的ASCII文件：

____________________________________________
HeaderA ...
HeaderB ...
block(n)data1 block(n)data2 block(n)data3
block(n)data4 block(n)data5 block(n)data6
block(n-1)data1 block(n-1)data2 block(n-1)data3
block(n-1)data4 block(n-1)data5 block(n-1)data6
....
block(1)data1 block(1)data2 block(1)data3
block(1)data4 block(1)data5 block(1)data6
____________________________________________

数据主要是实数，并且数据集的大小太大，无法使用可分配的数组。因此，我可以通过某种方式即时读写。

我找不到在文件中向后读或写的方法。

Answer 1

我不会直接使用Fortran，而是使用一系列Linux命令（或Windows上的Cygwin / GNU utils）。 Fortran也可以（请参见第二种可能性）。

概述（基于OS命令）：

获取总行数（例如，wc）
从文件中取出前3行（例如，使用head）到文件result file
加工主体
- 最后三行（例如tail）
- 通过连接相关行的awk脚本运行结果
- 对结果运行tac
- 运行另一个awk脚本以分割行
- 将结果追加到result file

另一个想法是（用编程语言）：

使用每个块的开始文件位置创建数组（ftell的结果）。
将标题移到新文件
从头到尾遍历上面创建的数组
- 将fseek移至指定位置
- 读取相关行数并再次写出

Answer 2

使用可分配数组的大方法。

如果数据适合内存，则可以这样做。我已经测试过了，一个文件

header(1)
header(2)
header(3)
block(1).data1 block(1).data2 block(1).data3
block(1).data4 block(1).data5 block(1).data6
block(2).data1 block(2).data2 block(2).data3
block(2).data4 block(2).data5 block(2).data6
...
block(9999998).data1 block(9999998).data2 block(9999998).data3
block(9999998).data4 block(9999998).data5 block(9999998).data6
block(9999999).data1 block(9999999).data2 block(9999999).data3
block(9999999).data4 block(9999999).data5 block(9999999).data6

文件大小为1.2GB的这个awk脚本可以将其反转：

#!/usr/bin/awk
# if line contains word "header", print immediately, move on to next line.
/header/ {print; next}

# move every line to memory.
  {
    line[n++] = $0
  }

# When finished, print them out in order n-1, n, n-3, n-2, n-5, n-4, ...
END {
  for (i=n-2; i>=0; i-=2) {
    print(line[i])
    print(line[i+1])
  }
}

在2分钟之内。

如果这实际上是不可能的，则需要执行@ high-performance-mark所说的操作，并以可管理的块形式读取它，在内存中将其反转，然后最后将它们连接在一起。这是我的版本：

program reverse_order
  use iso_fortran_env, only: IOSTAT_END
  implicit none
  integer, parameter :: max_blocks_in_memory = 10000
  integer, parameter :: max_line_length=100
  character(len=max_line_length) :: line
  character(len=max_line_length) :: data(2, max_blocks_in_memory)

  character(len=*), parameter :: INFILE='data.txt'
  character(len=*), parameter :: OUTFILE='reversed_data.txt'
  character(len=*), parameter :: TMP_FILE_FORMAT='("/tmp/", I10.10,".txt")'
  character(len=len("/tmp/XXXXXXXXXX.txt")) :: tmp_file_name

  integer :: in_unit, out_unit, tmp_unit
  integer :: num_headers, i, j, tmp_file_number

  integer :: ios

! Open the input and output files
  open(newunit=in_unit, file=INFILE, action="READ", status='OLD')
  open(newunit=out_unit, file=OUTFILE, action='WRITE', status='REPLACE')

! Transfer the headers to the output file immediately.
  num_headers = 0
  do
    read(in_unit, '(A)') line
    if (index(line, 'header') == 0) exit
    num_headers = num_headers + 1
    write(out_unit, '(A)') trim(line)
  end do

! We've already read the first data line, so let's rewind and start anew.
  rewind(in_unit)
! move past the headers.
  do i = 1, num_headers
    read(in_unit, *)
  end do


  tmp_file_number = 0

! Read the data from the input line max_blocks_in_memory blocks at a time.
  read_loop : do
    do i = 1, max_blocks_in_memory
      read(in_unit, '(A)', iostat=ios) data(1, i)
      if (ios == IOSTAT_END) then   ! Reached the end of the input file.
        if (i > 1) then         ! Still have final values in memory, write them
                                ! to output immediately.
          do j = i-1, 1, -1
            write(out_unit, '(A)') trim(data(1, j))
            write(out_unit, '(A)') trim(data(2, j))
          end do
        end if
        exit read_loop
      end if
      read(in_unit, '(A)') data(2, i)
    end do

!  Reasd a block of data, write it in reverse order into a temporary file.

    tmp_file_number = tmp_file_number + 1
    write(tmp_file_name, TMP_FILE_FORMAT) tmp_file_number
    open(newunit=tmp_unit, file=tmp_file_name, action="WRITE", status="NEW")
    do j = max_blocks_in_memory, 1, -1
      write(tmp_unit, '(A)') data(1, j)
      write(tmp_unit, '(A)') data(2, j)
    end do
    close(tmp_unit)
  end do read_loop

! Finished with input file, don't need it any more.
  close(unit=in_unit)

! Concatenate all the temporary files in reverse order to the output file.
  do j = tmp_file_number, 1, -1
    write(tmp_file_name, TMP_FILE_FORMAT) j
    open(newunit=tmp_unit, file=tmp_file_name, action="READ", status="OLD")
    do
      read(tmp_unit, '(A)', iostat=ios) line
      if (ios == IOSTAT_END) exit
      write(out_unit, '(A)') trim(line)
    end do
    close(tmp_unit, status="DELETE")  ! Done with this file, delete it after closing.

  end do

  close(unit=out_unit)

end program reverse_order

Answer 3

好吧，我有一个答案，但是它没有用，可能是由于编译器错误或我对Fortran中文件定位的基本了解。我的尝试是使用access = 'stream'和form = 'formatted'打开输入文件。这样，我可以将行位置推入堆栈，然后弹出它们，以便它们以相反的顺序出现。然后，以相反的顺序遍历这些行，我可以将它们写入ourput文件中。

program readblk
   implicit none
   integer iunit, junit
   integer i, size
   character(20) line
   type LLnode
      integer POS
      type(LLnode), pointer :: next => NULL()
   end type LLnode
   type(LLNODE), pointer :: list => NULL(), current => NULL()
   integer POS, temp(2)

   open(newunit=iunit,file='readblk.txt',status='old',access='stream',form='formatted')
   open(newunit=junit,file='writeblk.txt',status='replace')
   do i = 1, 3
      do
         read(iunit,'(a)',advance='no',EOR=10,size=size) line
         write(junit,'(a)',advance='no') line
      end do
10    continue
      write(junit,'(a)') line(1:size)
   end do
   do
      inquire(iunit,POS=POS)
      allocate(current)
      current%POS = POS
      current%next => list
      list => current
      read(iunit,'()',end=20)
   end do
20 continue
   current => list
   list => current%next
   deallocate(current)
   do while(associated(list))
      temp(2) = list%POS
      current => list%next
      deallocate(list)
      temp(1) = current%POS
      list => current%next
      deallocate(current)
      do i = 1, 2
write(*,*) temp(i)
         read(iunit,'(a)',advance='no',EOR=30,size=size,POS=temp(i)) line
         write(junit,'(a)',advance='no') line
         do
            read(iunit,'(a)',advance='no',EOR=30,size=size) line
            write(junit,'(a)',advance='no') line
         end do
30       continue
      write(junit,'(a)') line(1:size)
      end do
   end do
end program readblk

这是我的输入文件：

Header line 1
Header line 2
Header line 3
1a34567890123456789012345678901234567890
1b34567890123456789012345678901234567890
2a34567890123456789012345678901234567890
2b34567890123456789012345678901234567890
3a34567890123456789012345678901234567890
3b34567890123456789012345678901234567890

现在使用ifort，我的文件位置被打印为

请注意，第一行位于记录3的末尾，而不是记录4的开始。输出文件为

Header line 1
Header line 2
Header line 3
3a34567890123456789012345678901234567890
3b34567890123456789012345678901234567890
2a34567890123456789012345678901234567890
2b34567890123456789012345678901234567890

1a34567890123456789012345678901234567890

使用gfortran，文件位置打印为

这一次，正如我所期望的，第一行位于记录4的开头。但是，输出文件中包含不幸的内容

Header line 1
Header line 2
Header line 3
3a34567890123456789012345678901234567890
3b34567890123456789012345678901234567890
2a34567890123456789012345678901234567890
2b34567890123456789012345678901234567890
3a34567890123456789012345678901234567890
3b345678901234567890123456789012341a34567890123456789012345678901234567890

我希望有一个更积极的结果。我无法确定结果是否是由于不良的编程或编译器错误所致，但我发布了消息，以防别人可能使我的纯Fortran解决方案正常工作。

在Fortran中，如何通过从下到上的行块将文件从一个文件写回另一个文件

3 个答案: