In Fortran, how to write a file to another file in blocks of lines, from bottom to top

Time: 2018-07-13 15:46:10

Tags: file fortran out-of-memory on-the-fly

I have an ASCII file that looks like:

____________________________________________
Header1 ...
Header2 ...
Header3 ...
block(1)data1 block(1)data2 block(1)data3
block(1)data4 block(1)data5 block(1)data6
block(2)data1 block(2)data2 block(2)data3
block(2)data4 block(2)data5 block(2)data6
...
block(n)data1 block(n)data2 block(n)data3
block(n)data4 block(n)data5 block(n)data6
____________________________________________

I want to convert it into an ASCII file that looks like this:

____________________________________________
HeaderA ...
HeaderB ...
block(n)data1 block(n)data2 block(n)data3
block(n)data4 block(n)data5 block(n)data6
block(n-1)data1 block(n-1)data2 block(n-1)data3
block(n-1)data4 block(n-1)data5 block(n-1)data6
....
block(1)data1 block(1)data2 block(1)data3
block(1)data4 block(1)data5 block(1)data6
____________________________________________

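The data are mostly real numbers, and the data set is too large to hold in allocatable arrays, so I need some way to read and write on the fly.

I cannot find a way to read or write a file backward.

3 answers:

Answer 0: (score: 0)

I would not use Fortran directly, but a series of Linux commands (or Cygwin / GNU utils on Windows). Fortran works too (see the second possibility).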

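Overview (based on OS commands):

  • Get the total number of lines (e.g. with wc)
  • Copy the first 3 lines of the file (e.g. with head) to the result file
  • Process the body
    • Take all lines after the first 3, i.e. the last (total - 3) lines (e.g. with tail)
    • Run the result through an awk script that joins the lines belonging to one block
    • Run tac on the result
    • Run another awk script that splits the lines again
    • Append the result to the result file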

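Another idea (in a programming language):

  • Build an array holding the starting file position of each block (the results of ftell).
  • Copy the headers to the new file
  • Walk through the array built above from its last entry to its first
    • fseek to the stored position
    • Read the block's lines and write them out again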
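
The second idea can be sketched in Fortran itself. ftell and fseek are compiler extensions (gfortran provides them, for example), so the sketch below swaps in Fortran 2003 stream access with the POS= specifier, opened as unformatted so that both files are treated as plain byte sequences. It is a minimal sketch, not a tested solution for the asker's data: the file names, the 3 header lines, the 2 lines per block, and the requirement that the input ends with a newline are all assumptions. It still keeps one 8-byte offset per line in memory, which is far less than the data but not nothing.

```fortran
! Sketch only: reverse fixed-size blocks of lines using recorded byte offsets.
! File names, n_header and lines_per_block are illustrative assumptions.
program reverse_by_offsets
  use iso_fortran_env, only: int64
  implicit none
  integer, parameter :: n_header = 3         ! header lines copied unchanged
  integer, parameter :: lines_per_block = 2  ! lines that must stay together
  integer, parameter :: buf_len = 65536      ! scan buffer size in bytes
  character(len=1), parameter :: nl = achar(10)
  character(len=buf_len) :: buf
  integer(int64), allocatable :: start(:)    ! start(k) = byte offset of line k
  integer(int64) :: fsize, pos, n_starts, n_blocks, b, k
  integer :: in_unit, out_unit, n, i

  ! Unformatted stream access: no record handling, just bytes.
  open(newunit=in_unit, file='data.txt', access='stream', &
       form='unformatted', action='read', status='old')
  open(newunit=out_unit, file='reversed_data.txt', access='stream', &
       form='unformatted', action='write', status='replace')
  inquire(unit=in_unit, size=fsize)

  ! Pass 1: scan the file once and record where every line starts.
  allocate(start(1024))
  start(1) = 1
  n_starts = 1
  pos = 1
  do while (pos <= fsize)
    n = int(min(int(buf_len, int64), fsize - pos + 1))
    read(in_unit, pos=pos) buf(1:n)
    do i = 1, n
      if (buf(i:i) == nl) then
        if (n_starts == size(start, kind=int64)) call grow(start)
        n_starts = n_starts + 1
        start(n_starts) = pos + i   ! byte right after the newline
      end if
    end do
    pos = pos + n
  end do
  ! Assumes a trailing newline, so the file holds n_starts - 1 complete lines.
  if (n_starts - 1 < n_header) error stop 'fewer lines than expected headers'

  ! Pass 2: copy the header, then the blocks from last to first.
  call copy_bytes(start(1), start(n_header + 1) - 1)
  n_blocks = (n_starts - 1 - n_header) / lines_per_block
  do b = n_blocks, 1, -1
    k = n_header + (b - 1) * lines_per_block + 1   ! first line of block b
    call copy_bytes(start(k), start(k + lines_per_block) - 1)
  end do

  close(in_unit)
  close(out_unit)

contains

  ! Double the capacity of the offset array, keeping its contents.
  subroutine grow(a)
    integer(int64), allocatable, intent(inout) :: a(:)
    integer(int64), allocatable :: tmp(:)
    allocate(tmp(2 * size(a, kind=int64)))
    tmp(1:size(a, kind=int64)) = a
    call move_alloc(tmp, a)
  end subroutine grow

  ! Copy bytes first..last (inclusive) from the input to the output.
  subroutine copy_bytes(first, last)
    integer(int64), intent(in) :: first, last
    character(len=:), allocatable :: chunk
    allocate(character(len=int(last - first + 1)) :: chunk)
    read(in_unit, pos=first) chunk
    write(out_unit) chunk
  end subroutine copy_bytes

end program reverse_by_offsets
```

Because the bytes are copied verbatim, line endings (LF or CRLF) pass through unchanged. Storing only the offset of each block start instead of every line would halve the index, and copying several blocks per read would cut the number of small I/O calls.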

Answer 1: (score: 0)

  

"too large to use allocatable arrays"

If the data fit into memory, you can do it. I tested this with a file like

header(1)
header(2)
header(3)
block(1).data1 block(1).data2 block(1).data3
block(1).data4 block(1).data5 block(1).data6
block(2).data1 block(2).data2 block(2).data3
block(2).data4 block(2).data5 block(2).data6
...
block(9999998).data1 block(9999998).data2 block(9999998).data3
block(9999998).data4 block(9999998).data5 block(9999998).data6
block(9999999).data1 block(9999999).data2 block(9999999).data3
block(9999999).data4 block(9999999).data5 block(9999999).data6

with a size of 1.2 GB. This awk script reverses it:

#!/usr/bin/awk -f
# if line contains word "header", print immediately, move on to next line.
/header/ {print; next}

# move every line to memory.
  {
    line[n++] = $0
  }

# When finished, print them out in order n-1, n, n-3, n-2, n-5, n-4, ...
END {
  for (i=n-2; i>=0; i-=2) {
    print(line[i])
    print(line[i+1])
  }
}

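in under 2 minutes.

If that really is not possible, you need to do what @high-performance-mark said: read the file in manageable chunks, reverse each chunk in memory, and concatenate the chunks at the end. Here is my version: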

program reverse_order
  use iso_fortran_env, only: IOSTAT_END
  implicit none
  integer, parameter :: max_blocks_in_memory = 10000
  integer, parameter :: max_line_length=100
  character(len=max_line_length) :: line
  character(len=max_line_length) :: data(2, max_blocks_in_memory)

  character(len=*), parameter :: INFILE='data.txt'
  character(len=*), parameter :: OUTFILE='reversed_data.txt'
  character(len=*), parameter :: TMP_FILE_FORMAT='("/tmp/", I10.10,".txt")'
  character(len=len("/tmp/XXXXXXXXXX.txt")) :: tmp_file_name

  integer :: in_unit, out_unit, tmp_unit
  integer :: num_headers, i, j, tmp_file_number

  integer :: ios

! Open the input and output files
  open(newunit=in_unit, file=INFILE, action="READ", status='OLD')
  open(newunit=out_unit, file=OUTFILE, action='WRITE', status='REPLACE')

! Transfer the headers to the output file immediately.
  num_headers = 0
  do
    read(in_unit, '(A)') line
    if (index(line, 'header') == 0) exit
    num_headers = num_headers + 1
    write(out_unit, '(A)') trim(line)
  end do

! We've already read the first data line, so let's rewind and start anew.
  rewind(in_unit)
! move past the headers.
  do i = 1, num_headers
    read(in_unit, *)
  end do


  tmp_file_number = 0

! Read the data from the input file, max_blocks_in_memory blocks at a time.
  read_loop : do
    do i = 1, max_blocks_in_memory
      read(in_unit, '(A)', iostat=ios) data(1, i)
      if (ios == IOSTAT_END) then   ! Reached the end of the input file.
        if (i > 1) then         ! Still have final values in memory, write them
                                ! to output immediately.
          do j = i-1, 1, -1
            write(out_unit, '(A)') trim(data(1, j))
            write(out_unit, '(A)') trim(data(2, j))
          end do
        end if
        exit read_loop
      end if
      read(in_unit, '(A)') data(2, i)
    end do

!  Read a full chunk of blocks; write it in reverse order into a temporary file.

    tmp_file_number = tmp_file_number + 1
    write(tmp_file_name, TMP_FILE_FORMAT) tmp_file_number
    open(newunit=tmp_unit, file=tmp_file_name, action="WRITE", status="NEW")
    do j = max_blocks_in_memory, 1, -1
      write(tmp_unit, '(A)') data(1, j)
      write(tmp_unit, '(A)') data(2, j)
    end do
    close(tmp_unit)
  end do read_loop

! Finished with input file, don't need it any more.
  close(unit=in_unit)

! Concatenate all the temporary files in reverse order to the output file.
  do j = tmp_file_number, 1, -1
    write(tmp_file_name, TMP_FILE_FORMAT) j
    open(newunit=tmp_unit, file=tmp_file_name, action="READ", status="OLD")
    do
      read(tmp_unit, '(A)', iostat=ios) line
      if (ios == IOSTAT_END) exit
      write(out_unit, '(A)') trim(line)
    end do
    close(tmp_unit, status="DELETE")  ! Done with this file, delete it after closing.

  end do

  close(unit=out_unit)

end program reverse_order

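Answer 2: (score: 0)

Well, I have an answer, but it does not work, possibly because of a compiler bug or because of my rudimentary understanding of file positioning in Fortran. My attempt was to open the input file with access='stream' and form='formatted'. That way I can push the line positions onto a stack and pop them off so that they come out in reverse order. Then, walking through the lines in reverse order, I can write them to the output file.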

program readblk
   implicit none
   integer iunit, junit
   integer i, size
   character(20) line
   type LLnode
      integer POS
      type(LLnode), pointer :: next => NULL()
   end type LLnode
   type(LLNODE), pointer :: list => NULL(), current => NULL()
   integer POS, temp(2)

   open(newunit=iunit,file='readblk.txt',status='old',access='stream',form='formatted')
   open(newunit=junit,file='writeblk.txt',status='replace')
   do i = 1, 3
      do
         read(iunit,'(a)',advance='no',EOR=10,size=size) line
         write(junit,'(a)',advance='no') line
      end do
10    continue
      write(junit,'(a)') line(1:size)
   end do
   do
      inquire(iunit,POS=POS)
      allocate(current)
      current%POS = POS
      current%next => list
      list => current
      read(iunit,'()',end=20)
   end do
20 continue
   current => list
   list => current%next
   deallocate(current)
   do while(associated(list))
      temp(2) = list%POS
      current => list%next
      deallocate(list)
      temp(1) = current%POS
      list => current%next
      deallocate(current)
      do i = 1, 2
         write(*,*) temp(i)
         read(iunit,'(a)',advance='no',EOR=30,size=size,POS=temp(i)) line
         write(junit,'(a)',advance='no') line
         do
            read(iunit,'(a)',advance='no',EOR=30,size=size) line
            write(junit,'(a)',advance='no') line
         end do
30       continue
         write(junit,'(a)') line(1:size)
      end do
   end do
end program readblk

Here is my input file:

Header line 1
Header line 2
Header line 3
1a34567890123456789012345678901234567890
1b34567890123456789012345678901234567890
2a34567890123456789012345678901234567890
2b34567890123456789012345678901234567890
3a34567890123456789012345678901234567890
3b34567890123456789012345678901234567890

With ifort, my file positions are printed as

 214
 256
 130
 172
  44
  88

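Note that the first data line (position 44) is located at the end of record 3 rather than at the start of record 4. The output file is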

Header line 1
Header line 2
Header line 3
3a34567890123456789012345678901234567890
3b34567890123456789012345678901234567890
2a34567890123456789012345678901234567890
2b34567890123456789012345678901234567890

1a34567890123456789012345678901234567890

With gfortran, the file positions are printed as

 214
 256
 130
 172
  46
  88

This time the first data line is at the start of record 4, as I expected. However, the output file has unfortunate contents:

Header line 1
Header line 2
Header line 3
3a34567890123456789012345678901234567890
3b34567890123456789012345678901234567890
2a34567890123456789012345678901234567890
2b34567890123456789012345678901234567890
3a34567890123456789012345678901234567890
3b345678901234567890123456789012341a34567890123456789012345678901234567890

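I had hoped for a more positive result. I cannot tell whether my results are due to bad programming or to a compiler bug, but I am posting this in case someone else can make my pure Fortran solution work.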