Question

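I have a very large text file from which I have to extract some data. I read the file line by line and look for keywords. As far as I know, the keywords I am looking for are much closer to the end of the file than to the beginning. I tried the tac command: set fh [open "| tac filename"] and I get the error: couldn't execute "tac": no such file or directory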

My file is huge, so I cannot store the lines in a loop and then reverse them. Please suggest a solution.

Answer 1

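tac is itself a fairly simple program; you could implement its algorithm in Tcl, at least if you are determined to read every line in literally reverse order. However, I don't think that constraint is necessary: you said the content you are looking for is more likely to be near the end, not that you have to scan the lines in reverse order. That means you can do something simpler. Roughly speaking: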

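Seek to an offset near the end of the file.
Read line by line as normal, until you hit data you have already processed.
Seek to an offset a bit further back from the end of the file.
Read line by line as normal, until you hit data you have already processed.
And so on.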

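This way you never have to keep more in memory than the single line you are currently processing, and you process the data at the end of the file before the data earlier in the file. You might squeeze out slightly more performance by processing the lines in strictly reverse order, but I doubt it matters compared with the advantage you gain by not scanning from start to finish.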
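To make the seek order concrete, here is a minimal sketch that lists the raw positions such a scan would seek to. The proc name seekPositions and the example sizes are made up for illustration, and the sketch ignores the small adjustment needed to land on a line boundary after each seek.

```tcl
# Hypothetical helper (not part of any library): list the raw seek positions
# visited for a file of $size bytes when stepping back $blockSize bytes
# at a time, ending at the start of the file.
proc seekPositions {size blockSize} {
    set positions {}
    set pos $size
    while {1} {
        lappend positions $pos
        if {$pos == 0} {
            break
        }
        set pos [expr {$pos - $blockSize}]
        if {$pos < 0} {
            set pos 0
        }
    }
    return $positions
}

puts [seekPositions 40000 16384]
# prints: 40000 23616 7232 0
```

Each block covers the bytes between one position and the previous one, so the blocks are visited from the end of the file towards the start while the lines inside each block are still read in forward order.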

Here is some sample code that implements this algorithm. Note the care taken to avoid processing a partial line:

set BLOCKSIZE  16384
set seekPos    [file size $filename]   ;# raw position of the next seek
set lastOffset $seekPos                ;# start of the data already processed

set f [open $filename r]
while { 1 } {
    seek $f $seekPos

    if { $seekPos > 0 } {
        # We may have landed in the middle of a line, because we don't
        # know where the line boundaries are.  Skip to the end of whatever
        # line we're in, and discard the content.  We'll get it instead
        # at the end of the _next_ block.

        gets $f
    }
    set offset [tell $f]

    while { [tell $f] < $lastOffset } {
        set line [gets $f]

        ### Do whatever you're going to do with the line here

        puts $line
    }

    set lastOffset $offset
    if { $lastOffset == 0 } {
        # All done, we just processed the start of the file.

        break
    }

    # Step back from the raw seek position, not from the line boundary,
    # so that a line longer than BLOCKSIZE cannot stall the loop.
    set seekPos [expr {$seekPos - $BLOCKSIZE}]
    if { $seekPos < 0 } {
        set seekPos 0
    }
}
close $f
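As a sketch of how the loop above might be applied to the keyword search from the question, the following proc scans blocks from the end of the file and stops at the first block that contains a match. The proc name findLastMatch, its parameters, and the plain substring test are assumptions for illustration, not something the question specifies. It tracks the raw seek position separately from the line boundary, so a line longer than the block size cannot stall the scan.

```tcl
# Hypothetical helper: return the last line of $filename that contains
# $keyword, or an empty string if no line matches.  Blocks are visited from
# the end of the file; lines inside a block are read in forward order.
proc findLastMatch {filename keyword {blockSize 16384}} {
    set seekPos    [file size $filename]   ;# raw position of the next seek
    set lastOffset $seekPos                ;# start of the data already scanned
    set result     ""

    set f [open $filename r]
    while {1} {
        seek $f $seekPos
        if {$seekPos > 0} {
            # Discard the (possibly partial) line we landed in; it is
            # picked up at the end of the next, earlier block.
            gets $f
        }
        set offset [tell $f]

        set hit 0
        while {[tell $f] < $lastOffset} {
            set line [gets $f]
            if {[string first $keyword $line] >= 0} {
                # Keep overwriting, so the last match in the block wins.
                set result $line
                set hit 1
            }
        }
        if {$hit || $offset == 0} {
            break
        }
        set lastOffset $offset
        set seekPos [expr {$seekPos - $blockSize}]
        if {$seekPos < 0} {
            set seekPos 0
        }
    }
    close $f
    return $result
}
```

A call might look like puts [findLastMatch big.log "ERROR"], where big.log is a placeholder file name. If the keyword really does sit near the end of the file, only the last block or two are ever read.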

Answer 2

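The cost of reversing a file is actually fairly high. The best option I can think of is to construct a list of the file offsets of the starts of lines, and then to use a seek;gets pattern to go through that list.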

set f [open $filename]

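# Construct the list of indices.  Testing the result of gets (rather than
# eof before reading) avoids recording a bogus offset at end of file.
set indices {}
set idx [tell $f]
while {[gets $f line] >= 0} {
    lappend indices $idx
    set idx [tell $f]
}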

# Iterate backwards
foreach idx [lreverse $indices] {
    seek $f $idx
    set line [gets $f]

    DoStuffWithALine $line
}

close $f

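The cost of this approach is non-trivial (even if you happened to have a cache of the indices, you would still have problems), because it does not work well with how the OS prefetches disk data.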

How to read a file from end to start (in reverse order) in TCL?
