Question

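So I have to finish a project in Scheme and I'm pretty stuck. Basically, the program opens a file and outputs statistics about it. Right now I can count the characters, but I also need to count lines and words. I'm only trying to handle that case for now, but eventually I'll also have to take two files: the first is a text file, like a book, and the second is a list of words, and I have to count how many times those words appear in the first file. Obviously I'll have to work with lists, but I'd appreciate some help. Here is the code I have so far (which works):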

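(define filestats
  (lambda (srcf wordcount linecount charcount)
    (if (eof-object? (peek-char srcf))
        (begin
          (close-port srcf)
          (display linecount)
          (display " ")
          (display wordcount)
          (display " ")
          (display charcount)
          (newline)
          '())   ; the empty list must be quoted to be a valid expression
        (begin
          (read-char srcf)
          ;; only charcount is updated so far; the other two counters
          ;; are passed through unchanged
          (filestats srcf wordcount linecount (+ charcount 1))))))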

(define filestatistics
  (lambda (src)
    (let ((file (open-input-file src)))
      (filestats file 0 0 0))))

Answer 1

A word-count algorithm in Scheme has been explained before on Stack Overflow, for example here (scroll up to the top of that page to see the equivalent program in C):

(define (word-count input-port)
  (let loop ((c (read-char input-port))
             (nl 0)
             (nw 0)
             (nc 0)
             (state 'out))
    (cond ((eof-object? c)
           (printf "nl: ~s, nw: ~s, nc: ~s\n" nl nw nc))
          ((char=? c #\newline)
           (loop (read-char input-port) (add1 nl) nw (add1 nc) 'out))
          ((char-whitespace? c)
           (loop (read-char input-port) nl nw (add1 nc) 'out))
          ((eq? state 'out)
           (loop (read-char input-port) nl (add1 nw) (add1 nc) 'in))
          (else
           (loop (read-char input-port) nl nw (add1 nc) state)))))

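The procedure takes an input port as its argument, so it can be applied to a file. Note that to count words and lines you need to test whether the current char is a newline or a whitespace character. You also need an extra flag (called state in the code) to keep track of whether you are currently inside a word, so that a new word is counted only when one starts.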
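For anyone whose Scheme lacks add1 and printf (Racket provides them, but standard R7RS does not define them), here is a rough sketch of the same state machine that returns the counts as a list instead of printing them. The names count-stats and file-stats are made up for this illustration, and the sketch assumes an implementation that provides open-input-string and call-with-input-file.

```scheme
;; Hypothetical helper: same idea as word-count above, but the result is
;; returned as a list (lines words chars) so the caller can reuse it.
(define (count-stats input-port)
  (let loop ((c (read-char input-port))
             (nl 0)
             (nw 0)
             (nc 0)
             (in-word? #f))          ; plays the role of state above
    (cond ((eof-object? c)
           (list nl nw nc))
          ((char=? c #\newline)
           (loop (read-char input-port) (+ nl 1) nw (+ nc 1) #f))
          ((char-whitespace? c)
           (loop (read-char input-port) nl nw (+ nc 1) #f))
          (in-word?
           ;; still inside the current word: only the char count grows
           (loop (read-char input-port) nl nw (+ nc 1) #t))
          (else
           ;; first non-whitespace char after a gap: a new word starts
           (loop (read-char input-port) nl (+ nw 1) (+ nc 1) #t)))))

;; Hypothetical wrapper: open the named file and count its contents.
(define (file-stats filename)
  (call-with-input-file filename count-stats))

;; Trying it on an in-memory port rather than a real file:
(count-stats (open-input-string "hello world\nfoo bar baz\n"))
;; => (2 5 24)
```

Returning a list rather than printing keeps the counting separate from the output, which should make it easier to reuse for the second part of the assignment.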

Answer 2

How about "tokenizing" the file into a list of lines, where a line is a list of words and a word is a list of characters?

(define (tokenize file)
  (with-input-from-file file
    (lambda ()
      (let reading ((lines '()) (words '()) (chars '()))
        (let ((char (read-char)))
          (if (eof-object? char)
              (reverse lines)
              (case char
                ((#\newline) (reading (cons (reverse (cons (reverse chars) words)) lines) '() '()))
                ((#\space)   (reading lines (cons (reverse chars) words) '()))
                (else        (reading lines words (cons char chars))))))))))

Once you have done that, the rest is trivial.

> (tokenize "foo.data")
(((#\a #\b #\c) (#\d #\e #\f))
 ((#\1 #\2 #\3) (#\x #\y #\z)))
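As a rough sketch of what "the rest" might look like, assuming the nested-list shape shown above: the helper names below (line-count, total-words, char-count, occurrences) are invented for this example. Two caveats follow from the tokenize code as written: a final line without a trailing newline is dropped, and repeated spaces produce empty words, so the counts are only as good as the token list.

```scheme
;; Sample data in the shape returned by tokenize: a list of lines,
;; each a list of words, each a list of characters.
(define sample
  '(((#\a #\b #\c) (#\d #\e #\f))
    ((#\1 #\2 #\3) (#\a #\b #\c))))

(define (line-count lines)
  (length lines))

(define (total-words lines)
  (apply + (map length lines)))

;; Counts only characters inside words; tokenize discards the separators.
(define (char-count lines)
  (apply + (map (lambda (line) (apply + (map length line))) lines)))

;; Hypothetical helper for the word-list part of the question:
;; how many times does word (a string) occur among the tokens?
(define (occurrences word lines)
  (let ((target (string->list word)))
    (let loop ((words (apply append lines)) (n 0))
      (cond ((null? words) n)
            ((equal? (car words) target) (loop (cdr words) (+ n 1)))
            (else (loop (cdr words) n))))))

(line-count sample)         ; => 2
(total-words sample)        ; => 4
(char-count sample)         ; => 12
(occurrences "abc" sample)  ; => 2
```

For a book-sized file, flattening with apply append may be too heavy in some implementations; walking the lines one at a time would avoid that.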

Scheme help - file statistics
