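I've seen several questions online that loosely discuss why one should use bufio.Scanner rather than bufio.Reader.
I don't know whether my test case is relevant, but I decided to test one against the other by reading 1,000,000 lines from a text file: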
package main

import (
	"fmt"
	"strconv"
	"bufio"
	"time"
	"os"
	//"bytes"
)

func main() {
	fileName := "testfile.txt"

	// Create 1,000,000 integers as strings
	numItems := 1000000
	startInitStringArray := time.Now()
	var input [1000000]string
	//var input []string
	for i := 0; i < numItems; i++ {
		input[i] = strconv.Itoa(i)
		//input = append(input,strconv.Itoa(i))
	}
	elapsedInitStringArray := time.Since(startInitStringArray)
	fmt.Printf("Took %s to populate string array.\n", elapsedInitStringArray)

	// Write to a file
	fo, _ := os.Create(fileName)
	for i := 0; i < numItems; i++ {
		fo.WriteString(input[i] + "\n")
	}
	fo.Close()

	// Use reader
	openedFile, _ := os.Open(fileName)
	startReader := time.Now()
	reader := bufio.NewReader(openedFile)
	for i := 0; i < numItems; i++ {
		reader.ReadLine()
	}
	elapsedReader := time.Since(startReader)
	fmt.Printf("Took %s to read file using reader.\n", elapsedReader)
	openedFile.Close()

	// Use scanner
	openedFile, _ = os.Open(fileName)
	startScanner := time.Now()
	scanner := bufio.NewScanner(openedFile)
	for i := 0; i < numItems; i++ {
		scanner.Scan()
		scanner.Text()
	}
	elapsedScanner := time.Since(startScanner)
	fmt.Printf("Took %s to read file using scanner.\n", elapsedScanner)
	openedFile.Close()
}
The fairly typical timing output I get looks like this:
Took 44.1165ms to populate string array.
Took 17.0465ms to read file using reader.
Took 23.0613ms to read file using scanner.
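I'm curious: when is it better to use a Reader and when a Scanner? Should the choice be based on performance or on functionality?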
Answer (score: 4)
This is a flawed benchmark. The two loops are not doing the same thing.
func (b *Reader) ReadLine() (line []byte, isPrefix bool, err error)
returns a []byte.
func (s *Scanner) Text() string
returns string([]byte), which allocates a new string and copies every line into it.
For a like-for-like comparison, use
func (s *Scanner) Bytes() []byte
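A fairer comparison might look like the following sketch. It is a reconstruction under stated assumptions, not the question's or the answer's code: the input is generated in memory so disk caching does not skew the numbers, each variant stores its result in a package-level sink so the per-line work cannot be optimised away, and testing.Benchmark repeats each run until the timing settles instead of timing a single pass with time.Since. makeInput, countReadLine and countScan are hypothetical helper names.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strconv"
	"strings"
	"testing"
)

// Package-level sinks keep the compiler from optimising the per-line work away.
var (
	sinkString string
	sinkBytes  []byte
)

// makeInput builds the lines "0\n" .. "n-1\n" in memory, mirroring the
// question's test file without the disk I/O.
func makeInput(n int) string {
	var sb strings.Builder
	for i := 0; i < n; i++ {
		sb.WriteString(strconv.Itoa(i))
		sb.WriteByte('\n')
	}
	return sb.String()
}

// countReadLine counts lines using Reader.ReadLine, keeping each []byte as-is.
func countReadLine(data string) int {
	br := bufio.NewReader(strings.NewReader(data))
	n := 0
	for {
		line, isPrefix, err := br.ReadLine()
		if err == io.EOF {
			return n
		}
		if err != nil {
			panic(err)
		}
		sinkBytes = line
		if !isPrefix { // a line longer than the buffer arrives in several pieces
			n++
		}
	}
}

// countScan counts lines using Scanner, converting each token with Text()
// (a new string per line) or taking it with Bytes() (no copy).
func countScan(data string, useText bool) int {
	sc := bufio.NewScanner(strings.NewReader(data))
	n := 0
	for sc.Scan() {
		if useText {
			sinkString = sc.Text()
		} else {
			sinkBytes = sc.Bytes()
		}
		n++
	}
	if err := sc.Err(); err != nil {
		panic(err)
	}
	return n
}

func main() {
	data := makeInput(1000000)
	cases := []struct {
		name string
		f    func() int
	}{
		{"Reader.ReadLine", func() int { return countReadLine(data) }},
		{"Scanner.Text", func() int { return countScan(data, true) }},
		{"Scanner.Bytes", func() int { return countScan(data, false) }},
	}
	for _, c := range cases {
		r := testing.Benchmark(func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				c.f()
			}
		})
		fmt.Printf("%-16s %s\n", c.name, r)
	}
}
```

Keep in mind that the slice returned by Bytes() may be overwritten by the next call to Scan, and ReadLine's result is likewise only valid until the next read, so copy a line if it must outlive the loop iteration.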
This is also a flawed benchmark: it reads short strings, the integers from "0\n" to "999999\n". What does a real-world data set look like?
In the real world, we read Shakespeare: http://www.gutenberg.org/ebooks/100 (Plain Text UTF-8: pg100.txt).
Took 2.973307ms to read file using reader. size: 5340315 lines: 124787
Took 2.940388ms to read file using scanner. size: 5340315 lines: 124787