How to count the words in a file

Time: 2019-03-31 14:17:01

Tags: go

I want to create a function that counts the words in a file and finds the position of each word. I'd like the output to be:

a, position: 0

aa, position: 1

aahed, position: 2

I have managed to count the words, but I can't work out how to use that to get each word's position:

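package main

import (
    "bufio"
    "fmt"
    "os"
    "strings"
)

func main() {
    // Sample input for illustration; the real program reads from a file.
    input := "a aa aahed"
    scanner := bufio.NewScanner(strings.NewReader(input))

    // Set the split function for the scanning operation.
    scanner.Split(bufio.ScanWords)

    // Count the words.
    count := 0
    for scanner.Scan() {
        count++
    }

    if err := scanner.Err(); err != nil {
        fmt.Fprintln(os.Stderr, "reading input:", err)
    }

    fmt.Printf("%d\n", count)
}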

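Can I do this with a for loop? I want to index by position, e.g. word[position] == word[position+1], to find out whether the word at a given position is the same as the word at the next position.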

2 Answers:

Answer 0 (score: 1)

Imagine there is a file testfile.txt containing:

this is fine

You can use this Go program to iterate over every word and print it along with its current position:

package main

import (
    "bufio"
    "fmt"
    "os"
)

func main() {
    // initiate file-handle to read from
    fileHandle, err := os.Open("testfile.txt")

    // check if file-handle was initiated correctly
    if err != nil {
        panic(err)
    }

    // make sure to close file-handle upon return
    defer fileHandle.Close()

    // initiate scanner from file handle
    fileScanner := bufio.NewScanner(fileHandle)

    // tell the scanner to split by words
    fileScanner.Split(bufio.ScanWords)

    // initiate counter
    count := 0

    // for looping through results
    for fileScanner.Scan() {
        fmt.Printf("word: '%s' - position: '%d'\n", fileScanner.Text(), count)
        count++
    }

    // check if there was an error while reading words from file
    if err := fileScanner.Err(); err != nil {
        panic(err)
    }

    // print total word count
    fmt.Printf("total word count: '%d'", count)
}

Output:

$ go run main.go
word: 'this' - position: '0'
word: 'is' - position: '1'
word: 'fine' - position: '2'
total word count: '3'

If you want to compare words by index, you can load them into a slice first.

Imagine a text file containing:

fine this is fine

Then use this code:

package main

import (
    "bufio"
    "fmt"
    "os"
)

func main() {
    // initiate file-handle to read from
    fileHandle, err := os.Open("testfile.txt")

    // check if file-handle was initiated correctly
    if err != nil {
        panic(err)
    }

    // make sure to close file-handle upon return
    defer fileHandle.Close()

    // initiate scanner from file handle
    fileScanner := bufio.NewScanner(fileHandle)

    // tell the scanner to split by words
    fileScanner.Split(bufio.ScanWords)

    // initiate wordsSlice
    var wordSlice []string

    // for looping through results
    for fileScanner.Scan() {
        wordSlice = append(wordSlice, fileScanner.Text())
    }

    // check if there was an error while reading words from file
    if err := fileScanner.Err(); err != nil {
        panic(err)
    }

    // loop through word slice and print word with index
    for i, w := range wordSlice {
        fmt.Printf("word: '%s' - position: '%d'\n", w, i)
    }

    // compare words by index (check the index first so a file with fewer words doesn't panic)
    firstWordPos := 0
    equalsWordPos := 3
    if equalsWordPos < len(wordSlice) && wordSlice[firstWordPos] == wordSlice[equalsWordPos] {
        fmt.Printf("word at position '%d' and '%d' is equal: '%s'\n", firstWordPos, equalsWordPos, wordSlice[firstWordPos])
    }

    // print total word count
    fmt.Printf("total word count: '%d'", len(wordSlice))
}

Output:

$ go run main.go
word: 'fine' - position: '0'
word: 'this' - position: '1'
word: 'is' - position: '2'
word: 'fine' - position: '3'
word at position '0' and '3' is equal: 'fine'
total word count: '4'

Answer 1 (score: 1)

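Alternatively, you can read the input one character at a time. That gives you full control over the data you output. In Go, a character (a Unicode code point) is represented by a rune: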

package main

import (
    "bytes"
    "fmt"
    "io"
    "os"
    "strings"
)

func main() {
    // os.ReadFile replaces the deprecated ioutil.ReadFile (Go 1.16+).
    b, err := os.ReadFile("test.txt")
    if err != nil {
        panic(err)
    }

    reader := bytes.NewReader(b)
    // word is a temporary buffer that collects the characters of the current word.
    word := strings.Builder{}
    wordPos := 0 // column (in runes) where the current word starts
    line := 0
    pos := 0 // current column (in runes) on the current line

    // emit prints the buffered word, skipping empty ones, and resets the buffer.
    emit := func() {
        if word.Len() > 0 {
            fmt.Println(word.String(), "line:", line, "position:", wordPos)
        }
        word.Reset()
    }

    for {
        // Read the next character.
        r, _, err := reader.ReadRune()
        if err == io.EOF {
            // End of file: output the last word.
            emit()
            break
        } else if err != nil {
            panic(err)
        }

        switch r {
        case '\n':
            // New line: output the word and reset the position counters.
            emit()
            pos = 0
            wordPos = 0
            line++
        case ' ':
            // Word separator: output the word and set the next word's start position.
            emit()
            pos++
            wordPos = pos
        default:
            // Regular character: append it to the word buffer.
            word.WriteRune(r)
            pos++
        }
    }
}

I used strings.Builder to avoid unnecessary string copying.

You will also have to adapt this example to handle edge cases such as empty lines, and possibly others.