Question

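Here is an interesting situation I ran into. After doing some data manipulation with goroutines, I need to read from a file and populate a map based on what we find. Here is a simplified problem statement and example: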

Generate the required data by running gen_data.sh:

#!/bin/bash 

rm some.dat || : 
for i in `seq 1 10000`; do 
    echo "$i `date` tx: $RANDOM rx:$RANDOM" >> some.dat
done

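If I read the lines of some.dat into a map[int]string in loadtoDict.go without using goroutines, everything stays aligned: the first and second words of each output line are the same (see the output below).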

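In real life I do need to process these lines (an expensive step) before loading them into the map, so using goroutines to speed up building the dictionary is an important requirement for the real problem.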

loadtoDict.go

package main

import (
    "bufio"
    "fmt"
    "log"
    "os"
)

var (
    fileName = "some.dat"
)

func checkerr(err error) {
    if err != nil {
        fmt.Println(err)
        log.Fatal(err)
    }
}

func main() {
    ourDict := make(map[int]string)
    f, err := os.Open(fileName)
    checkerr(err)
    defer f.Close()

    fscanner := bufio.NewScanner(f)

    indexPos := 1

    for fscanner.Scan() {
        text := fscanner.Text()
        //fmt.Println("text", text)
        ourDict[indexPos] = text
        indexPos++

    }

    for i, v := range ourDict {
        fmt.Printf("%d: %s\n", i, v)
    }

}

Running it:

$ ./loadtoDict
...
8676: 8676 Mon Dec 23 15:52:24 PST 2019 tx: 17718 rx:1133
2234: 2234 Mon Dec 23 15:52:20 PST 2019 tx: 13170 rx:15962
3436: 3436 Mon Dec 23 15:52:21 PST 2019 tx: 17519 rx:5419
6177: 6177 Mon Dec 23 15:52:23 PST 2019 tx: 5731 rx:5449

Note how the first and second words "line up". But if I load the map using goroutines, it goes wrong:

async_loadtoDict.go

package main

import (
    "bufio"
    "fmt"
    "log"
    "os"
    "sync"
)

var (
    fileName = "some.dat"
    mu       = &sync.RWMutex{}
    MAX = 9000
)

func checkerr(err error) {
    if err != nil {
        fmt.Println(err)
        log.Fatal(err)
    }
}

func main() {
    ourDict := make(map[int]string)
    f, err := os.Open(fileName)
    checkerr(err)
    defer f.Close()

    fscanner := bufio.NewScanner(f)

    indexPos := 1
    var wg sync.WaitGroup
    sem := make(chan int, MAX)
    defer close(sem)

    for fscanner.Scan() {
        text := fscanner.Text()
        wg.Add(1)
        sem <- 1
        go func() {
            mu.Lock()
            defer mu.Unlock()
            ourDict[indexPos] = text
            indexPos++
            <- sem
            wg.Done()
        }()

    }

    wg.Wait()

    for i, v := range ourDict {
        fmt.Printf("%d: %s\n", i, v)
    }

}

Output:

$ ./async_loadtoDict 
...
11: 22 Mon Dec 23 15:52:19 PST 2019 tx: 25688 rx:7602
5716: 6294 Mon Dec 23 15:52:23 PST 2019 tx: 28488 rx:3572
6133: 4303 Mon Dec 23 15:52:21 PST 2019 tx: 24286 rx:1565
7878: 9069 Mon Dec 23 15:52:25 PST 2019 tx: 16863 rx:24234
8398: 7308 Mon Dec 23 15:52:23 PST 2019 tx: 4321 rx:20642
9566: 3489 Mon Dec 23 15:52:21 PST 2019 tx: 14447 rx:12630
2085: 2372 Mon Dec 23 15:52:20 PST 2019 tx: 14375 rx:24151

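This happens even though the writes to ourDict[indexPos] are protected by a mutex. I want each map index to stay consistent with the line it was read from.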

Thanks!

Answer 1

Your semaphore sem isn't doing its job, because you've given it a very deep buffer.
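To illustrate the point about buffer depth: a buffered channel used as a semaphore limits concurrency to its capacity, so a capacity of 9000 lets almost every line run at once. Below is a minimal sketch, assuming a limit of runtime.NumCPU() and a hypothetical `work` placeholder for the expensive per-line step; it records the highest number of tasks that were ever running together. Limiting concurrency this way does not by itself fix the index mismatch.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// maxConcurrent runs n tasks, allowing at most limit of them to run at once,
// and returns the largest number of tasks observed running together.
func maxConcurrent(n, limit int) int64 {
	sem := make(chan struct{}, limit) // capacity == number of permits
	var wg sync.WaitGroup
	var active, peak int64

	for i := 0; i < n; i++ {
		sem <- struct{}{} // acquire a permit; blocks while limit tasks are running
		wg.Add(1)
		go func() {
			defer wg.Done()
			defer func() { <-sem }() // release the permit (runs before wg.Done)

			cur := atomic.AddInt64(&active, 1)
			// Record the peak with a compare-and-swap loop.
			for {
				old := atomic.LoadInt64(&peak)
				if cur <= old || atomic.CompareAndSwapInt64(&peak, old, cur) {
					break
				}
			}
			work() // stand-in for the expensive per-line processing
			atomic.AddInt64(&active, -1)
		}()
	}
	wg.Wait()
	return peak
}

// work is a hypothetical placeholder for real processing.
func work() { runtime.Gosched() }

func main() {
	limit := runtime.NumCPU()
	fmt.Println("peak concurrency:", maxConcurrent(1000, limit), "limit:", limit)
}
```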

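In general, this is the wrong way to set up a map for this kind of task, because reading the file will be the slow part. If you have a more complex task (for example: read a line, think hard about it, set something), use this as your pseudocode structure: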

type workType struct {
    index int
    line  string
}

var wg sync.WaitGroup
wg.Add(nWorkers)
// I made this buffered originally but there's no real point, so
// fixing that in an edit
work := make(chan workType)
for i := 0; i < nWorkers; i++ {
    go readAndDoWork(work, &wg)
}

for i := 1; fscanner.Scan(); i++ {
    work <- workType{index: i, line: fscanner.Text()}
}
close(work)
wg.Wait()

// ... now your dictionary is ready ...

with the workers doing this:

func readAndDoWork(ch chan workType, wg *sync.WaitGroup) {
    defer wg.Done()
    for item := range ch {
        // ... do computation ... (doComputation is a hypothetical helper)
        result := doComputation(item.line)
        insertIntoDict(item.index, result)
    }
}

with insertIntoDict grabbing the mutex (to protect the map from index to result) and writing into the dictionary. (You can inline this if you prefer.)

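The idea here is to set up some number of workers (perhaps based on the number of available CPUs), each of which grabs the next work item and processes it. The main goroutine just parcels out the work, then closes the work channel, which lets every worker see the end of the input, and then waits for them to signal that they have finished their computations.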

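(If you like, you can create one more goroutine that reads the results computed by the workers and puts them into the map. That way the map itself doesn't need a mutex.)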

Answer 2

As I mentioned in the comments, you can't control the order in which goroutines execute, so you shouldn't change the index from inside them.

In this example, all interaction with the map happens in a single goroutine, while your processing happens in the other goroutines.

package main

import (
    "bufio"
    "fmt"
    "log"
    "os"
    "sync"
)

var (
    fileName = "some.dat"
    MAX      = 9000
)

func checkerr(err error) {
    if err != nil {
        fmt.Println(err)
        log.Fatal(err)
    }
}

type result struct {
    index int
    data  string
}

func main() {
    ourDict := make(map[int]string)
    f, err := os.Open(fileName)
    checkerr(err)
    defer f.Close()

    fscanner := bufio.NewScanner(f)

    var wg sync.WaitGroup
    sem := make(chan struct{}, MAX) // Use empty structs for semaphores as they have no allocation
    out := make(chan result)
    done := make(chan struct{})

    // This goroutine is the only one that writes to the dict, so there is no race condition.
    // Start it before producing work: otherwise every worker blocks on `out <- ...`,
    // none of them releases the semaphore, and the loop below deadlocks once MAX is reached.
    go func() {
        defer close(done)
        for entry := range out { // exits when out is closed
            ourDict[entry.index] = entry.data
        }
    }()

    indexPos := 1

    for fscanner.Scan() {
        text := fscanner.Text()
        wg.Add(1)
        sem <- struct{}{}

        go func(index int, data string) {
            // Defer the release of your resources, otherwise if any error occurs in your goroutine
            // you'll have a deadlock
            defer func() {
                wg.Done()
                <-sem
            }()
            // Process your data
            out <- result{index, data}
        }(indexPos, text) // Pass in the values that change on each iteration so each goroutine gets its own copy

        indexPos++
    }
    checkerr(fscanner.Err())

    wg.Wait()  // all workers have sent their results
    close(out) // lets the collector's range loop end
    <-done     // wait until the collector has written the last entry

    for i, v := range ourDict {
        fmt.Printf("%d: %s\n", i, v)
    }
}
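Independently of how the map is filled, `for i, v := range ourDict` visits keys in Go's unspecified (in practice randomized) map iteration order, which is why none of the outputs in this thread come out sorted. If you want to print in line order, one option is to sort the keys first. A minimal sketch; the helper name `sortedKeys` is illustrative:

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns the map's keys in ascending order.
func sortedKeys(m map[int]string) []int {
	keys := make([]int, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Ints(keys)
	return keys
}

func main() {
	ourDict := map[int]string{3: "3 c", 1: "1 a", 2: "2 b"}
	for _, i := range sortedKeys(ourDict) {
		fmt.Printf("%d: %s\n", i, ourDict[i])
	}
}
```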

Answer 3

OK, I figured it out. Handing each goroutine its own copy of the value to hold on to seems to make it work.

Changed:

for fscanner.Scan() {
    text := fscanner.Text()
    wg.Add(1)
    sem <- 1
    go func() {
        mu.Lock()
        defer mu.Unlock()
        ourDict[indexPos] = text
        indexPos++
        <- sem
        wg.Done()
    }()

}

to

for fscanner.Scan() {
    text := fscanner.Text()
    wg.Add(1)
    sem <- 1
    go func(mypos int) {
        mu.Lock()
        defer mu.Unlock()
        ourDict[mypos] = text
        <-sem
        wg.Done()
    }(indexPos)
    indexPos++
}
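The essential change is that indexPos is read in the loop, by the goroutine doing the scanning, and passed to each new goroutine as an argument, so every goroutine gets its own copy instead of sharing one counter. A self-contained sketch of the same idea, assuming an in-memory slice of lines instead of some.dat and a small illustrative semaphore size; `loadAligned` is a hypothetical name:

```go
package main

import (
	"fmt"
	"sync"
)

// loadAligned stores lines[k] under key k+1 using one goroutine per line.
func loadAligned(lines []string, maxInFlight int) map[int]string {
	ourDict := make(map[int]string)
	var mu sync.Mutex
	var wg sync.WaitGroup
	sem := make(chan struct{}, maxInFlight)

	indexPos := 1
	for _, text := range lines {
		wg.Add(1)
		sem <- struct{}{}
		// mypos and text are copies taken now, so later changes to
		// indexPos cannot affect this goroutine.
		go func(mypos int, text string) {
			defer wg.Done()
			defer func() { <-sem }()
			mu.Lock()
			ourDict[mypos] = text
			mu.Unlock()
		}(indexPos, text)
		indexPos++
	}
	wg.Wait()
	return ourDict
}

func main() {
	lines := []string{"1 a", "2 b", "3 c", "4 d"}
	dict := loadAligned(lines, 2)
	for i := 1; i <= len(lines); i++ {
		fmt.Printf("%d: %s\n", i, dict[i])
	}
}
```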

Full code: https://play.golang.org/p/dkHaisPHyHz

Using a pool of workers:

package main

import (
    "bufio"
    "fmt"
    "log"
    "os"
    "sync"
)

const (
    MAX      = 10
    fileName = "some.dat"
)

type gunk struct {
    line string
    id   int
}

func main() {
    ourDict := make(map[int]string)
    wg := sync.WaitGroup{}
    mu := sync.RWMutex{}

    cha := make(chan gunk)

    for i := 0; i < MAX; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            for {
                textin, ok := <-cha
                if !ok {
                    return
                }
                mu.Lock()
                ourDict[textin.id] = textin.line
                mu.Unlock()
            }
        }(i)
    }

    f, err := os.Open(fileName)
    checkerr(err)
    defer f.Close()
    fscanner := bufio.NewScanner(f)
    indexPos := 1

    for fscanner.Scan() {
        text := fscanner.Text()
        thisgunk := gunk{line: text, id: indexPos}
        cha <- thisgunk
        indexPos++
    }

    close(cha)
    wg.Wait()
    for i, v := range ourDict {
        fmt.Printf("%d: %s\n", i, v)
    }

}

func checkerr(err error) {
    if err != nil {
        fmt.Println(err)
        log.Fatal(err)
    }
}

Loading a map with and without goroutines
