我陷入了自己的等待循环,并不确定为什么。该函数接受输入和输出通道,然后获取通道中的每个项目,为内容执行http.GET并从html中提取标记。
GET和scrape的过程在go例程中,我设置了一个等待组(innerWait),以确保在关闭输出通道之前我已经处理了所有内容。
func (fp FeedProducer) getTitles(in <-chan feeds.Item,
out chan<- feeds.Item,
wg *sync.WaitGroup) {
defer wg.Done()
var innerWait sync.WaitGroup
for item := range in {
log.Infof(fp.c, "Incrementing inner WaitGroup.")
innerWait.Add(1)
go func(item feeds.Item) {
defer innerWait.Done()
defer log.Infof(fp.c, "Decriment inner wait group by defer.")
client := urlfetch.Client(fp.c)
resp, err := client.Get(item.Link.Href)
log.Infof(fp.c, "Getting title for: %v", item.Link.Href)
if err != nil {
log.Errorf(fp.c, "Error retriving page. %v", err.Error())
return
}
if strings.ToLower(resp.Header.Get("Content-Type")) == "text/html; charset=utf-8" {
title := fp.scrapeTitle(resp)
item.Title = title
} else {
log.Errorf(fp.c, "Wrong content type. Received: %v from %v", resp.Header.Get("Content-Type"), item.Link.Href)
}
out <- item
}(item)
}
log.Infof(fp.c, "Waiting for title pull wait group.")
innerWait.Wait()
log.Infof(fp.c, "Done waiting for title pull.")
close(out)
}
func (fp FeedProducer) scrapeTitle(request *http.Response) string {
defer request.Body.Close()
tokenizer := html.NewTokenizer(request.Body)
var titleIsNext bool
for {
token := tokenizer.Next()
switch {
case token == html.ErrorToken:
log.Infof(fp.c, "Hit the end of the doc without finding title.")
return ""
case token == html.StartTagToken:
tag := tokenizer.Token()
isTitle := tag.Data == "title"
if isTitle {
titleIsNext = true
}
case titleIsNext && token == html.TextToken:
title := tokenizer.Token().Data
log.Infof(fp.c, "Pulled title: %v", title)
return title
}
}
}
日志内容如下所示:
2015/08/09 22:02:10 INFO: Revived query parameter: golang
2015/08/09 22:02:10 INFO: Getting active tweets from the last 7 days.
2015/08/09 22:02:10 INFO: Incrementing inner WaitGroup.
2015/08/09 22:02:10 INFO: Incrementing inner WaitGroup.
2015/08/09 22:02:10 INFO: Incrementing inner WaitGroup.
2015/08/09 22:02:10 INFO: Incrementing inner WaitGroup.
2015/08/09 22:02:10 INFO: Incrementing inner WaitGroup.
2015/08/09 22:02:10 INFO: Incrementing inner WaitGroup.
2015/08/09 22:02:10 INFO: Waiting for title pull wait group.
2015/08/09 22:02:10 INFO: Getting title for: http://devsisters.github.io/goquic/
2015/08/09 22:02:10 INFO: Pulled title: GoQuic by devsisters
2015/08/09 22:02:10 INFO: Getting title for: http://whizdumb.me/2015/03/03/matching-a-string-and-extracting-values-using-regex/
2015/08/09 22:02:10 INFO: Pulled title: Matching a string and extracting values using regex | Whizdumb's blog
2015/08/09 22:02:10 INFO: Getting title for: https://www.reddit.com/r/golang/comments/3g7tyv/dropboxs_infrastructure_is_go_at_a_huge_scale/
2015/08/09 22:02:10 INFO: Pulled title: Dropbox's infrastructure is Go at a huge scale : golang
2015/08/09 22:02:10 INFO: Getting title for: http://dave.cheney.net/2015/08/08/performance-without-the-event-loop
2015/08/09 22:02:10 INFO: Pulled title: Performance without the event loop | Dave Cheney
2015/08/09 22:02:11 INFO: Getting title for: https://github.com/ccirello/sublime-gosnippets
2015/08/09 22:02:11 INFO: Pulled title: ccirello/sublime-gosnippets · GitHub
2015/08/09 22:02:11 INFO: Getting title for: https://medium.com/iron-io-blog/an-easier-way-to-create-tiny-golang-docker-images-7ba2893b160?mkt_tok=3RkMMJWWfF9wsRonuqTMZKXonjHpfsX57ewoWaexlMI/0ER3fOvrPUfGjI4ATsNrI%2BSLDwEYGJlv6SgFQ7LMMaZq1rgMXBk%3D&utm_content=buffer45a1c&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
2015/08/09 22:02:11 INFO: Pulled title: An Easier Way to Create Tiny Golang Docker Images — Iron.io Blog — Medium
我可以看到我正在根据日志访问innerWait.Wait()命令,这也告诉我入站通道已在管道的另一侧关闭。
似乎没有调用匿名函数中的defer语句,因为我看不到任何地方打印的延迟日志语句。但我不能为我的生活告诉为什么因为该块中的所有代码似乎都在执行。
非常感谢帮助。
答案 0 :(得分:3)
goroutines卡在此行发送到out
:
out <- item
修复方法是在out
上启动goroutine接收。
调试这样的问题的好方法是通过向进程发送SIGQUIT来转储goroutine堆栈。