我有一个程序在udp上侦听日志流量,尝试解析它,然后将其插入Redis。在一定的流量水平下,内存似乎“爆炸”(迅速从几百兆字节增加到千兆字节。
在这种情况发生后不久我就抓住了一个堆配置文件,它返回以下内容:
(pprof) top100 -cum
Total: 1731.3 MB
0.0 0.0% 0.0% 1731.3 100.0% gosched0
1162.5 67.1% 67.1% 1162.5 67.1% newdefer
0.0 0.0% 67.1% 1162.5 67.1% runtime.deferproc
0.0 0.0% 67.1% 1162.0 67.1% main.TryParse
0.0 0.0% 67.1% 438.0 25.3% runtime.main
301.5 17.4% 84.6% 437.5 25.3% main.main
136.0 7.9% 92.4% 136.0 7.9% runtime.malg
0.0 0.0% 92.4% 136.0 7.9% runtime.newproc
0.0 0.0% 92.4% 136.0 7.9% runtime.newproc1
1.5 0.1% 92.5% 131.3 7.6% main.RedisCuller
0.0 0.0% 92.5% 108.5 6.3% github.com/garyburd/redigo/redis.(*conn).Do
0.0 0.0% 92.5% 108.5 6.3% github.com/garyburd/redigo/redis.(*conn).readReply
0.0 0.0% 92.5% 108.5 6.3% github.com/garyburd/redigo/redis.(*pooledConnection).Do
95.8 5.5% 98.0% 95.8 5.5% cnew
0.0 0.0% 98.0% 95.8 5.5% runtime.cnewarray
34.0 2.0% 100.0% 34.0 2.0% runtime.convT2E
0.0 0.0% 100.0% 0.5 0.0% main.init
0.0 0.0% 100.0% 0.5 0.0% net/http/pprof.init
0.0 0.0% 100.0% 0.5 0.0% sync.(*Once).Do
0.0 0.0% 100.0% 0.5 0.0% syscall.Getenv
0.0 0.0% 100.0% 0.5 0.0% time.init
当程序“健康”时,配置文件看起来更像:
(pprof) top20 -cum
Total: 186.7 MB
0.0 0.0% 0.0% 186.7 100.0% gosched0
0.5 0.3% 0.3% 122.7 65.7% main.RedisCuller
0.0 0.0% 0.3% 103.5 55.4% github.com/garyburd/redigo/redis.(*pooledConnection).Do
0.0 0.0% 0.3% 103.0 55.2% github.com/garyburd/redigo/redis.(*conn).Do
0.0 0.0% 0.3% 103.0 55.2% github.com/garyburd/redigo/redis.(*conn).readReply
88.2 47.2% 47.5% 88.2 47.2% cnew
0.0 0.0% 47.5% 88.2 47.2% runtime.cnewarray
0.0 0.0% 47.5% 57.0 30.5% main.TryParse
57.0 30.5% 78.0% 57.0 30.5% newdefer
0.0 0.0% 78.0% 57.0 30.5% runtime.deferproc
34.0 18.2% 96.3% 34.0 18.2% runtime.convT2E
1.5 0.8% 97.1% 6.5 3.5% main.main
0.0 0.0% 97.1% 6.5 3.5% runtime.main
5.0 2.7% 99.7% 5.0 2.7% runtime.malg
0.0 0.0% 99.7% 5.0 2.7% runtime.newproc
0.0 0.0% 99.7% 5.0 2.7% runtime.newproc1
0.0 0.0% 99.7% 0.5 0.3% bufio.NewWriter
0.0 0.0% 99.7% 0.5 0.3% bufio.NewWriterSize
0.0 0.0% 99.7% 0.5 0.3% github.com/garyburd/redigo/redis.(*Pool).get
0.0 0.0% 99.7% 0.5 0.3% github.com/garyburd/redigo/redis.(*pooledConnection).get
我的代码中唯一的延迟是围绕解析功能(因为它经常会失败):
for {
rlen, _, err := sock.ReadFromUDP(buf[0:])
checkError(err)
raw := logrow.RawRecord(string(buf[:rlen]))
go TryParse(raw, c)
}
...
func TryParse(raw logrow.RawRecord, c chan logrow.Record) {
defer func() {
if r := recover(); r != nil {
//log.Printf("Failed Parse due to panic: %v", raw)
return
}
}()
rec, ok := logrow.ParseRawRecord(raw)
if !ok {
return
//log.Printf("Failed Parse: %v", raw)
} else {
c <- rec
}
}
有没有人看到任何明显我做错的事可能导致记忆突然气球?或者可能提供一些方向来固定它?
编辑(更多代码围绕logrow.Record频道):
c := make(chan logrow.Record)
...
go RedisInserter(c, bucket, retention, pool)
func RedisInserter(c chan logrow.Record, bucket, retention int, p *redis.Pool) {
for rec := range c {
logrow.SendToRedis(rec, bucket, retention, p)
}
}
答案 0 :(得分:1)
原来是关闭延迟函数的泄漏(Go本身的一个问题),如我们所见:
defer func() {
if r := recover(); r != nil {
//log.Printf("Failed Parse due to panic: %v", raw)
return
}
}()
因此,使用最新版本的Go重建解决了这个问题(参考:https://codereview.appspot.com/10784043/)。然而,聪明的事情当然是设计ParseRecordFunction,因此它最终不会试图切出界限并引起恐慌。