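I have a job written in Go that reads .gz files from AWS S3; each .gz file is about 20 MB.
Each goroutine downloads one .gz file from S3 to local disk, then reads its contents line by line through gzip.NewReader.
When the number of tasks (goroutines) exceeds 70, 67 of them complete successfully, but the remaining goroutines pause for several minutes.
At the moment they pause I see the CPU at 100%; it then drops to 0.2% (4 CPUs, 16 GB memory) and stays there for several minutes.
Question: I am confused about why the goroutines do nothing during that time while CPU usage is so low. What could be causing this?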
Test results:
60 goroutines: all finish successfully in 27 seconds;
70 goroutines: all finish successfully in 30 seconds;
80 goroutines: 77 finish successfully in 30 seconds, but the remaining 3 **pause for 4 minutes**, then resume and finish in 10 seconds.
The code is as follows:
Main goroutine:
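func main() {
    runtime.GOMAXPROCS(runtime.NumCPU() * 2)
    // min and max bound the object indices; size the slice to match the loop.
    chs := make([]chan string, max-min)
    for i := min; i < max; i++ {
        ch := make(chan string, 1)
        chs[i-min] = ch
        // Assumption: the S3 key is derived from the index, e.g. "75.gz".
        go readobj.Reads3obj(fmt.Sprintf("%d.gz", i), ch)
    }
    // Wait for every goroutine to report back.
    for _, ch := range chs {
        fmt.Println(<-ch)
    }
}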
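The fan-out pattern used in main can be tried in isolation with a minimal sketch like the one below. It is not the original job: `processKey` and `runAll` are hypothetical names, and `processKey` only formats a string where the real code would download and read an object. The point it illustrates is that the channel slice is sized from the number of keys, so every goroutine has its own buffered channel and the collector reads exactly one value from each.

```go
package main

import "fmt"

// processKey is a hypothetical stand-in for the real per-object work
// (download, decompress, index). Here it only formats a status string.
func processKey(key string) string {
	return "done: " + key
}

// runAll starts one goroutine per key and collects one result per goroutine,
// in the same order as the keys.
func runAll(keys []string) []string {
	// One buffered channel per key, so a worker never blocks on send.
	chs := make([]chan string, len(keys))
	for i, key := range keys {
		chs[i] = make(chan string, 1)
		go func(key string, ch chan<- string) {
			ch <- processKey(key)
		}(key, chs[i])
	}

	// Receive exactly one value from every channel.
	results := make([]string, 0, len(keys))
	for _, ch := range chs {
		results = append(results, <-ch)
	}
	return results
}

func main() {
	for _, r := range runAll([]string{"0.gz", "1.gz", "2.gz"}) {
		fmt.Println(r)
	}
}
```

Because each worker sends exactly once on every path, the collecting loop cannot block forever on a goroutine that returned early.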
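ReadObject:
func Reads3obj(s3KeyName string, ch chan string) {
    sess, err := session.NewSession(&aws.Config{
        Region:      aws.String("x"),
        Credentials: credentials.NewStaticCredentials("x", "x", ""),
    })
    if err != nil {
        ch <- err.Error()
        return
    }
    downloader := s3manager.NewDownloader(sess)
    // Create a local file to receive the download (zipname and bucket are defined elsewhere).
    zipFile, err := os.Create(zipname)
    if err != nil {
        ch <- err.Error()
        return
    }
    defer zipFile.Close()
    // Download the .gz file from S3.
    _, err = downloader.Download(zipFile, &s3.GetObjectInput{
        Bucket: aws.String(bucket),
        Key:    aws.String(s3KeyName),
    })
    if err != nil {
        ch <- err.Error()
        return
    }
    // Open the downloaded file for reading; os.Open takes a path, not an *os.File.
    fileReader, err := os.Open(zipname)
    if err != nil {
        ch <- err.Error()
        return
    }
    defer fileReader.Close()
    gzipReader, err := gzip.NewReader(fileReader)
    if err != nil {
        ch <- err.Error()
        return
    }
    buf := bufio.NewReader(gzipReader)
    // Read the decompressed data line by line until io.EOF or a read error.
    for line, _, err := buf.ReadLine(); err == nil; line, _, err = buf.ReadLine() {
        _ = line // insert line into ES
    }
    // Report completion so main does not block on this channel.
    ch <- s3KeyName + ": done"
}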
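Edit: While the goroutines are paused, the number of open files is very small, so I do not think the open-file limit has been exceeded; that does not seem to be the issue. The output of
ll /proc/PID/fd
is: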
zc@ip-xxx:/proc/18059/fd$ ll
total 0
dr-x------ 2 zc zc 0 Dec 26 06:50 ./
dr-xr-xr-x 9 zc zc 0 Dec 26 06:50 ../
lrwx------ 1 zc zc 64 Dec 26 06:50 0 -> /dev/pts/0
lrwx------ 1 zc zc 64 Dec 26 06:50 1 -> /dev/pts/0
lrwx------ 1 zc zc 64 Dec 26 06:50 12 -> /home/zc/75.gz
lrwx------ 1 zc zc 64 Dec 26 06:50 2 -> /dev/pts/0
lrwx------ 1 zc zc 64 Dec 26 06:50 21 -> /home/zc/78.gz
lrwx------ 1 zc zc 64 Dec 26 06:50 253 -> socket:[76054]
lrwx------ 1 zc zc 64 Dec 26 06:50 280 -> socket:[77064]
lrwx------ 1 zc zc 64 Dec 26 06:50 47 -> /home/zc/58.gz
lrwx------ 1 zc zc 64 Dec 26 06:50 65 -> anon_inode:[eventpoll]
lrwx------ 1 zc zc 64 Dec 26 06:50 93 -> socket:[75984]
Answer 0 (score: 0)
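Go's gzip library decompresses a file in a single goroutine,
so a file this large can occupy the CPU at 100% for a long time. Try pgzip instead.
It is meant as a drop-in replacement for gzip; swap
import "compress/gzip"
with
import gzip "github.com/klauspost/pgzip"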