JVM char arrays use a large amount of memory

Asked: 2017-04-29 23:38:11

Tags: java multithreading scala memory-management jvm

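I have the same problem as the one described in "JVM Monitor char array memory usage". That question did not give me a clear answer, and I do not have enough reputation to comment on it, so I am asking here.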

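I wrote a multithreaded program that computes word co-occurrence frequencies. It reads words lazily from a file and counts as it goes. The program keeps a map from word pairs to their co-occurrence counts. When counting is finished, I write this map to a file.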
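For readers who want something concrete, here is a minimal single-threaded sketch of that kind of counting. It is not the program from the question: the object name CooccurrenceSketch, the method countPairs, and the choice to treat two adjacent words on a line as "co-occurring" are all assumptions made for illustration.

```scala
import scala.collection.mutable

object CooccurrenceSketch {
  // Counts ordered pairs of adjacent words on each line.
  // "Adjacent on the same line" is an assumed definition of co-occurrence.
  def countPairs(lines: Iterator[String]): mutable.Map[(String, String), Int] = {
    val counts = mutable.Map.empty[(String, String), Int]
    for (line <- lines) {
      val words = line.split("\\s+").filter(_.nonEmpty)
      words.sliding(2).foreach {
        case Array(a, b) =>
          val key = (a, b)
          counts.update(key, counts.getOrElse(key, 0) + 1)
        case _ => () // a line with fewer than two words yields no pair
      }
    }
    counts
  }
}
```

Passing the iterator returned by scala.io.Source.fromFile(...).getLines() keeps the reading lazy, because lines are pulled one at a time. The map itself still grows with the number of distinct pairs.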

Here is my problem:

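After the frequency map is written out, the file is about 3 GB, for example. While the program runs, however, it uses 35 GB of RAM plus 5 GB of swap. I then monitored the JVM. The memory view looks like this: memory picture, the garbage collector view looks like this: garbage collector picture, and the parameter overview is here: overview. How can char[] arrays occupy this much memory when the output file is only 3 GB? Thanks.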
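A rough back-of-the-envelope estimate may help explain the gap between bytes on disk and bytes on the heap. The sketch below assumes a 64-bit HotSpot JVM with compressed object pointers and Java 8-style strings backed by a char[] (2 bytes per character). The object sizes, the name HeapEstimate, and the 8-character average word length are assumptions, not measurements from the program in question.

```scala
object HeapEstimate {
  // Assumed sizes for a 64-bit HotSpot JVM with compressed oops (Java 8 era).
  val StringObjectBytes = 24L // java.lang.String object: header plus fields, padded
  val ArrayHeaderBytes  = 16L // header of the backing char[]
  val BytesPerChar      = 2L  // char[] holds UTF-16 code units

  // Objects are assumed to be padded to a multiple of 8 bytes.
  private def alignTo8(n: Long): Long = (n + 7) / 8 * 8

  // Approximate heap bytes for one String of the given length.
  def stringBytes(length: Int): Long =
    StringObjectBytes + alignTo8(ArrayHeaderBytes + BytesPerChar * length)

  def main(args: Array[String]): Unit = {
    val wordLength = 8 // hypothetical average word length
    println(stringBytes(wordLength))     // 56
    println(2 * stringBytes(wordLength)) // 112, the two strings of one pair
  }
}
```

Under these assumptions an 8-character ASCII word that takes 8 bytes in the file takes about 56 bytes as a String, so the two words of a pair take about 112 bytes before counting the tuple object, the boxed count, and the hash map entry. A severalfold blow-up relative to the file size is therefore plausible.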

Okay, here is the code that causes this problem:

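This code is not multithreaded. It merges two files that contain co-occurring words and their counts. It shows the same memory usage problem, and because heap usage is so high it also triggers a large number of GC calls, so the program cannot make normal progress while the garbage collector keeps pausing it: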

import java.io.{BufferedWriter, File, FileWriter, FilenameFilter}
import java.util.regex.Pattern

import core.WordTuple

import scala.collection.mutable.{Map => mMap}
import scala.io.{BufferedSource, Source}

class PairWordsMerger(path: String, regex: String) {

  private val wordsAndCounts: mMap[WordTuple, Int] = mMap[WordTuple, Int]()
  private val pattern: Pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE)
  private val dir: File = new File(path)
  private var sWordAndCount: Array[String] = Array.fill(3)("")
  private var tempTuple: WordTuple = WordTuple("","")
  private val matchedFiles: Array[File] = dir.listFiles(new FilenameFilter {
    override def accept(dir: File, name: String): Boolean = pattern.matcher(name).matches()
  })

  def merge(): Unit = {
    for(fileName <- matchedFiles) {
      val file: BufferedSource = Source.fromFile(fileName)
      val iter: Iterator[String] = file.getLines()

      while(iter.hasNext) {
        // split on "," because each line of the file has the form:
        // word1,word2,frequency
        sWordAndCount = iter.next().split(",")
        tempTuple = WordTuple(sWordAndCount(0), sWordAndCount(1))
        try {
          wordsAndCounts += (tempTuple -> (wordsAndCounts.getOrElse(tempTuple, 0) + sWordAndCount(2).toInt))
        } catch {
          case e: NumberFormatException => println("Cannot parse to int...")
        }
      }
      file.close()
      println("One pair words map update done")
    }
    writeToFile()
  }

  private def writeToFile(): Unit = {
    val f: File = new File("allPairWords.txt")
    val out = new BufferedWriter(new FileWriter(f))

    for(elem <- wordsAndCounts) {
      out.write(elem._1 + "," + elem._2 + "\n")
    }
    out.close()
  }
}

object PairWordsMerger {
  def apply(path: String, regex: String): PairWordsMerger = new PairWordsMerger(path, regex)
}
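One possible contributor, not confirmed for this program, is that split returns a fresh String for every field. A word that appears in many distinct pairs can then be stored many times over, each copy with its own char[]. The sketch below shows one way to share a single instance per distinct word. WordPool and canonical are hypothetical names, not part of the code above or of any library.

```scala
import scala.collection.mutable

// Hypothetical helper: keeps one canonical String instance per distinct word.
final class WordPool {
  private val pool = mutable.HashMap.empty[String, String]

  // Returns the first instance seen for an equal string, so later
  // duplicates are not retained and can be garbage collected.
  def canonical(word: String): String = pool.getOrElseUpdate(word, word)

  def size: Int = pool.size
}
```

In merge(), the tuple could then be built as WordTuple(pool.canonical(sWordAndCount(0)), pool.canonical(sWordAndCount(1))), assuming a WordPool instance named pool. Whether this helps depends on how often the same words recur across distinct pairs, so it is worth measuring before relying on it.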

0 answers:

No answers yet