List与Vector上的Scala内存问题

时间:2015-09-12 23:42:01

标签: list scala memory vector

我在Scala中编写了一个项目Euler问题#59的解决方案,我不明白为什么在Vector和List之间切换会增加我认为的内存泄漏。

这是一个使用向量的强大的强力解决方案。

val code = scala.io.Source.fromFile("e59.txt").getLines()
    .flatMap(l => l.split(',')).map(_.toInt).toVector

val commonWords = scala.io.Source.fromFile("common_words.txt").getLines().toVector

def decode(k: Int)(code: Vector[Int])(pswd: Vector[Int]): Vector[Int] = {
    code.grouped(k).flatMap(cs => cs.toVector.zip(pswd).map(t => t._1 ^ t._2)).toVector
}

def scoreText(text: Vector[Int]): Int = {
    if (text.contains((c: Int) => (c < 0 || c > 128))) -1
    else {
        val words = text.map(_.toChar).mkString.toLowerCase.split(' ')
        words.length - words.diff(commonWords).length
    }
}

lazy val psswds = for {
    a <- (97 to 122);
    b <- (97 to 122);
    c <- (97 to 122)
    } yield Vector(a, b, c)

val ans = psswds.toStream.map(decode(3)(code))
    .map(text => (text, scoreText(text)))
    .maxBy(_._2)._1.sum

println(ans)

我将原始代码(有序整数的集合),每个密码和一些常用英文单词存储为Vector s。

但是,如果我用Vector替换List,我的程序会因每个检查过的密码而变慢并最终崩溃:

val code = scala.io.Source.fromFile("e59.txt").getLines()
    .flatMap(l => l.split(',')).map(_.toInt).toList

val commonWords = scala.io.Source.fromFile("common_words.txt").getLines().toList

def decode(k: Int)(code: List[Int])(pswd: List[Int]): List[Int] = {
    println(pswd)
    code.grouped(k).flatMap(cs => cs.toList.zip(pswd).map(t => t._1 ^ t._2)).toList
}

def scoreText(text: List[Int]): Int = {
    if (text.contains((c: Int) => (c < 0 || c > 128))) -1
    else {
        val words = text.map(_.toChar).mkString.toLowerCase.split(' ')
        words.length - words.diff(commonWords).length
    }
}

lazy val psswds = for {
    a <- (97 to 122);
    b <- (97 to 122);
    c <- (97 to 122)
    } yield List(a, b, c)

val ans = psswds.toStream.map(decode(3)(code))
    .map(text => (text, scoreText(text)))
    .maxBy(_._2)._1.sum

println(ans)

错误:

java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.lang.String.valueOf(String.java:2861)
    at java.lang.Character.toString(Character.java:4439)
    at java.lang.String.valueOf(String.java:2847)
    at scala.collection.mutable.StringBuilder.append(StringBuilder.scala:200)
    at scala.collection.TraversableOnce$$anonfun$addString$1.apply(TraversableOnce.scala:349)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.TraversableOnce$class.addString(TraversableOnce.scala:342)
    at scala.collection.AbstractTraversable.addString(Traversable.scala:104)
    at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:308)
    at scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
    at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:310)
    at scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
    at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:312)
    at scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
    at Main$$anon$1.Main$$anon$$scoreText(e59_list.scala:14)
    at Main$$anon$1$$anonfun$5.apply(e59_list.scala:26)
    at Main$$anon$1$$anonfun$5.apply(e59_list.scala:26)
    at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418)
    at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418)
    at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1222)
    at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1212)
    at scala.collection.immutable.Stream.foreach(Stream.scala:595)
    at scala.collection.TraversableOnce$class.maxBy(TraversableOnce.scala:227)
    at scala.collection.AbstractTraversable.maxBy(Traversable.scala:104)
    at Main$$anon$1.<init>(e59_list.scala:27)
    at Main$.main(e59_list.scala:1)
    at Main.main(e59_list.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at scala.reflect.internal.util.ScalaClassLoader$$anonfun$run$1.apply(ScalaClassLoader.scala:70)

使用的文件: common_words.txt

a
able
about
across
after
all
almost
also
am
among
an
and
any
are
as
at
be
because
been
but
by
can
cannot
could
dear
did
do
does
either
else
ever
every
for
from
get
got
had
has
have
he
her
hers
him
his
how
however
i
if
in
into
is
it
its
just
least
let
like
likely
may
me
might
most
must
my
neither
no
nor
not
of
off
often
on
only
or
other
our
own
rather
said
say
says
she
should
since
so
some
than
that
the
their
them
then
there
these
they
this
tis
to
too
twas
us
wants
was
we
were
what
when
where
which
while
who
whom
why
will
with
would
yet
you
your

e59.txt

79,59,12,2,79,35,8,28,20,2,3,68,8,9,68,45,0,12,9,67,68,4,7,5,23,27,1,21,79,85,78,79,85,71,38,10,71,27,12,2,79,6,2,8,13,9,1,13,9,8,68,19,7,1,71,56,11,21,11,68,6,3,22,2,14,0,30,79,1,31,6,23,19,10,0,73,79,44,2,79,19,6,28,68,16,6,16,15,79,35,8,11,72,71,14,10,3,79,12,2,79,19,6,28,68,32,0,0,73,79,86,71,39,1,71,24,5,20,79,13,9,79,16,15,10,68,5,10,3,14,1,10,14,1,3,71,24,13,19,7,68,32,0,0,73,79,87,71,39,1,71,12,22,2,14,16,2,11,68,2,25,1,21,22,16,15,6,10,0,79,16,15,10,22,2,79,13,20,65,68,41,0,16,15,6,10,0,79,1,31,6,23,19,28,68,19,7,5,19,79,12,2,79,0,14,11,10,64,27,68,10,14,15,2,65,68,83,79,40,14,9,1,71,6,16,20,10,8,1,79,19,6,28,68,14,1,68,15,6,9,75,79,5,9,11,68,19,7,13,20,79,8,14,9,1,71,8,13,17,10,23,71,3,13,0,7,16,71,27,11,71,10,18,2,29,29,8,1,1,73,79,81,71,59,12,2,79,8,14,8,12,19,79,23,15,6,10,2,28,68,19,7,22,8,26,3,15,79,16,15,10,68,3,14,22,12,1,1,20,28,72,71,14,10,3,79,16,15,10,68,3,14,22,12,1,1,20,28,68,4,14,10,71,1,1,17,10,22,71,10,28,19,6,10,0,26,13,20,7,68,14,27,74,71,89,68,32,0,0,71,28,1,9,27,68,45,0,12,9,79,16,15,10,68,37,14,20,19,6,23,19,79,83,71,27,11,71,27,1,11,3,68,2,25,1,21,22,11,9,10,68,6,13,11,18,27,68,19,7,1,71,3,13,0,7,16,71,28,11,71,27,12,6,27,68,2,25,1,21,22,11,9,10,68,10,6,3,15,27,68,5,10,8,14,10,18,2,79,6,2,12,5,18,28,1,71,0,2,71,7,13,20,79,16,2,28,16,14,2,11,9,22,74,71,87,68,45,0,12,9,79,12,14,2,23,2,3,2,71,24,5,20,79,10,8,27,68,19,7,1,71,3,13,0,7,16,92,79,12,2,79,19,6,28,68,8,1,8,30,79,5,71,24,13,19,1,1,20,28,68,19,0,68,19,7,1,71,3,13,0,7,16,73,79,93,71,59,12,2,79,11,9,10,68,16,7,11,71,6,23,71,27,12,2,79,16,21,26,1,71,3,13,0,7,16,75,79,19,15,0,68,0,6,18,2,28,68,11,6,3,15,27,68,19,0,68,2,25,1,21,22,11,9,10,72,71,24,5,20,79,3,8,6,10,0,79,16,8,79,7,8,2,1,71,6,10,19,0,68,19,7,1,71,24,11,21,3,0,73,79,85,87,79,38,18,27,68,6,3,16,15,0,17,0,7,68,19,7,1,71,24,11,21,3,0,71,24,5,20,79,9,6,11,1,71,27,12,21,0,17,0,7,68,15,6,9,75,79,16,15,10,68,16,0,22,11,11,68,3,6,0,9,72,16,71,29,1,4,0,3,9,6,30,2,79,12,14,2,68,16,7,1,9,79,12,2,79,7,6,2,1,73,79,85,86,79,33,17,10,10,71,6,10,71,7,13,20,79,11,16,1,68,11,14,10,3,79,5,9,11,68,6,2,11,9,8,68,15,6,23,71,0,19,9,79,20,2,0,20,11,10,72,71,7,1,71,24,5,20,79,10,8,27,68,6,12,7,2,31,16,2,11,74,71,94,86,71,45,17,19,79,16,8,79,5,11,3,68,16,7,11,71,13,1,11,6,1,17,10,0,71,7,13,10,79,5,9,11,68,6,12,7,2,31,16,2,11,68,15,6,9,75,79,12,2,79,3,6,25,1,71,27,12,2,79,22,14,8,12,19,79,16,8,79,6,2,12,11,10,10,68,4,7,13,11,11,22,2,1,68,8,9,68,32,0,0,73,79,85,84,79,48,15,10,29,71,14,22,2,79,22,2,13,11,21,1,69,71,59,12,14,28,68,14,28,68,9,0,16,71,14,68,23,7,29,20,6,7,6,3,68,5,6,22,19,7,68,21,10,23,18,3,16,14,1,3,71,9,22,8,2,68,15,26,9,6,1,68,23,14,23,20,6,11,9,79,11,21,79,20,11,14,10,75,79,16,15,6,23,71,29,1,5,6,22,19,7,68,4,0,9,2,28,68,1,29,11,10,79,35,8,11,74,86,91,68,52,0,68,19,7,1,71,56,11,21,11,68,5,10,7,6,2,1,71,7,17,10,14,10,71,14,10,3,79,8,14,25,1,3,79,12,2,29,1,71,0,10,71,10,5,21,27,12,71,14,9,8,1,3,71,26,23,73,79,44,2,79,19,6,28,68,1,26,8,11,79,11,1,79,17,9,9,5,14,3,13,9,8,68,11,0,18,2,79,5,9,11,68,1,14,13,19,7,2,18,3,10,2,28,23,73,79,37,9,11,68,16,10,68,15,14,18,2,79,23,2,10,10,71,7,13,20,79,3,11,0,22,30,67,68,19,7,1,71,8,8,8,29,29,71,0,2,71,27,12,2,79,11,9,3,29,71,60,11,9,79,11,1,79,16,15,10,68,33,14,16,15,10,22,73

1 个答案:

答案 0 :(得分:11)

与同一Lists相比,大量Vectors会在GC上产生更多负载。但是你的问题不在于正确选择集合,而在于Stream的错误使用。

如果使用不当,Scala的流可能是very memory inefficient。在你的情况下,我假设你试图使用Stream来避免急切地计算转换后的passwds集合,但实际上你做得更糟(因为Stream不仅记住了你的元素,它使用这些元素的Stream包装器创建了额外的开销。)

您只需将toStream替换为view即可。它将创建集合包装器,几乎所有变换都是懒惰的(基本上是你试图实现的)。

val ans = psswds.view.map(decode(3)(code))
 .map(text => (text, scoreText(text)))
 .maxBy(_._2)._1.sum

在这个微小的修复程序之后,即使使用-Xmx5m(我已选中),您的程序仍可正常运行。

在您的程序中还有许多其他要优化的东西(尽量避免创建过多的集合),但我会留给您。