Given an Array[Double], for example
val a = Array.tabulate(100){ _ => Random.nextDouble * 10 }
what is a simple way to compute a histogram with n bins?
Answer 0 (score: 7)
The value preparation is very similar to @om-nom-nom's answer, but the histogram method, which uses partition, is quite small:
case class Distribution(nBins: Int, data: List[Double]) {
  require(data.length > nBins)

  val Epsilon = 0.000001
  val (max,min) = (data.max,data.min)
  val binWidth = (max - min) / nBins + Epsilon
  val bounds = (1 to nBins).map { x => min + binWidth * x }.toList

  def histo(bounds: List[Double], data: List[Double]): List[List[Double]] =
    bounds match {
      case h :: Nil => List(data)
      case h :: t => val (l,r) = data.partition( _ < h) ; l :: histo(t,r)
    }

  val histogram = histo(bounds, data)
}
Then
val data = Array.tabulate(100){ _ => scala.util.Random.nextDouble * 10 }
val h = Distribution(5, data.toList).histogram
and so
val tabulated = h.map {_.size}
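To make the recursion easier to follow, here is a minimal sketch of the same partition-based idea on a small fixed list. The object name `PartitionHistogramSketch`, the helper `binCounts`, and the sample values are made up for illustration; the extra `Nil` case is an addition so that the match is exhaustive. In this sketch the Epsilon padding is left out, since the last bound is never compared against and the maximum falls into the last bin anyway.

```scala
object PartitionHistogramSketch {
  // Split `data` into consecutive bins: each upper bound peels off the values
  // below it, and the last bound takes whatever is left.
  def histo(bounds: List[Double], data: List[Double]): List[List[Double]] =
    bounds match {
      case Nil      => Nil
      case _ :: Nil => List(data)
      case h :: t =>
        val (below, rest) = data.partition(_ < h)
        below :: histo(t, rest)
    }

  // Hypothetical helper: number of values in each of `nBins` equal-width bins.
  def binCounts(nBins: Int, data: List[Double]): List[Int] = {
    require(nBins > 0 && data.nonEmpty)
    val (mn, mx) = (data.min, data.max)
    val binWidth = (mx - mn) / nBins
    val bounds = (1 to nBins).map(i => mn + binWidth * i).toList
    histo(bounds, data).map(_.size)
  }

  def main(args: Array[String]): Unit = {
    val sample = List(0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0)
    println(binCounts(5, sample)) // List(2, 2, 2, 2, 3)
  }
}
```

Each value equal to a bound goes into the bin above it, which is why 10.0 ends up together with 8.0 and 9.0 in the last bin.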
Answer 1 (score: 5)
How about this:
val num_bins = 20
val mx = a.max.toDouble
val mn = a.min.toDouble
val hist = a
  .map(x=>(((x.toDouble-mn)/(mx-mn))*num_bins).floor.toInt)
  .groupBy(x=>x)
  .map(x=>x._1->x._2.size)
  .toSeq
  .sortBy(x=>x._1)
  .map(x=>x._2)
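Two edge cases of the chain above are worth knowing: the maximum value maps to index num_bins (one past the last bin), and bins that receive no values are missing from the result, because groupBy only creates keys that actually occur. A minimal sketch of one way around both, assuming a fixed-size counts array is acceptable; the object name `FixedBinCounts` and the sample data are hypothetical:

```scala
object FixedBinCounts {
  // Count values into exactly `numBins` equal-width bins over [min, max].
  def counts(data: Array[Double], numBins: Int): Array[Int] = {
    require(numBins > 0 && data.nonEmpty)
    val mn = data.min
    val mx = data.max
    val width = (mx - mn) / numBins
    val result = new Array[Int](numBins) // empty bins stay at 0
    data.foreach { x =>
      // When all values are equal, width is 0; put everything in bin 0.
      val raw = if (width == 0.0) 0 else ((x - mn) / width).toInt
      // The maximum would land at index numBins, so clamp it into the last bin.
      val idx = math.min(raw, numBins - 1)
      result(idx) += 1
    }
    result
  }

  def main(args: Array[String]): Unit = {
    val sample = Array(0.0, 0.5, 1.0, 9.0, 10.0)
    println(counts(sample, 5).mkString(", ")) // 3, 0, 0, 0, 2
  }
}
```

The result always has numBins entries, so it can be zipped directly with bin labels or compared across data sets.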
Answer 2 (score: 4)
How about this?
object Hist {
  type Bins = Map[Double, List[Double]]

  // artificially increasing bucket length to overcome last-point issue
  private val Epsilon = 0.000001

  def histogram(data: List[Double], binsCount: Int) = {
    require(data.length > binsCount)
    val sorted = data.sorted
    val min = sorted.head
    val max = sorted.last
    val binLength = (max - min) / binsCount + Epsilon
    val bins = Map.empty[Double, List[Double]].withDefaultValue(Nil)
    scatterToBins(sorted, min + binLength, binLength, bins)
  }

  @annotation.tailrec
  private def scatterToBins(xs: List[Double], upperBound: Double, binLength: Double, bins: Bins): Bins = xs match {
    case Nil => bins
    case point :: tail =>
      if (point < upperBound)
        scatterToBins(tail, upperBound, binLength, bins + (upperBound -> (point :: bins(upperBound))))
      else
        // retry the same point against the next bin, so that a gap wider than
        // one bin does not put the point into the wrong bin
        scatterToBins(xs, upperBound + binLength, binLength, bins)
  }

  // now let's test this out
  val data = Array.tabulate(100){ _ => scala.util.Random.nextDouble * 10 }
  val result = histogram(data.toList, 5)
  val pointsPerBucket = result.values.map(xs => xs.length)
}
which produces the following output:
scala> Hist.result
// res14: Hist.Bins = Map(4.043605797342332 -> List(4.031739029821568, 3.826704675600351, 3.7661438110766166, 3.680326808626887, 3.6788463836133767, 3.5442867825350266, 3.5156167603774904, 3.464310876575163, 3.3796397333178216, 3.33851670739545, 3.1702423754536504, 3.1681320879333708, 2.9520859637868204, 2.885027245987456, 2.8091011617711024, 2.745475619527371, 2.520275275070399, 2.3720116613386546, 2.2909255324112374, 2.229522549904405, 2.0693233045454895), 6.0237846547671845 -> List(5.957572654029027, 5.6887311125180675, 5.356707271645041, 5.3155138169898475, 5.285634121992783, 5.2823949256676865, 5.159891625116016, 5.152024494453849, 5.063625430476634, 4.903706519410671, 4.891005992072018, 4.857168214245934, 4.845526801893324, 4.845452341208768, 4.8205059750156, 4.799306005256147, 4.751...
scala> Hist.pointsPerBucket
// res15: Iterable[Int] = List(21, 23, 15, 22, 19)
I cheated a little by using Lists instead of Arrays, but I hope that's fine with you.
Answer 3 (score: 0)
Get some numbers:
scala> val a = Array.tabulate(100){ _ => scala.util.Random.nextDouble * 10 }
a: Array[Double] = Array(6.333702962141718, 1.5506921990476974, 5.275795179538175, 1.7422259222209069, 2.8613172268857423, 9.489321343162988, 0.06102866084230496, 2.83927696305669...
Use groupBy to group the elements into bins of "size" 2.
scala> a.groupBy{
| case i if(i < 2) => "less than two"
| case i if(i >= 2 && i < 4) => "two to four"
| case i if(i >= 4 && i < 6) => "four to six"
| case i if(i >= 6 && i < 8) => "six to eight"
| case i if(i >= 8 && i <= 10) => "eight to ten"
| }
res69: scala.collection.immutable.Map[String,Array[Double]] = Map(lessThanTwo -> Array(1.5506921990476974, 1.7422259222209069...
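groupBy returns a Map[String, Array[Double]] containing the groups. Now we can iterate over the values of each key and print a symbol for each element to get a simple histogram.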
scala> res69.map(_._2).foreach(xs => {xs.foreach(xss => print("x")); println()})
xxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxx
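The hard-coded cases above can also be replaced by a computed bin index, and since a Map does not guarantee iteration order, sorting by that index keeps the bars in order. A minimal sketch under those assumptions; the object name `GroupByBars`, the `bars` helper, and the sample values are invented for illustration:

```scala
object GroupByBars {
  // Group values into bins of the given width, keyed by bin index,
  // and render one labelled bar of "x" characters per non-empty bin.
  def bars(data: Seq[Double], width: Double): Seq[String] = {
    require(width > 0)
    data
      .groupBy(x => math.floor(x / width).toInt)
      .toSeq
      .sortBy(_._1) // Map iteration order is not guaranteed, so sort by bin index
      .map { case (bin, xs) =>
        val lo = bin * width
        s"[$lo, ${lo + width}) " + "x" * xs.size
      }
  }

  def main(args: Array[String]): Unit = {
    bars(Seq(0.5, 1.5, 2.5, 3.0, 9.9), 2.0).foreach(println)
    // [0.0, 2.0) xx
    // [2.0, 4.0) xx
    // [8.0, 10.0) x
  }
}
```

As with any groupBy-based approach, bins that contain no values do not appear in the output.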
Answer 4 (score: 0)
Another answer, which I think is more concise:
def mkHistogram(n_bins: Int, lowerUpperBound: Option[(Double, Double)] = None)(xs: Seq[Double]) = {
  val (mn, mx) = lowerUpperBound getOrElse(xs.min, xs.max)
  val epsilon = 0.0001
  val binSize = (mx - mn) / n_bins * (1 + epsilon)
  val bins = (0 to n_bins).map(mn + _ * binSize).sliding(2).map(xs => (xs(0), xs(1)))
  def binContains(bin:(Double,Double),x: Double) = (x >= bin._1) && (x < bin._2)
  bins.map(bin => (bin, xs.count(binContains(bin,_))))
}
@ mkHistogram(5,Option(0,10))(Seq(1,1,1,1,2,2,2,3,4,5,6,7)).foreach(println)
((0.0,2.0002),7)
((2.0002,4.0004),2)
((4.0004,6.0006),2)
((6.0006,8.0008),1)
((8.0008,10.001),0)
Answer 5 (score: 0)
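I had a similar but slightly different requirement: building a histogram from user-defined bins/cutoffs. Suppose that, in the OP's case, the bins 0-3, -4, -5, -6, -7, 8+ are needed. I tried several approaches, but the breakthrough for me was realizing that I need to group the array by the position of the bin that each value falls into. In this case: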
import scala.util.Random

val a = Array.tabulate(100){ _ => Random.nextDouble * 10 }
val bins = List(3,4,5,6,7,8,Int.MaxValue) //-- user-defined cutoff values (with a max value at the end as a catch-all)
a.groupBy(i => bins.indexWhere(_ > i)) //-- group values by the index of the first cutoff above them
  .map{case (i,items) => i -> items.length} //-- map each bin index to the number of items in that bin
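To show what this produces, here is a minimal sketch with a small fixed sample instead of random data. The object name `CutoffHistogram`, the `counts` helper, and the sample values are hypothetical; `Double.PositiveInfinity` is used here as the catch-all cutoff in place of `Int.MaxValue`.

```scala
object CutoffHistogram {
  // Upper cutoffs for each bin; the last one catches everything else.
  val cutoffs: List[Double] =
    List(3.0, 4.0, 5.0, 6.0, 7.0, 8.0, Double.PositiveInfinity)

  // Map each bin index to the number of values whose first larger cutoff
  // sits at that index. Bins without values are absent from the map.
  def counts(data: Seq[Double], cutoffs: List[Double]): Map[Int, Int] =
    data
      .groupBy(x => cutoffs.indexWhere(_ > x))
      .map { case (bin, items) => bin -> items.length }

  def main(args: Array[String]): Unit = {
    val sample = Seq(1.0, 2.5, 3.0, 3.5, 7.2, 8.0, 9.9)
    counts(sample, cutoffs).toSeq.sortBy(_._1).foreach {
      case (bin, n) => println(s"bin $bin: $n")
    }
    // bin 0: 2
    // bin 1: 2
    // bin 5: 1
    // bin 6: 2
  }
}
```

A value equal to a cutoff goes into the next bin, because the comparison is strict: 3.0 lands in bin 1 and 8.0 in bin 6.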