What is an efficient way to remove subsets from a List[List[String]]?

Time: 2018-05-18 22:08:59

Tags: scala

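I have a ListBuffer of List[String]s, val tList = ListBuffer[TCount], where TCount is case class TCount(l: List[String], c: Long). I want to find those lists l in tList that are not a subset of any other element of tList and whose c value is less than the c value of their superset. My program works, but it needs two nested for loops, which makes the code inefficient. Is there a better approach that would make it efficient?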


4 Answers:

Answer 0 (score: 1)

Inspired by the comment about Set-Tries:

import scala.collection.SortedMap

class SetTrie[A](val flag: Boolean, val children: SortedMap[A, SetTrie[A]])(implicit val ord: Ordering[A]) {
  def insert(xs: List[A]): SetTrie[A] = xs match {
    case Nil => new SetTrie(true, children)
    case a :: rest => {
      val current = children.getOrElse(a, new SetTrie[A](false, SortedMap.empty))
      val inserted = current.insert(rest)
      new SetTrie(flag, children + (a -> inserted))
    }
  }

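  // Assumes xs is sorted by ord. With strict = true, only a superset that
  // has at least one extra element counts.
  def containsSuperset(xs: List[A], strict: Boolean): Boolean = xs match {
    // All of xs matched: any child means a longer list passes through here.
    case Nil => children.nonEmpty || (!strict && flag)
    case a :: rest => {
      // Either follow a itself, or skip over a smaller key (an extra element
      // of the superset), after which the match no longer needs to be strict.
      children.get(a).exists(_.containsSuperset(rest, strict)) ||
        children.takeWhile(x => ord.lt(x._1, a)).exists(_._2.containsSuperset(xs, false))
    }
  }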
}

def removeSubsets[A : Ordering](xss: List[List[A]]): List[List[A]] = {
  val sorted = xss.map(_.sorted)
  val setTrie = sorted.foldLeft(new SetTrie[A](false, SortedMap.empty)) { case (st, xs) => st.insert(xs) }
  sorted.filterNot(xs => setTrie.containsSuperset(xs, true))
}
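
The trie lookup above relies on one property of sorted input: once every list is sorted, xs is a subset of ys exactly when xs can be matched against ys from left to right while skipping the extra elements of ys. The following is a minimal sketch of that check for a single pair of lists. `SortedSubsetSketch` and `isSortedSubset` are illustrative names that do not appear in the answer, and the sketch assumes both lists are sorted and contain no duplicates.

```scala
import scala.annotation.tailrec

object SortedSubsetSketch {
  // Returns true if every element of xs occurs in ys.
  // Assumption: both lists are sorted ascending and free of duplicates.
  @tailrec
  def isSortedSubset[A](xs: List[A], ys: List[A])(implicit ord: Ordering[A]): Boolean =
    (xs, ys) match {
      case (Nil, _) => true   // everything in xs was matched
      case (_, Nil) => false  // ys ran out before xs did
      case (x :: xt, y :: yt) =>
        if (ord.equiv(x, y)) isSortedSubset(xt, yt)    // match: advance both lists
        else if (ord.lt(y, x)) isSortedSubset(xs, yt)  // y is an extra element of ys: skip it
        else false                                     // y > x, so x cannot appear later in ys
    }
}
```

`containsSuperset` performs the same walk against all stored lists at once, which is where the `takeWhile(x => ord.lt(x._1, a))` step comes from: keys smaller than `a` are the only candidates for skipped elements.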

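Answer 1 (score: 1)

Here is an approach that relies on a data structure somewhat similar to a Set-Trie, but which explicitly stores more subsets. It gives worse compression, but is faster during lookup: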

def findMaximal(lists: List[List[String]]): List[List[String]] = {

  import collection.mutable.HashMap

  class Node(
    var isSubset: Boolean = false, 
    val children: HashMap[String, Node] = HashMap.empty
  ) {
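    // Inserts xs together with every subsequence of it. isSubs is true once at
    // least one element has been skipped, so the path denotes a proper subset
    // of some inserted list.
    def insert(xs: List[String], isSubs: Boolean): Unit = if (xs.isEmpty) {
      isSubset |= isSubs
    } else {
      var isSubsSubs = isSubs
      for (h :: t <- xs.tails) {
        children.getOrElseUpdate(h, new Node()).insert(t, isSubsSubs)
        // every later tail skips at least one element of xs
        isSubsSubs = true
      }
    }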
    def isMaximal(xs: List[String]): Boolean = xs match {
      case Nil => children.isEmpty && !isSubset
      case h :: t => children(h).isMaximal(t)
    }
    override def toString: String = {
      if (children.isEmpty) "#"
      else children.flatMap{ 
        case (k,v) => {
          if (v.children.isEmpty) List(k)
          else (k + ":") :: v.toString.split("\n").map("  " + _).toList
        }
      }.mkString("\n")
    }
  }

  val listsWithSorted = for (x <- lists) yield (x, x.sorted)
  val root = new Node()
  for ((x, s) <- listsWithSorted) root.insert(s, false)

  // println(root)

  for ((x, s) <- listsWithSorted; if root.isMaximal(s)) yield x
}

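Note that I allow myself all kinds of mutable nonsense inside the method body, because the mutable trie data structure never escapes the scope of the method and therefore cannot be inadvertently shared with another thread.

Here is an example with a collection of sets of characters (converted to lists of strings):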

println(findMaximal(List(
  "ab", "abc", "ac", "abd",
  "ade", "efd", "adf", "bafd",
  "abd", "fda", "dba", "dbe"
).map(_.toList.map(_.toString))))

The output is:

List(
  List(a, b, c), 
  List(a, d, e), 
  List(e, f, d), 
  List(b, a, f, d), 
  List(d, b, e)
)

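Indeed, the non-maximal elements ab, ac, abd, adf, fda and dba are eliminated.

And this is what my not-quite-trie data structure looks like (child nodes are indented):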

e:
  f
b:
  e
  d:
    e
    f
  c
  f
d:
  e:
    f
  f
a:
  e
  b:
    d:
      f
    c
    f
  d:
    e
    f
  c
  f
c
f
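
To make the "worse compression" trade-off concrete: the `insert` method above creates a path for every subsequence of the sorted list, so a single list of n distinct elements produces 2^n - 1 nodes. The sketch below mirrors only the insertion logic (without the `isSubset` flag) and counts the resulting nodes. `SubsequenceTrieSize` and its `insert` and `size` methods are names made up for this illustration, not part of the answer.

```scala
import scala.collection.mutable

object SubsequenceTrieSize {
  // Stripped-down node: only the children map, no isSubset flag.
  final class Node(val children: mutable.HashMap[String, Node] = mutable.HashMap.empty)

  // Same recursion as the answer's insert: one child per element of xs,
  // each receiving the elements that follow it.
  def insert(node: Node, xs: List[String]): Unit =
    xs.tails.foreach {
      case h :: t => insert(node.children.getOrElseUpdate(h, new Node()), t)
      case Nil    => () // the empty tail adds nothing
    }

  // Number of nodes below the given node (the root itself is not counted).
  def size(node: Node): Int =
    node.children.values.map(child => 1 + size(child)).sum
}
```

Inserting a single four-element list such as List("a", "b", "c", "d") into an empty root yields 15 nodes, which suggests the approach is best suited to fairly short lists.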

Answer 2 (score: 0)

Not sure you can avoid the complexity, but I think I would write it like this:

val tList = List(List(1, 2, 3), List(3, 2, 1), List(9, 4, 7), List(3, 5, 6), List(1, 5, 6), List(6, 1, 5))

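val tSet = tList.map(_.toSet)
// Drop every set that is a subset of some set other than itself.
// Equal sets count as subsets of each other, so duplicates drop out together.
def result = tSet.filterNot { sub => tSet.count(sub.subsetOf(_)) > 1 }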
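
With the `count`-based filter, two lists with the same elements (such as List(1, 2, 3) and List(3, 2, 1)) count as subsets of each other and are both removed. If one copy should survive instead, a variant along the following lines could be used. This is only a sketch and is still quadratic in the number of lists. `MaximalSetsSketch` and `maximalLists` are hypothetical names, and keeping the first occurrence of each duplicate is an assumption about the desired behaviour.

```scala
object MaximalSetsSketch {
  // Keeps each list whose element set is not a proper subset of another
  // list's element set. Lists with equal element sets are collapsed to
  // the first occurrence (an assumption, see the text above).
  def maximalLists[A](lists: List[List[A]]): List[List[A]] = {
    val withSets = lists.map(l => (l, l.toSet))

    // Drop later lists whose set duplicates an earlier one.
    val distinctBySet = withSets.foldLeft(Vector.empty[(List[A], Set[A])]) {
      case (acc, entry) => if (acc.exists(_._2 == entry._2)) acc else acc :+ entry
    }

    // Keep an entry only if no different set contains it.
    distinctBySet
      .filter { case (_, s) =>
        !distinctBySet.exists { case (_, other) => other != s && s.subsetOf(other) }
      }
      .map(_._1)
      .toList
  }
}
```

For the tList above, `MaximalSetsSketch.maximalLists(tList)` returns List(List(1, 2, 3), List(9, 4, 7), List(3, 5, 6), List(1, 5, 6)).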

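Answer 3 (score: 0)

Here is one approach:

  1. Create an indexed Map to identify the original List elements
  2. Convert the Map of List elements into a Map of Sets (with index)
  3. Generate combinations of the Map elements and use a custom filter to capture the elements that are a subset of others
  4. Remove those subset elements from the Map of Sets and retrieve the remaining elements from the Map of Lists by index

Sample code: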

    type TupIntSet = Tuple2[Int, Set[Int]]
    
    def subsetFilter(ls: List[TupIntSet]): List[TupIntSet] = 
      if ( ls.size != 2 ) List.empty[TupIntSet] else
        if ( ls(0)._2 subsetOf ls(1)._2 ) List[TupIntSet]((ls(0)._1, ls(0)._2)) else
          if ( ls(1)._2 subsetOf ls(0)._2 ) List[TupIntSet]((ls(1)._1, ls(1)._2)) else
            List.empty[TupIntSet]
    
    val tList = List(List(1,2), List(1,2,3), List(3,4,5), List(5,4,3), List(2,3,4), List(6,7))
    
    val listMap = (Stream from 1).zip(tList).toMap
    val setMap = listMap.map{ case (i, l) => (i, l.toSet) }
    
    val tSubsets = setMap.toList.combinations(2).toSet.flatMap(subsetFilter)
    
    val resultList = (setMap.toSet -- tSubsets).map(_._1).map(listMap.getOrElse(_, ""))
    // resultList: scala.collection.immutable.Set[java.io.Serializable] =
    //   Set(List(5, 4, 3), List(2, 3, 4), List(6, 7), List(1, 2, 3))