Extract Map<K, Multiset<V>> from a Stream of Streams in Java 8

Date: 2017-05-26 18:02:13

Tags: java java-8 java-stream

I have a Stream of Streams of words (I did not choose this format and cannot change it). For example:

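Stream<String> doc1 = Stream.of("how", "are", "you", "doing", "doing", "doing");
Stream<String> doc2 = Stream.of("what", "what", "you", "upto");
Stream<String> doc3 = Stream.of("how", "are", "what", "how");
Stream<Stream<String>> docs = Stream.of(doc1, doc2, doc3);

I am trying to convert this into a Map<String, Multiset<Integer>> (or its corresponding stream, since I want to process it further), where the String key is the word itself and the Multiset<Integer> holds the number of times that word appears in each document (zeros should be excluded). Multiset is a Google Guava class (not from java.util).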

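For example:

how   -> {1, 2}  (once in doc1, twice in doc3; doc2 is left out because its count is 0)
are   -> {1, 1}
you   -> {1, 1}
doing -> {3}
what  -> {2, 1}
upto  -> {1}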

What is a good way to do this in Java 8?

I tried using flatMap, but the inner Stream greatly limits my options.

4 Answers:

Answer 0 (score: 9)

 Map<String, List<Long>> map = docs.flatMap(
            inner -> inner.collect(
                    Collectors.groupingBy(Function.identity(), Collectors.counting()))
                    .entrySet()
                    .stream())
            .collect(Collectors.groupingBy(
                    Entry::getKey,
                    Collectors.mapping(Entry::getValue, Collectors.toList())));

System.out.println(map);

// {upto=[1], how=[1, 2], doing=[3], what=[2, 1], are=[1, 1], you=[1, 1]}
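For readers who want to run the snippet above as is, here is a minimal self-contained sketch using only java.util. The class name PerDocumentCounts and the helper countsPerDocument are hypothetical names chosen for illustration, and the TreeMap is an added assumption used only so that the printed key order is stable.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PerDocumentCounts {

    // Hypothetical helper wrapping the two-step pipeline from the answer above.
    static Map<String, List<Long>> countsPerDocument(Stream<Stream<String>> docs) {
        return docs
                // Step 1: count the words of each document on its own (word -> count),
                // then expose those entries to the outer stream.
                .flatMap(doc -> doc
                        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                        .entrySet()
                        .stream())
                // Step 2: regroup the (word, count) entries by word, one count per document.
                .collect(Collectors.groupingBy(
                        Map.Entry::getKey,
                        TreeMap::new, // assumption: sorted keys, only to keep the output stable
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }

    public static void main(String[] args) {
        Stream<String> doc1 = Stream.of("how", "are", "you", "doing", "doing", "doing");
        Stream<String> doc2 = Stream.of("what", "what", "you", "upto");
        Stream<String> doc3 = Stream.of("how", "are", "what", "how");

        // prints {are=[1, 1], doing=[3], how=[1, 2], upto=[1], what=[2, 1], you=[1, 1]}
        System.out.println(countsPerDocument(Stream.of(doc1, doc2, doc3)));
    }
}
```

A word that does not occur in a document never produces an entry for that document, so zero counts are excluded without any extra filtering.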

Answer 1 (score: 3)

Map<String, Multiset<Integer>> result = docs
        .map(s -> s.collect(Collectors.toCollection(HashMultiset::create)))
        .flatMap(m -> m.entrySet().stream())
        .collect(Collectors.groupingBy(Multiset.Entry::getElement,
                Collectors.mapping(Multiset.Entry::getCount,
                        Collectors.toCollection(HashMultiset::create))));

// {upto=[1], how=[1, 2], doing=[3], what=[1, 2], are=[1 x 2], you=[1 x 2]}

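Multiset is very handy for getting the word counts, but it is not really needed for storing the counts. If you are fine with a Map<String, List<Integer>>, just replace the last line with Collectors.toList())));

Or, since you are already using Guava, why not use a ListMultimap?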

ListMultimap<String, Integer> result = docs
        .map(s -> s.collect(Collectors.toCollection(HashMultiset::create)))
        .flatMap(m -> m.entrySet().stream())
        .collect(ArrayListMultimap::create,
                (r, e) -> r.put(e.getElement(), e.getCount()),
                Multimap::putAll);

// {upto=[1], how=[1, 2], doing=[3], what=[2, 1], are=[1, 1], you=[1, 1]}
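The two outputs in this answer differ in how they treat repeated counts: the Multiset version prints are=[1 x 2] (the count 1 was seen in two documents, in no particular order), while the list versions keep one entry per document. As a rough illustration of the multiset view without a Guava dependency, the sketch below stores, for each word, a map from count to the number of documents with that count. Using a Map<Long, Long> in place of Guava's Multiset is an assumption made for this example, and CountMultisetSketch and countHistogram are hypothetical names.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CountMultisetSketch {

    // For each word: count -> number of documents in which the word has that count.
    static Map<String, Map<Long, Long>> countHistogram(Stream<Stream<String>> docs) {
        return docs
                // Per-document word counts, flattened into (word, count) entries.
                .flatMap(doc -> doc
                        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                        .entrySet()
                        .stream())
                // Group by word, then tally how often each count value occurs.
                .collect(Collectors.groupingBy(
                        Map.Entry::getKey,
                        Collectors.groupingBy(Map.Entry::getValue, Collectors.counting())));
    }

    public static void main(String[] args) {
        Stream<String> doc1 = Stream.of("how", "are", "you", "doing", "doing", "doing");
        Stream<String> doc2 = Stream.of("what", "what", "you", "upto");
        Stream<String> doc3 = Stream.of("how", "are", "what", "how");

        Map<String, Map<Long, Long>> histogram = countHistogram(Stream.of(doc1, doc2, doc3));
        System.out.println(histogram.get("are"));   // prints {1=2}
        System.out.println(histogram.get("doing")); // prints {3=1}
    }
}
```

If the order of the documents matters downstream, the list-based variants are the better fit, since this representation discards it.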

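Answer 2 (score: 3)

Since you are using Guava, you can take advantage of its utilities for working with streams, as well as its Table structure. Here is the code: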

Table<String, Long, Long> result =
    Streams.mapWithIndex(docs, (doc, i) -> doc.map(word -> new SimpleEntry<>(word, i)))
        .flatMap(Function.identity())
        .collect(Tables.toTable(
            Entry::getKey, Entry::getValue, p -> 1L, Long::sum, HashBasedTable::create));

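Here I use the Streams.mapWithIndex method to assign an index to each inner stream. Inside the mapping function, I turn each word into a pair consisting of the word and the index, so that I can later tell which document the word belongs to.

Then I flat-map the (word, index) pairs of all the documents into a single stream, and finally I collect all the pairs into a Guava Table via the Tables.toTable collector. Rows are words, columns are documents (represented by their index), and values are the count of the word in each document (I assign 1L to each (word, index) pair and use Long::sum to merge collisions).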
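The same word-and-index idea can be sketched without Guava. This is not the code from this answer: it swaps Streams.mapWithIndex for an IntStream over a materialized list and swaps the Table for a nested Map, both of which are assumptions made to keep the example dependency-free. WordByDocumentTable, tagWithIndex, and countTable are hypothetical names.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class WordByDocumentTable {

    // Pair every word of one document with that document's index.
    static Stream<Map.Entry<String, Integer>> tagWithIndex(Stream<String> doc, int index) {
        return doc.map(word -> new SimpleEntry<>(word, index));
    }

    // Rows are words, columns are document indices, cells are occurrence counts.
    static Map<String, Map<Integer, Long>> countTable(Stream<Stream<String>> docs) {
        // A plain Stream carries no index, so the outer stream is collected into a list first.
        List<Stream<String>> docList = docs.collect(Collectors.toList());
        return IntStream.range(0, docList.size())
                .boxed()
                .flatMap(i -> tagWithIndex(docList.get(i), i))
                // Group by word, then by document index, counting the pairs in each cell.
                .collect(Collectors.groupingBy(
                        Map.Entry::getKey,
                        Collectors.groupingBy(Map.Entry::getValue, Collectors.counting())));
    }

    public static void main(String[] args) {
        Stream<String> doc1 = Stream.of("how", "are", "you", "doing", "doing", "doing");
        Stream<String> doc2 = Stream.of("what", "what", "you", "upto");
        Stream<String> doc3 = Stream.of("how", "are", "what", "how");

        Map<String, Map<Integer, Long>> table = countTable(Stream.of(doc1, doc2, doc3));
        System.out.println(table.get("how").get(2));           // prints 2 ("how" occurs twice in doc3)
        System.out.println(table.get("doing").containsKey(1)); // prints false (no cell for a zero count)
    }
}
```

Unlike the list-based results, this layout keeps track of which document each count came from, which is the main reason to prefer a table-like structure.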

The result table contains all the information you need. If you still need a Map<String, Multiset<Integer>>, it can be built from that table.

Note: you need Guava 21 for this to work.

Answer 3 (score: 1)

Here is a simple solution using AbacusUtil:

Map<String, List<Integer>> m = Stream.of(doc1, doc2, doc3)
          .flatMap(d -> d.toMultiset().stream()).collect(Collectors.toMap2());