Merging lists using Java 8 Streams

Time: 2017-02-10 22:25:21

Tags: java-8 java-stream

I want to merge the inner lists using Java 8 streams, as follows:

Given

List<List<Integer>> mainList = new ArrayList<List<Integer>>();
mainList.add(Arrays.asList(0,1));
mainList.add(Arrays.asList(0,1,2));
mainList.add(Arrays.asList(1,2));
mainList.add(Arrays.asList(3));

it should be merged into

  [[0,1,2],[3]];       

List<List<Integer>> mainList = new ArrayList<List<Integer>>();
mainList.add(Arrays.asList(0,2));
mainList.add(Arrays.asList(1,4));
mainList.add(Arrays.asList(0,2,4));
mainList.add(Arrays.asList(3,4));
mainList.add(Arrays.asList(1,3,4));

it should be merged into

 [[0,1,2,3,4]];                

This is what I have done so far:

static void mergeCollections(List<List<Integer>> collectionTomerge) {
    boolean isMerge = false;
    List<List<Integer>> mergeCollection = new ArrayList<List<Integer>>();

    for (List<Integer> listInner : collectionTomerge) {
        List<Integer> mergeAny = mergeCollection.stream().map(
                lc -> lc.stream().filter(listInner::contains)
        ).findFirst()
                .orElse(null)
                .collect(Collectors.toList());
    }
}

But I get this exception:

Exception in thread "main" java.lang.NullPointerException
at linqArraysOperations.LinqOperations.mergeCollections(LinqOperations.java:87)
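The NullPointerException is raised by the attempt itself: mergeCollection is created empty and never filled, so findFirst() returns an empty Optional, orElse(null) returns null, and .collect(Collectors.toList()) is then invoked on that null. Below is a minimal sketch of the same pipeline shape that handles the empty case without null. NpeDemo and firstIntersection are hypothetical names, and falling back to an empty list is an assumption about what the caller wants.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class NpeDemo {
    // Same pipeline shape as mergeCollections above: take the first list in `lists`
    // and keep only the elements that also occur in `inner`. Collecting inside map()
    // and falling back to an empty list means no method is ever called on null.
    static List<Integer> firstIntersection(List<List<Integer>> lists, List<Integer> inner) {
        return lists.stream()
                .map(lc -> lc.stream().filter(inner::contains).collect(Collectors.toList()))
                .findFirst()
                .orElse(Collections.emptyList());
    }

    public static void main(String[] args) {
        // Empty source: findFirst() yields Optional.empty(). The original
        // `.orElse(null).collect(...)` throws here; this version prints [].
        System.out.println(firstIntersection(new ArrayList<>(), Arrays.asList(0, 1)));
        System.out.println(firstIntersection(
                Arrays.asList(Arrays.asList(0, 1, 2)), Arrays.asList(1, 2, 3)));  // [1, 2]
    }
}
```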

Update: my version of the answer

This is what I wanted to achieve, but Tagir's answer does it without recursion.

I achieved it by changing part of Mikhaal's answer, using the logic of Tagir's answer but without flatMap:

public static <T> List<List<T>> combineList(List<List<T>> argList) {
    boolean isMerge = false;
    List<List<T>> result = new ArrayList<>();

    for (List<T> list : argList) {
        // merge the current list into every existing result list it intersects with
        List<List<T>> mergedFound = result.stream()
                .filter(mt -> list.stream().anyMatch(mt::contains))
                .map(t -> Stream.concat(t.stream(), list.stream())
                        .distinct()
                        .collect(Collectors.toList()))
                .collect(Collectors.toList());

        if (!mergedFound.isEmpty()) {
            // keep the lists that don't intersect, then add the merged ones;
            // collect into an ArrayList so that result.add(...) stays legal later
            result = Stream.concat(
                        result.stream().filter(t -> list.stream().noneMatch(t::contains)),
                        mergedFound.stream())
                    .distinct()
                    .collect(Collectors.toCollection(ArrayList::new));
            isMerge = true;
        } else {
            result.add(list);
        }
    }
    // merged lists may still overlap each other, so repeat until nothing merges
    if (isMerge && result.size() > 1)
        return combineList(result);
    return result;
}

2 answers:

Answer 0 (score: 5)

Here is a very simple, but not very efficient, solution:

static List<List<Integer>> mergeCollections(List<List<Integer>> input) {
    List<List<Integer>> result = Collections.emptyList();

    for (List<Integer> listInner : input) {
        List<Integer> merged = Stream.concat(
                // read current results and select only those which contain
                // numbers from current list
                result.stream()
                      .filter(list -> list.stream().anyMatch(listInner::contains))
                      // flatten them into single stream
                      .flatMap(List::stream),
                // concatenate current list, remove repeating numbers and collect
                listInner.stream()).distinct().collect(Collectors.toList());

        // Now we need to remove used lists from the result and add the newly created 
        // merged list
        result = Stream.concat(
                result.stream()
                      // filter out used lists
                      .filter(list -> list.stream().noneMatch(merged::contains)),
                Stream.of(merged)).collect(Collectors.toList());
    }
    return result;
}

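The tricky part is that the next listInner can merge several lists that were already added. For example, if we have a partial result such as [[1, 2], [4, 5], [7, 8]] and process a new listInner whose contents are [2, 3, 5, 7], the partial result should become [[1, 2, 3, 4, 5, 7, 8]] (i.e., all the lists are merged together). So on every iteration we look for the existing partial results that share numbers with the current listInner, flatten them, concatenate them with the current listInner, and dump everything into the new merged list. Next, we filter the lists that went into merged out of the current result and add merged there.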
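To make the merge step concrete, here is a small sketch that applies a single loop iteration of the first solution to the example above. MergeStepDemo and mergeStep are hypothetical names used only for illustration; the body is the loop body of mergeCollections pulled out into a method.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class MergeStepDemo {
    // One iteration: merge listInner with every partial result it intersects,
    // keep the untouched lists, and append the merged list at the end.
    static List<List<Integer>> mergeStep(List<List<Integer>> result, List<Integer> listInner) {
        List<Integer> merged = Stream.concat(
                result.stream()
                      .filter(list -> list.stream().anyMatch(listInner::contains))
                      .flatMap(List::stream),
                listInner.stream())
            .distinct()
            .collect(Collectors.toList());
        return Stream.concat(
                result.stream().filter(list -> list.stream().noneMatch(merged::contains)),
                Stream.of(merged))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<Integer>> partial = Arrays.asList(
                Arrays.asList(1, 2), Arrays.asList(4, 5), Arrays.asList(7, 8));
        System.out.println(mergeStep(partial, Arrays.asList(2, 3, 5, 7)));
        // [[1, 2, 4, 5, 7, 8, 3]]
    }
}
```

The merged list holds the same numbers as [1, 2, 3, 4, 5, 7, 8], but the elements from the existing partial results come first and the new number 3 from listInner comes last, which is the ordering caveat mentioned at the end of this answer.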

Using the partitioningBy collector, both filtering steps can be done in a single pass, which makes the solution more efficient:

static List<List<Integer>> mergeCollections(List<List<Integer>> input) {
    List<List<Integer>> result = Collections.emptyList();

    for (List<Integer> listInner : input) {
        // partition current results by condition: whether they contain
        // numbers from listInner
        Map<Boolean, List<List<Integer>>> map = result.stream().collect(
                Collectors.partitioningBy(
                        list -> list.stream().anyMatch(listInner::contains)));

        // now map.get(true) contains lists which intersect with current
        //    and should be merged with current
        // and map.get(false) contains other lists which should be preserved 
        //    in result as is
        List<Integer> merged = Stream.concat(
                map.get(true).stream().flatMap(List::stream),
                listInner.stream()).distinct().collect(Collectors.toList());
        result = Stream.concat(map.get(false).stream(), Stream.of(merged))
                       .collect(Collectors.toList());
    }
    return result;
}

此处map.get(true)包含的列表包含listInnermap.get(false)中的元素,其中包含应从之前结果中保留的其他列表。

The order of the elements may not be what you expect, but you can easily sort the nested lists, or use List<TreeSet<Integer>> as the result data structure if you prefer.
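A sketch of the List<TreeSet<Integer>> variant, adapting the partitioningBy solution so that every group is a TreeSet. SortedMerge and mergeSorted are hypothetical names; the algorithm is otherwise the same as the code above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SortedMerge {
    // Same algorithm as the partitioningBy solution, but each group is a TreeSet,
    // so elements come out sorted and duplicates are dropped without distinct().
    static List<TreeSet<Integer>> mergeSorted(List<List<Integer>> input) {
        List<TreeSet<Integer>> result = new ArrayList<>();
        for (List<Integer> listInner : input) {
            // true: groups sharing an element with listInner; false: all the others
            Map<Boolean, List<TreeSet<Integer>>> map = result.stream().collect(
                    Collectors.partitioningBy(
                            set -> listInner.stream().anyMatch(set::contains)));
            // start from listInner and absorb every intersecting group
            TreeSet<Integer> merged = new TreeSet<>(listInner);
            map.get(true).forEach(merged::addAll);
            result = Stream.concat(map.get(false).stream(), Stream.of(merged))
                           .collect(Collectors.toList());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(mergeSorted(Arrays.asList(
                Arrays.asList(0, 2), Arrays.asList(1, 4), Arrays.asList(0, 2, 4),
                Arrays.asList(3, 4), Arrays.asList(1, 3, 4))));  // [[0, 1, 2, 3, 4]]
    }
}
```

Run on the first example from the question, the same method gives [[0, 1, 2], [3]]. Membership checks on a TreeSet cost O(log n) instead of the O(n) of List.contains.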

Answer 1 (score: 1)

For the exception you're getting, I would guess that the List<List<Integer>> you pass to mergeCollections contains null values, and that those throw the NullPointerException.

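Second, if I understand your problem correctly, you want an algorithm that merges lists sharing common elements. I came up with a way to solve your problem.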


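The algorithm is fairly simple: it uses plain recursion to rerun the merge until the output is "clean", i.e. until the lists are fully merged. I haven't done any optimization, but it does what it's supposed to do.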

Note that this method also merges your existing lists, i.e. it modifies the input.
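A minimal sketch of the recursive "repeat until clean" idea described above, written for illustration rather than reproducing the answerer's code. Unlike the method described, this version copies each list into a new group and leaves the input untouched. RecursiveMerge and mergeUntilClean are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class RecursiveMerge {
    // One pass folds each list into the first group it shares an element with, or
    // starts a new group. If anything was merged, a later list may have bridged
    // groups that were built separately, so the method calls itself again.
    static List<List<Integer>> mergeUntilClean(List<List<Integer>> lists) {
        List<Set<Integer>> groups = new ArrayList<>();
        boolean mergedAny = false;
        for (List<Integer> list : lists) {
            Set<Integer> target = null;
            for (Set<Integer> group : groups) {
                if (list.stream().anyMatch(group::contains)) {
                    target = group;
                    break;
                }
            }
            if (target == null) {
                groups.add(new LinkedHashSet<>(list)); // copy: the input is never modified
            } else {
                target.addAll(list);
                mergedAny = true;
            }
        }
        List<List<Integer>> out = new ArrayList<>();
        for (Set<Integer> group : groups) {
            out.add(new ArrayList<>(group));
        }
        // A pass that merges anything returns fewer groups than it received, so the
        // recursion terminates; a pass with no merges leaves pairwise-disjoint groups.
        return mergedAny ? mergeUntilClean(out) : out;
    }

    public static void main(String[] args) {
        System.out.println(mergeUntilClean(Arrays.asList(
                Arrays.asList(0, 1), Arrays.asList(0, 1, 2),
                Arrays.asList(1, 2), Arrays.asList(3))));  // [[0, 1, 2], [3]]
    }
}
```

LinkedHashSet keeps each group in first-seen order; for the second example in the question the single resulting group is [0, 2, 4, 3, 1], which can be sorted afterwards if needed.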