Question

In Java 8, how can I filter a collection using the Stream API by checking the distinctness of a property of each object?

For example, I have a list of Person objects and I want to remove people with the same name,

persons.stream().distinct();

will use the default equality check for a Person object, so I need something like

persons.stream().distinct(p -> p.getName());

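Unfortunately, the distinct() method has no such overload. Without modifying the equality check inside the Person class, is it possible to do this succinctly?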

Answer 1

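Consider distinct to be a stateful filter. Here is a function that returns a predicate that maintains state about what it has seen previously, and that returns whether the given element was seen for the first time: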

public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
    Set<Object> seen = ConcurrentHashMap.newKeySet();
    return t -> seen.add(keyExtractor.apply(t));
}

Then you can write:

persons.stream().filter(distinctByKey(Person::getName))

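Note that if the stream is ordered and is run in parallel, this will preserve an arbitrary element from among the duplicates, instead of the first one, as distinct() does.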

(This is essentially the same as my answer to this question: Java Lambda Stream Distinct() on arbitrary key?)
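
As a rough end-to-end sketch of the approach above: the Person class, the sample data, and the DistinctByKeyDemo wrapper are assumptions made for illustration, not part of the original answer. Each call to distinctByKey creates a fresh set, so the predicate should be created per stream rather than stored and reused.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class DistinctByKeyDemo {
    // Hypothetical minimal Person class, assumed for illustration only.
    static class Person {
        private final String name;
        private final int id;
        Person(String name, int id) { this.name = name; this.id = id; }
        String getName() { return name; }
        int getId() { return id; }
    }

    // Same helper as in the answer: true only the first time a key is seen.
    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> seen.add(keyExtractor.apply(t));
    }

    // Returns the ids of the persons that survive filtering by name.
    static List<Integer> distinctIds(List<Person> persons) {
        return persons.stream()
                .filter(distinctByKey(Person::getName)) // fresh predicate for this stream
                .map(Person::getId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Ann", 1), new Person("Bob", 2), new Person("Ann", 3));
        // Sequential stream, so the first "Ann" is the one kept.
        System.out.println(distinctIds(persons)); // prints [1, 2]
    }
}
```

Note that ConcurrentHashMap-backed sets reject null, so this sketch assumes every name is non-null.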

Answer 2

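An alternative would be to place the persons in a map, using the name as the key:

persons.stream().collect(toMap(Person::getName, p -> p, (p, q) -> p)).values();

Note that the person kept, in the case of a duplicate name, will be the first one encountered.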
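
One caveat worth sketching: the three-argument toMap makes no promise about the iteration order of the resulting map, so values() need not follow the order of the original list. The following is a minimal sketch, assuming a hypothetical Person class, that uses the four-argument toMap overload with LinkedHashMap::new to keep encounter order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.stream.Collectors;

class ToMapDistinctDemo {
    // Hypothetical minimal Person class, assumed for illustration only.
    static class Person {
        private final String name;
        private final int id;
        Person(String name, int id) { this.name = name; this.id = id; }
        String getName() { return name; }
        int getId() { return id; }
    }

    // Keeps the first person seen for each name, in encounter order.
    static List<Person> distinctByName(List<Person> persons) {
        return new ArrayList<>(persons.stream()
                .collect(Collectors.toMap(
                        Person::getName,           // key: the property to deduplicate on
                        p -> p,                    // value: the person itself
                        (first, second) -> first,  // on a duplicate key keep the earlier one
                        LinkedHashMap::new))       // map that preserves insertion order
                .values());
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Zed", 1), new Person("Ann", 2), new Person("Zed", 3));
        for (Person p : distinctByName(persons)) {
            System.out.println(p.getName() + " " + p.getId());
        }
    }
}
```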

Answer 3

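You can wrap the person objects in another class that compares only the names of the persons. Afterward, you unwrap the wrapped objects to get a stream of persons again. The stream operations might look as follows: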

persons.stream()
    .map(Wrapper::new)
    .distinct()
    .map(Wrapper::unwrap)
    ...;

The class Wrapper might look as follows:

class Wrapper {
    private final Person person;
    public Wrapper(Person person) {
        this.person = person;
    }
    public Person unwrap() {
        return person;
    }
    public boolean equals(Object other) {
        if (other instanceof Wrapper) {
            return ((Wrapper) other).person.getName().equals(person.getName());
        } else {
            return false;
        }
    }
    public int hashCode() {
        return person.getName().hashCode();
    }
}

Answer 4

Another solution, using a Set. It may not be the ideal solution, but it works:

Set<String> set = new HashSet<>(persons.size());
persons.stream().filter(p -> set.add(p.getName())).collect(Collectors.toList());

Or, if you can modify the original list, you can use the removeIf method:

persons.removeIf(p -> !set.add(p.getName()));

Answer 5

There is a simpler approach using a TreeSet with a custom comparator:

persons.stream()
    .collect(Collectors.toCollection(
      () -> new TreeSet<Person>((p1, p2) -> p1.getName().compareTo(p2.getName())) 
));
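
Two properties of this approach are easy to miss: the TreeSet returns the survivors sorted by the comparator rather than in list order, and Comparator.comparing can stand in for the hand-written lambda. A small sketch, assuming a hypothetical Person class with a name and an id:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

class TreeSetDistinctDemo {
    // Hypothetical minimal Person class, assumed for illustration only.
    static class Person {
        private final String name;
        private final int id;
        Person(String name, int id) { this.name = name; this.id = id; }
        String getName() { return name; }
        int getId() { return id; }
    }

    // Ids of the persons kept, in the TreeSet's iteration order (sorted by name).
    static List<Integer> distinctIdsSortedByName(List<Person> persons) {
        TreeSet<Person> unique = persons.stream()
                .collect(Collectors.toCollection(
                        () -> new TreeSet<>(Comparator.comparing(Person::getName))));
        return unique.stream().map(Person::getId).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Zed", 1), new Person("Ann", 2), new Person("Zed", 3));
        // TreeSet.add ignores an element that compares equal to one already present,
        // so the first "Zed" is kept; iteration is by name, so "Ann" comes first.
        System.out.println(distinctIdsSortedByName(persons)); // prints [2, 1]
    }
}
```

If the original order matters, one of the map-based or filter-based variants in this thread is probably a better fit.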

Answer 6

We can also use RxJava (a very powerful reactive extension library):

Observable.from(persons).distinct(Person::getName)

Answer 7

You can use the distinct(HashingStrategy) method in Eclipse Collections.

List<Person> persons = ...;
MutableList<Person> distinct =
    ListIterate.distinct(persons, HashingStrategies.fromFunction(Person::getName));

If you can refactor persons to implement an Eclipse Collections interface, you can call the method directly on the list.

MutableList<Person> persons = ...;
MutableList<Person> distinct =
    persons.distinct(HashingStrategies.fromFunction(Person::getName));

HashingStrategy is simply a strategy interface that allows you to define custom implementations of equals and hashCode.

public interface HashingStrategy<E>
{
    int computeHashCode(E object);
    boolean equals(E object1, E object2);
}

Note: I am a committer for Eclipse Collections.

Answer 8

You can use the StreamEx library:

StreamEx.of(persons)
        .distinct(Person::getName)
        .toList()

Answer 9

If you can, I recommend using Vavr. With this library you can do the following:

io.vavr.collection.List.ofAll(persons)
                       .distinctBy(Person::getName)
                       .toJavaSet() // or any other Java 8 collection

Answer 10

Extending Stuart Marks's answer, this can be done in a shorter way and without a concurrent map (if you don't need parallel streams):

public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
    final Set<Object> seen = new HashSet<>();
    return t -> seen.add(keyExtractor.apply(t));
}

Then call:

persons.stream().filter(distinctByKey(p -> p.getName()));

Answer 11

You can use the groupingBy collector:

persons.stream().collect(Collectors.groupingBy(p -> p.getName())).values().forEach(t -> System.out.println(t.get(0).getId()));

If you want to have another stream, you can use this:

persons.stream().collect(Collectors.groupingBy(p -> p.getName())).values().stream().map(l -> (l.get(0)));

Answer 12

I made a generic version:

private <T, R> Collector<T, ?, Stream<T>> distinctByKey(Function<T, R> keyExtractor) {
    return Collectors.collectingAndThen(
            toMap(
                    keyExtractor,
                    t -> t,
                    (t1, t2) -> t1
            ),
            (Map<R, T> map) -> map.values().stream()
    );
}

An example:

Stream.of(new Person("Jean"), 
          new Person("Jean"),
          new Person("Paul")
)
    .filter(...)
    .collect(distinctByKey(Person::getName)) // returns a stream of Person with 2 elements, Jean and Paul
    .map(...)
    .collect(toList())

Answer 13

A similar approach to the one Saeed Zarinfam used, but in a more Java 8 style :)

persons.stream().collect(Collectors.groupingBy(p -> p.getName())).values().stream()
 .map(plans -> plans.stream().findFirst().get())
 .collect(toList());

Answer 14

Another library that supports this is jOOλ, with its Seq.distinct(Function<T,U>) method:

Seq.seq(persons).distinct(Person::getName).toList();

Under the hood, it does practically the same thing as the accepted answer.

Answer 15

Set<YourPropertyType> set = new HashSet<>();
list
        .stream()
        .filter(it -> set.add(it.getYourProperty()))
        .forEach(it -> ...);

Answer 16

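The easiest way to implement this is to build on the sort feature, since it already accepts an optional Comparator that can be created from an element's property. Then you have to filter out the duplicates, which can be done with a stateful Predicate that relies on the fact that, in a sorted stream, all equal elements are adjacent: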

Comparator<Person> c=Comparator.comparing(Person::getName);
stream.sorted(c).filter(new Predicate<Person>() {
    Person previous;
    public boolean test(Person p) {
      if(previous!=null && c.compare(previous, p)==0)
        return false;
      previous=p;
      return true;
    }
})./* more stream operations here */;

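Of course, a stateful Predicate is not thread-safe. If you need thread safety, you can move this logic into a Collector and let the stream take care of it when using your Collector. This depends on what you want to do with the stream of distinct elements, which you did not tell us in your question.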
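
One possible reading of "move this logic into a Collector" is sketched below. It is an assumption rather than the answerer's code: it drops the sort and instead accumulates into a LinkedHashMap keyed by the property, leaving the handling of parallel execution to the collector framework. The names distinctBy, DistinctCollectorDemo, and Person are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class DistinctCollectorDemo {
    // Hypothetical minimal Person class, assumed for illustration only.
    static class Person {
        private final String name;
        private final int id;
        Person(String name, int id) { this.name = name; this.id = id; }
        String getName() { return name; }
        int getId() { return id; }
    }

    // Collects to a list holding the first element seen for each key.
    static <T, K> Collector<T, ?, List<T>> distinctBy(Function<? super T, ? extends K> keyExtractor) {
        return Collector.<T, Map<K, T>, List<T>>of(
                LinkedHashMap::new,                                    // one container per thread
                (map, t) -> map.putIfAbsent(keyExtractor.apply(t), t), // keep the first per key
                (left, right) -> {                                     // merge partial results
                    right.forEach(left::putIfAbsent);
                    return left;
                },
                map -> new ArrayList<>(map.values()));
    }

    // Ids of the persons kept when deduplicating by name.
    static List<Integer> distinctIds(List<Person> persons) {
        return persons.stream()
                .collect(distinctBy(Person::getName))
                .stream()
                .map(Person::getId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Ann", 1), new Person("Bob", 2), new Person("Ann", 3));
        System.out.println(distinctIds(persons)); // prints [1, 2]
    }
}
```

Because no mutable state is shared between threads here, the same collector can be used with a parallel stream without extra synchronization.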

Answer 17

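My approach is to group all the objects with the same property together, then cut each group down to a size of 1, and finally collect them as a List.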

  List<YourPersonClass> listWithDistinctPersons =   persons.stream()
            //operators to remove duplicates based on person name
            .collect(Collectors.groupingBy(p -> p.getName()))
            .values()
            .stream()
            //cut short the groups to size of 1
            .flatMap(group -> group.stream().limit(1))
            //collect distinct users as list
            .collect(Collectors.toList());

Answer 18

I would like to improve on Stuart Marks's answer. If the key is null, it throws a NullPointerException. Here I ignore null keys by adding one more check, keyExtractor.apply(t) != null.

public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
    Set<Object> seen = ConcurrentHashMap.newKeySet();
    return t -> keyExtractor.apply(t) != null && seen.add(keyExtractor.apply(t));
}
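
The version above calls keyExtractor.apply(t) twice per element. A minimal variant that extracts the key once is sketched here; the method name distinctByNonNullKey and the sample key function are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class NullSafeDistinctDemo {
    // Drops elements whose key is null and keeps the first element for every other key.
    static <T> Predicate<T> distinctByNonNullKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> {
            Object key = keyExtractor.apply(t); // extract the key only once
            return key != null && seen.add(key);
        };
    }

    // Hypothetical key function: the length of the word, or null for an empty word.
    static List<String> distinctByLength(List<String> words) {
        Function<String, Integer> lengthOrNull = s -> s.isEmpty() ? null : s.length();
        return words.stream()
                .filter(distinctByNonNullKey(lengthOrNull))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "" has a null key and is dropped; "b" repeats the key of "a".
        System.out.println(distinctByLength(Arrays.asList("a", "", "b", "cc"))); // prints [a, cc]
    }
}
```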

Answer 19

How about this solution?

It only works if your key type implements equals, which most basic types do, but it is a bit simpler.

persons.stream().map(p -> p.getName()).distinct()

Answer 20

Late to the party, but I sometimes use this one-liner:

((Function<Value, Key>) Value::getKey).andThen(new HashSet<>()::add)::apply

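The expression is a Predicate<Value>, but since the mapping is inline, it works as a filter. This is of course less readable, but it can sometimes be helpful for avoiding a separate method.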
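
To make the one-liner concrete, here is a small runnable sketch with String standing in for Value and the string length standing in for the key. Those substitutions, and the explicit HashSet<Integer> type argument, are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class InlinePredicateDemo {
    // Keeps the first word seen for each distinct length.
    static List<String> distinctByLength(List<String> words) {
        Function<String, Integer> length = String::length;
        // The HashSet is created once, when the method reference is evaluated,
        // so every call of the predicate adds to the same set.
        Predicate<String> firstWithLength = length.andThen(new HashSet<Integer>()::add)::apply;
        return words.stream().filter(firstWithLength).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(distinctByLength(Arrays.asList("a", "b", "bb", "aa", "ccc")));
        // prints [a, bb, ccc]
    }
}
```

Since the backing set is a plain HashSet, this form is only suitable for sequential streams.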

Answer 21

A variation of the stateful predicate approach that handles null keys:

    public static <T, K> Predicate<T> distinctBy(final Function<? super T, K> getKey) {
        val seen = ConcurrentHashMap.<Optional<K>>newKeySet();
        return obj -> seen.add(Optional.ofNullable(getKey.apply(obj)));
    }

In my tests:

        assertEquals(
                asList("a", "bb"),
                Stream.of("a", "b", "bb", "aa").filter(distinctBy(String::length)).collect(toList()));

        assertEquals(
                asList(5, null, 2, 3),
                Stream.of(5, null, 2, null, 3, 3, 2).filter(distinctBy(x -> x)).collect(toList()));

        val maps = asList(
                hashMapWith(0, 2),
                hashMapWith(1, 2),
                hashMapWith(2, null),
                hashMapWith(3, 1),
                hashMapWith(4, null),
                hashMapWith(5, 2));

        assertEquals(
                asList(0, 2, 3),
                maps.stream()
                        .filter(distinctBy(m -> m.get("val")))
                        .map(m -> m.get("i"))
                        .collect(toList()));

Answer 22

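While the highest-voted answer is absolutely the best answer with respect to Java 8, it is at the same time absolutely the worst in terms of performance. If you really want a poorly performing application, then go ahead and use it. The simple requirement of extracting a unique set of person names can be met with a mere "for-each" and a "Set". Things get even worse if the list has more than 10 elements.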

Consider a collection of 20 objects, like this:

public static final List<SimpleEvent> testList = Arrays.asList(
            new SimpleEvent("Tom"), new SimpleEvent("Dick"),new SimpleEvent("Harry"),new SimpleEvent("Tom"),
            new SimpleEvent("Dick"),new SimpleEvent("Huckle"),new SimpleEvent("Berry"),new SimpleEvent("Tom"),
            new SimpleEvent("Dick"),new SimpleEvent("Moses"),new SimpleEvent("Chiku"),new SimpleEvent("Cherry"),
            new SimpleEvent("Roses"),new SimpleEvent("Moses"),new SimpleEvent("Chiku"),new SimpleEvent("gotya"),
            new SimpleEvent("Gotye"),new SimpleEvent("Nibble"),new SimpleEvent("Berry"),new SimpleEvent("Jibble"));

where the SimpleEvent object looks like this:

public class SimpleEvent {

    private String name;
    private String type;

    public SimpleEvent(String name) {
        this.name = name;
        this.type = "type_" + name;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getType() {
        return type;
    }

    public void setType(String type) {
        this.type = type;
    }
}

To test it, you have JMH code like this (please note that I am using the same distinctByKey predicate mentioned in the accepted answer):

@Benchmark
@OutputTimeUnit(TimeUnit.SECONDS)
public void aStreamBasedUniqueSet(Blackhole blackhole) throws Exception{

    Set<String> uniqueNames = testList
            .stream()
            .filter(distinctByKey(SimpleEvent::getName))
            .map(SimpleEvent::getName)
            .collect(Collectors.toSet());
    blackhole.consume(uniqueNames);
}

@Benchmark
@OutputTimeUnit(TimeUnit.SECONDS)
public void aForEachBasedUniqueSet(Blackhole blackhole) throws Exception{
    Set<String> uniqueNames = new HashSet<>();

    for (SimpleEvent event : testList) {
        uniqueNames.add(event.getName());
    }
    blackhole.consume(uniqueNames);
}

public static void main(String[] args) throws RunnerException {
    Options opt = new OptionsBuilder()
            .include(MyBenchmark.class.getSimpleName())
            .forks(1)
            .mode(Mode.Throughput)
            .warmupBatchSize(3)
            .warmupIterations(3)
            .measurementIterations(3)
            .build();

    new Runner(opt).run();
}

Then you will get benchmark results like this:

Benchmark                                  Mode  Samples        Score  Score error  Units
c.s.MyBenchmark.aForEachBasedUniqueSet    thrpt        3  2635199.952  1663320.718  ops/s
c.s.MyBenchmark.aStreamBasedUniqueSet     thrpt        3   729134.695   895825.697  ops/s

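As you can see, a simple for-each has more than 3 times the throughput of the Java 8 Stream version, and its score error is smaller relative to its score.

Higher throughput means better performance.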

Answer 23

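I encountered a situation where I needed to get distinct elements from a list based on two keys. If you want distinctness based on two keys, or on a composite key, try this: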

class Person {
    int rollno;
    String name;
    // getters getRollno() and getName() omitted
}
List<Person> personList;

// The composite key is a list of the two properties; lists compare element by element.
Function<Person, List<Object>> compositeKey = person ->
        Arrays.<Object>asList(person.getName(), person.getRollno());

Map<Object, List<Person>> map = personList.stream().collect(Collectors.groupingBy(compositeKey, Collectors.toList()));

// Entries whose group holds more than one person, i.e. the duplicated keys.
List<Object> duplicateEntrys = map.entrySet().stream()
        .filter(settingMap ->
                settingMap.getValue().size() > 1)
        .collect(Collectors.toList());
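
The snippet above collects the groups that contain duplicates. To get the distinct elements themselves, the same list-valued composite key can drive a stateful filter. This is a minimal sketch, assuming getters on a hypothetical Person class and a sequential stream.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class CompositeKeyDistinctDemo {
    // Hypothetical Person with the two fields used in the answer, plus getters.
    static class Person {
        private final int rollno;
        private final String name;
        Person(int rollno, String name) { this.rollno = rollno; this.name = name; }
        int getRollno() { return rollno; }
        String getName() { return name; }
    }

    // Keeps the first person for each (name, rollno) pair; sequential streams only.
    static List<Person> distinctByNameAndRollno(List<Person> persons) {
        Set<List<Object>> seen = new HashSet<>();
        return persons.stream()
                // List.equals and List.hashCode are element-wise, so the list acts as a composite key
                .filter(p -> seen.add(Arrays.<Object>asList(p.getName(), p.getRollno())))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person(1, "Ann"), new Person(1, "Ann"),
                new Person(2, "Ann"), new Person(1, "Bob"));
        System.out.println(distinctByNameAndRollno(persons).size()); // prints 3
    }
}
```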

Answer 24

A list of distinct objects can be obtained using:

List<Person> distinctPersons = persons.stream()
        .collect(Collectors.collectingAndThen(
                Collectors.toCollection(() -> new TreeSet<>(Comparator.comparing(Person::getName))),
                ArrayList::new));

Answer 25

Building on @josketres's answer, I created a generic utility method. You could make this more Java 8-friendly by creating a Collector.

public static <T> Set<T> removeDuplicates(Collection<T> input, Comparator<T> comparer) {
    return input.stream()
            .collect(toCollection(() -> new TreeSet<>(comparer)));
}


@Test
public void removeDuplicatesWithDuplicates() {
    ArrayList<C> input = new ArrayList<>();
    Collections.addAll(input, new C(7), new C(42), new C(42));
    Collection<C> result = removeDuplicates(input, (c1, c2) -> Integer.compare(c1.value, c2.value));
    assertEquals(2, result.size());
    assertTrue(result.stream().anyMatch(c -> c.value == 7));
    assertTrue(result.stream().anyMatch(c -> c.value == 42));
}

@Test
public void removeDuplicatesWithoutDuplicates() {
    ArrayList<C> input = new ArrayList<>();
    Collections.addAll(input, new C(1), new C(2), new C(3));
    Collection<C> result = removeDuplicates(input, (t1, t2) -> Integer.compare(t1.value, t2.value));
    assertEquals(3, result.size());
    assertTrue(result.stream().anyMatch(c -> c.value == 1));
    assertTrue(result.stream().anyMatch(c -> c.value == 2));
    assertTrue(result.stream().anyMatch(c -> c.value == 3));
}

private class C {
    public final int value;

    private C(int value) {
        this.value = value;
    }
}

Answer 26

My solution in this listing:

In my case, I wanted to find the distinct values and put them in a List.

Answer 27

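In my case, I needed to control what the previous element was. So I created a stateful Predicate in which I check whether the previous element differs from the current element, and keep the current element only if it does.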

public List<Log> fetchLogById(Long id) {
    return this.findLogById(id).stream()
        .filter(new LogPredicate())
        .collect(Collectors.toList());
}

public class LogPredicate implements Predicate<Log> {

    private Log previous;

    public boolean test(Log current) {
        boolean isDifferent = previous == null || verifyIfDifferentLog(current, previous);

        if (isDifferent) {
            previous = current;
        }
        return isDifferent;
    }

    private boolean verifyIfDifferentLog(Log current, Log previous) {
        return !current.getId().equals(previous.getId());
    }

}

Answer 28

Here is the example
public class PayRoll {

    private int payRollId;
    private int id;
    private String name;
    private String dept;
    private int salary;


    public PayRoll(int payRollId, int id, String name, String dept, int salary) {
        super();
        this.payRollId = payRollId;
        this.id = id;
        this.name = name;
        this.dept = dept;
        this.salary = salary;
    }

    // Getters (getPayRollId, getId, getDept, ...) and toString() omitted for brevity.
}

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class Prac {
    public static void main(String[] args) {

        int salary=70000;
        PayRoll payRoll=new PayRoll(1311, 1, "A", "HR", salary);
        PayRoll payRoll2=new PayRoll(1411, 2    , "B", "Technical", salary);
        PayRoll payRoll3=new PayRoll(1511, 1, "C", "HR", salary);
        PayRoll payRoll4=new PayRoll(1611, 1, "D", "Technical", salary);
        PayRoll payRoll5=new PayRoll(711, 3,"E", "Technical", salary);
        PayRoll payRoll6=new PayRoll(1811, 3, "F", "Technical", salary);
        List<PayRoll>list=new ArrayList<PayRoll>();
        list.add(payRoll);
        list.add(payRoll2);
        list.add(payRoll3);
        list.add(payRoll4);
        list.add(payRoll5);
        list.add(payRoll6);


        Map<Object, Optional<PayRoll>> k = list.stream().collect(Collectors.groupingBy(p->p.getId()+"|"+p.getDept(),Collectors.maxBy(Comparator.comparingInt(PayRoll::getPayRollId))));


        k.entrySet().forEach(p->
        {
            if(p.getValue().isPresent())
            {
                System.out.println(p.getValue().get());
            }
        });



    }
}

Output:

PayRoll [payRollId=1611, id=1, name=D, dept=Technical, salary=70000]
PayRoll [payRollId=1811, id=3, name=F, dept=Technical, salary=70000]
PayRoll [payRollId=1411, id=2, name=B, dept=Technical, salary=70000]
PayRoll [payRollId=1511, id=1, name=C, dept=HR, salary=70000]

Answer 29

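Maybe this will be useful for somebody. I had a slightly different requirement: given a list of objects A from a third party, remove all those that have the same A.b field for the same A.id (the list contains multiple A objects with the same A.id). The stream partition answer by Tagir Valeev inspired me to use a custom Collector that returns a Map<A.id, List<A>>. A simple flatMap will do the rest.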

public static <T, K, K2> Collector<T, ?, Map<K, List<T>>> groupingDistinctBy(Function<T, K> keyFunction, Function<T, K2> distinctFunction) {
    return groupingBy(keyFunction, Collector.of((Supplier<Map<K2, T>>) HashMap::new,
            (map, element) -> map.putIfAbsent(distinctFunction.apply(element), element),
            (left, right) -> {
                left.putAll(right);
                return left;
            }, map -> new ArrayList<>(map.values()),
            Collector.Characteristics.UNORDERED));
}
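
A hedged usage sketch of the "simple flatMap" step follows. The class A below is a hypothetical stand-in for the third-party type, and the collector is restated with explicit type arguments on Collector.of so that the example is self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class GroupingDistinctDemo {
    // Hypothetical stand-in for the third-party type A with fields id and b.
    static class A {
        private final int id;
        private final String b;
        A(int id, String b) { this.id = id; this.b = b; }
        int getId() { return id; }
        String getB() { return b; }
    }

    // Groups by keyFunction and keeps one element per distinctFunction value in each group.
    static <T, K, K2> Collector<T, ?, Map<K, List<T>>> groupingDistinctBy(
            Function<T, K> keyFunction, Function<T, K2> distinctFunction) {
        return Collectors.groupingBy(keyFunction, Collector.<T, Map<K2, T>, List<T>>of(
                HashMap::new,
                (map, element) -> map.putIfAbsent(distinctFunction.apply(element), element),
                (left, right) -> {
                    left.putAll(right);
                    return left;
                },
                map -> new ArrayList<>(map.values()),
                Collector.Characteristics.UNORDERED));
    }

    // Flattens the grouped result back into a single list.
    static List<A> distinctBWithinId(List<A> items) {
        return items.stream()
                .collect(groupingDistinctBy(A::getId, A::getB))
                .values().stream()
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<A> items = Arrays.asList(
                new A(1, "x"), new A(1, "x"), new A(1, "y"), new A(2, "x"));
        // One of the two (1, "x") entries is dropped; result order is unspecified.
        System.out.println(distinctBWithinId(items).size()); // prints 3
    }
}
```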

Answer 30

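Since everyone is sharing their own ideas and implementations, I have one too. It is not an efficient approach, but it works:

// Names that have not been emitted yet (sequential streams only).
Set<String> personNameList = personList.stream()
        .map(tempPerson -> tempPerson.getName())
        .collect(Collectors.toCollection(HashSet::new));

List<Person> distinctPersons = personList.stream()
        .collect(() -> new ArrayList<Person>(),
                (l1, p) -> {
                    // remove() returns true only for the first person with a given name
                    if (personNameList.remove(p.getName())) {
                        l1.add(p);
                    }
                }, ArrayList::addAll);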

Answer 31

A distinct or unique list can also be found using the following two methods.

Method 1: using distinct

yourObjectName.stream().map(x->x.yourObjectProperty).distinct().collect(Collectors.toList());

Method 2: using HashSet

Set<E> set = new HashSet<>();
set.addAll(yourObjectName.stream().map(x->x.yourObjectProperty).collect(Collectors.toList()));

Answer 32

The simplest code you can write:

    persons.stream().map(x-> x.getName()).distinct().collect(Collectors.toList());
