Remove "regex duplicates" from an ArrayList in Java

Date: 2017-06-28 13:03:14

Tags: java arraylist

I want to "clean up" an ArrayList in Java. Here is the explanation.

Suppose we have this list:

a = ["a_12_b", "a_13_b", "a_13bis_b", "a_14_b", "a_14_new_b"]

In this list, "a_13bis_b" and "a_14_new_b" are considered duplicates. Why? Because every entry is expected to match this pattern: a_ + "a string with length = 2" + _b

The output should be:

a = ["a_12_b", "a_13_b", "a_14_b"]

I used this simple code, but it returns the wrong output:

for (int j = 0; j < list.size(); j++) {
            //basically clean entry will remove the a_ and _b
            String value1= cleanEntry(list.get(j));
            for (int k = 0; k < list.size(); k++) {
                    String value2= cleanEntry(list.get(k));
                    if (k != j && value1.equalsIgnoreCase(value2)) {
                        duplicates.add(list.get(k));
                        list.remove(k);
                    }
            }
}

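Any help?

2 answers:

Answer 0 (score: 1)

You can use the stream map method together with a regex to "normalize" the strings into a common format, and then build a Set from the normalized strings.

Something like this: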

List<String> a = Arrays.asList("a_12_b", "a_13_b", "a_13bis_b", "a_14_b", "a_14_new_b");
Set<String> uniques = a.stream()
                .map(s -> s.replaceAll("^([a-z]_\\d{2})[^\\d].+(_[a-z])$", "$1$2"))
                .collect(Collectors.toSet());
System.out.println(uniques);

Prints (a HashSet does not guarantee iteration order):

[a_14_b, a_13_b, a_12_b]
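
For reference, here is the same idea as a self-contained class. It is a minimal sketch, not a definitive implementation: the class name NormalizeDedup and the method dedupe are made up for illustration, and collecting into a LinkedHashSet instead of Collectors.toSet() is a substitution so that the result keeps the order of first appearance.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper class; the name is illustrative only.
class NormalizeDedup {

    // Same regex as above: two digits, a non-digit, then at least one more
    // character before the "_x" suffix.
    private static final String EXTRA = "^([a-z]_\\d{2})[^\\d].+(_[a-z])$";

    // Normalizes every entry and keeps each normalized form once, in encounter order.
    static Set<String> dedupe(List<String> entries) {
        return entries.stream()
                .map(s -> s.replaceAll(EXTRA, "$1$2"))
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("a_12_b", "a_13_b", "a_13bis_b", "a_14_b", "a_14_new_b");
        System.out.println(dedupe(a)); // [a_12_b, a_13_b, a_14_b]
    }
}
```

Entries that carry no extra text (such as "a_12_b") do not match the regex, so replaceAll returns them unchanged.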

Solution for Java 7 (on Java 6, replace the diamond <> with <String>):

List<String> a = Arrays.asList("a_12_b", "a_13_b", "a_13bis_b", "a_14_b", "a_14_new_b");
Set<String> set = new LinkedHashSet<>();
for(String s : a) {
    set.add(s.replaceAll("^([a-z]_\\d{2})[^\\d].+(_[a-z])$", "$1$2"));
}
System.out.println(set);

Result:

[a_12_b, a_13_b, a_14_b]

If you need more than 2 digit characters, you can change the regex. Here is an example with its result:

List<String> a = Arrays.asList("a_12345678901234567890123456_b", "a_13345678901234567890123456_b",
                "a_13345678901234567890123456bis_b", "a_14345678901234567890123456_b", "a_14345678901234567890123456_new_b");
Set<String> set = new LinkedHashSet<>();
for(String s : a) {
    set.add(s.replaceAll("^([a-z]_\\d{26})[^\\d].+(_[a-z])$", "$1$2"));
}
System.out.println(set);

Result:

[a_12345678901234567890123456_b, a_13345678901234567890123456_b, a_14345678901234567890123456_b]
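
Rather than hard-coding the digit count, one option is to let the regex accept a digit run of any length with \d+. The sketch below assumes every entry has the shape letter, underscore, digits, optional extra text, underscore, letter. The class and method names are invented for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative only.
class AnyLengthDedup {

    // \d+ matches a digit run of any length; the rest mirrors the regex above.
    private static final Pattern EXTRA = Pattern.compile("^([a-z]_\\d+)[^\\d].+(_[a-z])$");

    static Set<String> dedupe(List<String> entries) {
        Set<String> result = new LinkedHashSet<>();
        for (String s : entries) {
            // Entries without extra text do not match and are added unchanged.
            result.add(EXTRA.matcher(s).replaceAll("$1$2"));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("a_12_b", "a_1300_b", "a_1300bis_b", "a_7_b", "a_7_new_b");
        System.out.println(dedupe(a)); // [a_12_b, a_1300_b, a_7_b]
    }
}
```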

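Answer 1 (score: 0)

You can simply discard everything after the second character before comparing; otherwise cleaned values such as "13" and "13bis" are never equal. Try this: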

for (int j = 0; j < list.size(); j++) {
    // cleanEntry removes the "a_" prefix and the "_b" suffix
    String value1 = cleanEntry(list.get(j));
    // start after j so the first occurrence is always the one that is kept
    for (int k = j + 1; k < list.size(); k++) {
        String value2 = cleanEntry(list.get(k));
        if (value1.substring(0, 2).equalsIgnoreCase(value2.substring(0, 2))) {
            duplicates.add(list.get(k));
            list.remove(k);
            k--; // the next element shifted into index k, so re-check it
        }
    }
}
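
An alternative that avoids removing from the list while indexing into it is to walk the list once with an Iterator and remember which keys have already been seen. This is a sketch under assumptions: cleanEntry is not shown in the question, so the version here (strip a leading "a_" and a trailing "_b") is a guess, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; cleanEntry is an assumed stand-in for the question's helper.
class SeenKeyDedup {

    // Assumption: entries look like "a_<value>_b"; strip the two-character prefix and suffix.
    static String cleanEntry(String entry) {
        return entry.substring(2, entry.length() - 2);
    }

    // Removes later entries whose first two cleaned characters were already seen,
    // and returns the removed entries.
    static List<String> removeDuplicates(List<String> list) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (Iterator<String> it = list.iterator(); it.hasNext(); ) {
            String entry = it.next();
            String cleaned = cleanEntry(entry);
            String key = cleaned.substring(0, Math.min(2, cleaned.length())).toLowerCase();
            if (!seen.add(key)) {   // add returns false when the key is already present
                duplicates.add(entry);
                it.remove();        // safe removal during iteration
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // A mutable ArrayList is needed; the list returned by Arrays.asList is fixed-size.
        List<String> list = new ArrayList<>(
                Arrays.asList("a_12_b", "a_13_b", "a_13bis_b", "a_14_b", "a_14_new_b"));
        List<String> duplicates = removeDuplicates(list);
        System.out.println(list);       // [a_12_b, a_13_b, a_14_b]
        System.out.println(duplicates); // [a_13bis_b, a_14_new_b]
    }
}
```

This makes a single pass over the list instead of comparing every pair, and the first occurrence of each key is always the one that stays.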