Java: remove strings from a list when they contain another list element as a substring

Asked: 2016-07-12 05:11:46

Tags: java arraylist hashset

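Consider a list of strings, for example: list = ['apple','bat','cow','dog','applebat','cowbat','dogbark','help']

The Java code must check whether any element of the list is a substring of another element and, if so, remove the larger string.

So in this case the strings "applebat", "cowbat", and "dogbark" would be removed.

The approach I took was to use two lists and iterate over them as follows: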

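// Two copies of the input: list1 is only read, list2 is filtered in place.
ArrayList<String> list1 = new ArrayList<String>(strings);
ArrayList<String> list2 = new ArrayList<String>(strings);
for (int i = 0; i < list1.size(); i++)
{
    String curr1 = list1.get(i);

    for (int j = 0; j < list2.size(); j++)
    {
        String curr2 = list2.get(j);

        // Drop curr2 when it strictly contains curr1 as a substring.
        if (curr2.contains(curr1) && !curr2.equals(curr1))
        {
            list2.remove(j);
            j--; // stay on the same index after the removal
        }
    }
}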

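Important: my list has 200K to 400K elements, so I need a way to improve performance. I even tried HashSets, but they did not help much. The program simply takes too long to run.

Can anyone suggest improvements to my code, or any other approach in Java, to improve performance?

5 Answers:

Answer 0 (score: 2)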


Answer 1 (score: 0)

I think a Set would be faster here. With the Java 8 Stream API this is easy to do.

Try:

private Set<String> delete() {
    Set<String> startSet = new HashSet<>(Arrays.asList("a", "b", "c", "d", "ab", "bc", "ce", "fg"));
    // Iterate over a copy so that removing from startSet is safe.
    Set<String> helperSet = new HashSet<>(startSet);

    helperSet.forEach(s1 -> helperSet.forEach(s2 -> {
        if (s2.contains(s1) && !s1.equals(s2)) {
            startSet.remove(s2);
        }
    }));

    return startSet;
}

Do not remove elements from the collection you are iterating over, or you will get a ConcurrentModificationException.
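One way to follow this advice, sketched under assumptions: take a read-only snapshot of the input and call `removeIf` on a separate result set, so the collection being modified is never the one being walked by hand. The names `SafeRemovalDemo` and `removeSuperstrings` are made up for illustration, and the cost is still quadratic in the number of strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SafeRemovalDemo {
    // Hypothetical helper: returns a new set without the strings that
    // strictly contain another element of the input.
    static Set<String> removeSuperstrings(Set<String> input) {
        // Read-only copy used for the comparisons.
        List<String> snapshot = new ArrayList<>(input);
        Set<String> result = new HashSet<>(input);
        // removeIf performs the removal itself, so no manual iteration
        // over 'result' happens while it is being modified.
        result.removeIf(candidate -> snapshot.stream()
                .anyMatch(other -> !other.equals(candidate) && candidate.contains(other)));
        return result;
    }

    public static void main(String[] args) {
        Set<String> input = new HashSet<>(
                Arrays.asList("a", "b", "c", "d", "ab", "bc", "ce", "fg"));
        System.out.println(removeSuperstrings(input));
    }
}
```

For the sample data, "ab", "bc", and "ce" are dropped; the print order of the remaining elements is not guaranteed because a HashSet is unordered.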

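Answer 2 (score: 0)

To really improve performance on a large number of words, I think a combination of sorting and a string searching algorithm such as the Aho-Corasick algorithm is what you need, assuming you are willing to implement that kind of complex logic.

First, sort the words by length.

Then build the Aho-Corasick dictionary in order of word length. For each word, first check whether any substring of it is already in the dictionary. If so, skip the word; otherwise add it to the dictionary.

When done, dump the dictionary, or maintain a parallel list if the dictionary cannot easily be dumped.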
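The steps above can be sketched roughly as follows. To stay short, this sketch swaps in a plain trie that is scanned from every start position of each word, rather than a true Aho-Corasick automaton with failure links, so it shows the sort-then-check flow but not the full algorithm. `TrieSubstringFilter`, `filter`, and the helper names are hypothetical. The sketch assumes non-empty strings and, unlike the loop in the question, drops exact duplicates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TrieSubstringFilter {
    // Trie node: children keyed by character, plus an end-of-word flag.
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord;
    }

    // True if some already-kept word occurs in s at any start position.
    private static boolean containsKeptWord(Node root, String s) {
        for (int start = 0; start < s.length(); start++) {
            Node node = root;
            for (int i = start; i < s.length(); i++) {
                node = node.children.get(s.charAt(i));
                if (node == null) {
                    break; // no kept word continues along this path
                }
                if (node.isWord) {
                    return true;
                }
            }
        }
        return false;
    }

    private static void insert(Node root, String s) {
        Node node = root;
        for (int i = 0; i < s.length(); i++) {
            node = node.children.computeIfAbsent(s.charAt(i), c -> new Node());
        }
        node.isWord = true;
    }

    // Sort by length, then keep a word only if no kept word is inside it.
    // The kept list plays the role of the "parallel list" from the answer.
    static List<String> filter(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(Comparator.comparingInt(String::length)); // stable sort
        Node root = new Node();
        List<String> kept = new ArrayList<>();
        for (String word : sorted) {
            if (!containsKeptWord(root, word)) {
                insert(root, word);
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
                "apple", "bat", "cow", "dog", "applebat", "cowbat", "dogbark", "help");
        System.out.println(filter(words)); // prints [bat, cow, dog, help, apple]
    }
}
```

The result comes back ordered by length rather than in input order. The scan of one word costs at most about the square of its length, independent of how many words have already been kept, which is where the gain over the pairwise loops comes from when the words are short.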

Answer 3 (score: 0)

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Random;

public class SubStrRmove {
    public static List<String> randomList(int size) {
        final String BASE = "abcdefghijklmnopqrstuvwxyz";
        Random random = new Random();
        List<String> list = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            int length = random.nextInt(3) + 2;
            // StringBuilder suffices here: the buffer is local to one thread.
            StringBuilder sb = new StringBuilder(length);
            for (int j = 0; j < length; j++) {
                int number = random.nextInt(BASE.length());
                sb.append(BASE.charAt(number));
            }
            list.add(sb.toString());
        }
        return list;
    }

    public static List<String> removeListSubStr(List<String> args) {
        String[] input = args.toArray(new String[args.size()]);
        Arrays.parallelSort(input, (s1, s2) -> s1.length() - s2.length());
        List<String> result = new ArrayList<>(args.size());
        for (int i = 0; i < input.length; i++) {
            String temp = input[i];
            if (!result.stream().filter(s -> temp.indexOf(s) >= 0).findFirst().isPresent()) {
                result.add(input[i]);
            }
        }
        return result;
    }

    public static List<String> removeListSubStr2(List<String> args) {
        String[] input = args.toArray(new String[args.size()]);
        Arrays.parallelSort(input, (s1, s2) -> s1.length() - s2.length());
        List<String> result = new ArrayList<>(args.size());
        for (int i = 0; i < input.length; i++) {
            boolean isDiff = true;
            for (int j = 0; j < result.size(); j++) {
                if (input[i].indexOf(result.get(j)) >= 0) {
                    isDiff = false;
                    break;
                }
            }
            if (isDiff) {
                result.add(input[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> list = randomList(20000);
        // Time the stream-based version.
        long start1 = new Date().getTime();
        List<String> listLambda = removeListSubStr(list);
        long end1 = new Date().getTime();
        // Time the plain-loop version on the same input.
        long start2 = new Date().getTime();
        List<String> listFor = removeListSubStr2(list);
        long end2 = new Date().getTime();
        System.out.println("method lambda: " + (end1 - start1) + "ms");
        System.out.println("method simple: " + (end2 - start2) + "ms");
        System.out.println(listLambda.size() + " " + listLambda);
        System.out.println(listFor.size() + " " + listFor);
    }

}

Answer 4 (score: 0)

I have tested this on small data; I hope it helps you find a solution...

import java.util.ArrayList;
import java.util.Arrays;

public class Main {
    public static void main(String[] args){
        String[] list = {"apple","bat","cow","dog","applebat","cowbat","dogbark","help","helpless","cows"};
        System.out.println(Arrays.toString(list));
        int prelenght = 0;
        int prolenght = 0;
        long pretime = System.nanoTime();
        for(int i=0;i<list.length;i++){
            String x = list[i];
            // "0" marks an already removed entry (assumes "0" is never real input).
            if(x.equals("0")){
                continue;
            }
            prelenght = x.length();
            for(int j=i+1;j<list.length;j++){
                String y = list[j];
                if(y.equals("0")){
                    continue; // already removed
                }
                if(y.equals(x)){
                    list[j] = "0"; // duplicate of x
                }else if(y.contains(x)||x.contains(y)){
                    prolenght = y.length();
                    if(prelenght<prolenght){
                        list[j] = "0"; // y is the longer string
                    }
                    if(prelenght>prolenght){
                        list[i] = "0"; // x is the longer string
                        break;
                    }
                }
            }
        }
        long protime = System.nanoTime();
        long time = (protime - pretime);
        System.out.println("time elapsed : " + time + "ns");
        UpdateArray(list);
    }

    public static void UpdateArray(String[] list){
        ArrayList<String> arrayList = new ArrayList<>();
        for(int i=0;i<list.length;i++){
            if(!list[i].equals("0")){
                arrayList.add(list[i]);
            }
        }
        System.out.println(arrayList.toString());
    }
}

Output:

[apple, bat, cow, dog, applebat, cowbat, dogbark, help, helpless, cows]
time elapsed : 47393ns
[apple, bat, cow, dog, help]