Separating compound words from simple words

Time: 2016-05-14 20:17:16

Tags: java dynamic-programming

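I know this problem is probably best solved with DP, but I'd like to know whether it can also be done with plain recursion as a brute-force approach.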

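Given a set of words, such as {"sales", "person", "salesperson"}, determine which words are compound (that is, which words are a concatenation of 2 or more words from the list). In this case, salesperson = sales + person, so "salesperson" is compound.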

My attempt is mostly based on this problem: http://www.geeksforgeeks.org/dynamic-programming-set-32-word-break-problem/
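Since the linked article solves the Word Break problem with a bottom-up DP table, here is a minimal sketch of how that table could be adapted to this question, assuming the word list is loaded into a `HashSet` for lookups. The only change from plain word break is that the whole word is never accepted as a single piece, so a word counts as compound only when it splits into at least two list entries. The class name `CompoundWordsDp` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class: a sketch of the word-break DP adapted to "compound = 2+ pieces".
class CompoundWordsDp {

    // Returns true if `word` can be split into two or more entries of `dict`.
    static boolean isCompound(String word, Set<String> dict) {
        int n = word.length();
        if (n == 0) return false;
        // canSplit[i] is true when word.substring(0, i) is a concatenation of dictionary words.
        boolean[] canSplit = new boolean[n + 1];
        canSplit[0] = true;
        for (int i = 1; i <= n; i++) {
            for (int j = 0; j < i && !canSplit[i]; j++) {
                // Skip the single piece that is the entire word, so a word never "splits" into itself.
                if (j == 0 && i == n) continue;
                if (canSplit[j] && dict.contains(word.substring(j, i))) {
                    canSplit[i] = true;
                }
            }
        }
        return canSplit[n];
    }

    // Keeps the words that are not compound, in their original order.
    static String[] simpleWords(String[] words) {
        Set<String> dict = new HashSet<>(Arrays.asList(words));
        List<String> result = new ArrayList<>();
        for (String w : words) {
            if (!isCompound(w, dict)) result.add(w);
        }
        return result.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] test = {"salesperson", "sales", "person"};
        System.out.println(Arrays.toString(simpleWords(test)));
    }
}
```

For the list in the question this prints `[sales, person]`. The table costs O(n²) substring lookups per word instead of the exponential worst case of naive recursion.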

import java.util.ArrayList;

public static void main(String[] args) throws Exception {

    String[] test = { "salesperson", "sales", "person" };
    String[] output = simpleWords(test);


    for (int i = 0; i < output.length; i++)
        System.out.println(output[i]);
}

static String[] simpleWords(String[] words) {
    if (words == null || words.length == 0)
        return null;

    ArrayList<String> simpleWords = new ArrayList<String>();

    for (int i = 0; i < words.length; i++) {
        String word = words[i];
        // true if the word can be built entirely from entries of the list
        boolean isCompoundWord = breakWords(words, word);

        if (!isCompoundWord)
            simpleWords.add(word);
    }

    String[] retVal = new String[simpleWords.size()];
    for (int i = 0; i < simpleWords.size(); i++)
        retVal[i] = simpleWords.get(i);

    return retVal;

}

static boolean breakWords(String[] words, String word) {
    int size = word.length();

    // Empty remainder: the whole word was matched, even if by a single entry equal to the word itself.
    if (size == 0) return true;

    for (int j = 1; j <= size; j++) {

        if (compareWords(words, word.substring(0, j)) && breakWords(words, word.substring(j, word.length()))) {
            return true;
        }
    }

    return false;
}

static boolean compareWords(String[] words, String word) {
    for (int i = 0; i < words.length; i++) {
        if (words[i].equals(word))
            return true;
    }
    return false;
}

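The problem is that, although it correctly identifies "salesperson" as compound, it also identifies "sales" and "person" as compound. Can this code be modified so that the recursive solution works? I can't see an easy way to do it.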

1 answer:

Answer 0 (score: 3)

Here is a recursive solution:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public static String[] simpleWords(String[] data) {
    List<String> list = new ArrayList<>();
    for (String word : data) {
        if (!isCompound(data, word)) {
            list.add(word);
        }
    }
    return list.toArray(new String[list.size()]);
}

public static boolean isCompound(String[] data, String word) {
    return isCompound(data, word, 0);
}

public static boolean isCompound(String[] data, String word, int iteration) {
    if (data == null || word == null || word.trim().isEmpty()) {
        return false;
    }
    for (String str : data) {
        if (str.isEmpty()) {
            continue; // an empty entry would recurse on the same word forever
        }
        if (str.equals(word) && iteration > 0) {
            return true;
        }
        if (word.startsWith(str)) {
            String subword = word.substring(str.length());
            if (isCompound(data, subword, iteration + 1)) {
                return true;
            }
        }
    }
    return false;
}
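The asker's original `breakWords` can also be repaired with a smaller change: its base case `size == 0` returns true even when the only piece consumed was the word itself. A minimal sketch, assuming an extra `pieces` parameter (hypothetical, not part of either post) that counts how many list entries have been matched so far, so the empty remainder only counts as success after two or more pieces. The class name `BreakWordsFixed` is also hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class: the question's breakWords/compareWords with a piece counter added.
class BreakWordsFixed {

    static String[] simpleWords(String[] words) {
        List<String> simple = new ArrayList<>();
        for (String word : words) {
            // Start with zero pieces matched.
            if (!breakWords(words, word, 0)) simple.add(word);
        }
        return simple.toArray(new String[0]);
    }

    // True when `word` splits into list entries and the whole split
    // (pieces already matched plus those found here) has at least two pieces.
    static boolean breakWords(String[] words, String word, int pieces) {
        if (word.isEmpty()) return pieces >= 2;
        for (int j = 1; j <= word.length(); j++) {
            if (compareWords(words, word.substring(0, j))
                    && breakWords(words, word.substring(j), pieces + 1)) {
                return true;
            }
        }
        return false;
    }

    // Linear search of the list, unchanged from the question.
    static boolean compareWords(String[] words, String word) {
        for (String w : words) {
            if (w.equals(word)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String[] test = {"salesperson", "sales", "person"};
        System.out.println(Arrays.toString(simpleWords(test)));
    }
}
```

With the question's input this prints `[sales, person]`. Because prefixes start at length 1, an empty string in the list cannot cause infinite recursion here.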

Call it like this:

String[] data = {"sales", "person", "salesperson"};
System.out.println(Arrays.asList(simpleWords(data))); // prints [sales, person]
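Both recursive versions can revisit the same suffix many times, which is exponential in the worst case. A sketch of a middle ground, assuming a `HashSet` dictionary and a per-word `HashMap` cache keyed by the suffix's start index (the class and method names are hypothetical): it keeps the recursive shape the question asks about, but each suffix is solved only once, which amounts to a top-down form of the DP in the linked article.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical class: memoized recursion for the compound-word check.
class CompoundWordsMemo {

    static String[] simpleWords(String[] data) {
        Set<String> dict = new HashSet<>(Arrays.asList(data));
        List<String> result = new ArrayList<>();
        for (String word : data) {
            if (!isCompound(word, dict)) result.add(word);
        }
        return result.toArray(new String[0]);
    }

    static boolean isCompound(String word, Set<String> dict) {
        Map<Integer, Boolean> memo = new HashMap<>();
        // Only proper prefixes are tried, so the word cannot match itself as a single piece
        // and a non-empty remainder guarantees at least one more piece.
        for (int end = 1; end < word.length(); end++) {
            if (dict.contains(word.substring(0, end)) && canSegment(word, end, dict, memo)) {
                return true;
            }
        }
        return false;
    }

    // True when word.substring(start) is a concatenation of one or more dictionary words.
    // Results are cached per start index.
    private static boolean canSegment(String word, int start, Set<String> dict,
                                      Map<Integer, Boolean> memo) {
        if (start == word.length()) return true;
        Boolean cached = memo.get(start);
        if (cached != null) return cached;
        boolean ok = false;
        for (int end = start + 1; end <= word.length() && !ok; end++) {
            ok = dict.contains(word.substring(start, end)) && canSegment(word, end, dict, memo);
        }
        memo.put(start, ok);
        return ok;
    }

    public static void main(String[] args) {
        String[] data = {"sales", "person", "salesperson"};
        System.out.println(Arrays.asList(simpleWords(data)));
    }
}
```

Run on the same `data`, this prints `[sales, person]`, matching the answer's output.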