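Given a set of words, how do you identify a set of "n" letters that lets you form the maximum number of complete words from the original list?

Time: 2015-07-20 18:36:38

Tags: string algorithm graph

Example: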

n = 9
words = {Bike, Tire, Fuel, Biker, Filter, Trike}
output = {B,T,I,K,E,F,U,L,R}

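(The order of the output does not matter. Note that for a word like FOO you cannot use just F, O as the letter set; you always need F, O, O. Repeated letters are counted separately.)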
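The FOO rule above means the letter set is a multiset, and a word counts only if every letter is available as often as the word needs it. A minimal sketch of that check follows; the helper name `can_form` is hypothetical and not part of the question.

```python
from collections import Counter

def can_form(word, letters):
    """Return True if `word` can be spelled from the multiset `letters`.

    Hypothetical helper: each letter in `letters` may be used at most once,
    so a word with a repeated letter needs that letter repeated in `letters`.
    """
    need = Counter(word.upper())
    have = Counter(letters)
    return all(have[ch] >= count for ch, count in need.items())

print(can_form("FOO", ["F", "O"]))       # False: only one O available
print(can_form("FOO", ["F", "O", "O"]))  # True
```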

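What is the most efficient algorithm for solving this problem? I was thinking of using the frequency of each character, but that does not seem to help much.

5 answers:

Answer 0: (score: 3)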

  

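EDIT: This has been updated for the edited question. For details, see the revision history.

Based on the comments, one has to assume (or at least consider the possibility) that this is actually an NP-complete problem. So, unless someone proves or refutes the actual complexity of this problem, here is a brute-force solution that should at least compute the correct output.

EDIT 2.0: As shapiro.yaacov pointed out in his answer, it is indeed NP-complete.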

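It uses some utility classes to compute all combinations of a certain number of letters taken from the set of all letters that occur in the words. Since there are n^k combinations of k letters (given an initial set of n letters), this is clearly not "efficient" in the sense of a polynomial-time solution, but it is not yet clear whether such a solution exists.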
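To give a feel for the n^k growth, here is a small counting sketch for the test input used in this answer (9 distinct letters, k = 8). It also prints the number of unordered selections with repetition, C(n+k-1, k), for comparison. That second figure is an observation about counting, not something the answer's code exploits.

```python
from math import comb

n_letters = 9  # distinct letters in the test input: B, D, E, F, I, K, R, S, T
k = 8          # number of letters to choose

# Ordered selections with repetition: what an n^k enumeration visits.
ordered_with_repetition = n_letters ** k

# Unordered selections with repetition (multisets of size k).
multisets = comb(n_letters + k - 1, k)

print(ordered_with_repetition)  # 43046721
print(multisets)                # 12870
```

Since the order of the chosen letters does not matter for this problem, many of the n^k selections describe the same multiset of letters.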

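To verify the output against the point raised in the edited question (namely, that a letter must appear in the result list as often as it appears in a word), I used an example input containing words with repeated letters:

"BIKE", "BIKER", "TRIKE", "BEER", "DEER", "SEED", "FEED"

For this input, the program prints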

0 letters: [], created words: []
1 letters: [B], created words: []
2 letters: [B, B], created words: []
3 letters: [B, B, B], created words: []
4 letters: [B, E, E, R], created words: [BEER]
5 letters: [B, D, E, E, R], created words: [BEER, DEER]
6 letters: [B, D, E, E, F, R], created words: [BEER, DEER, FEED]
7 letters: [B, D, E, E, F, R, S], created words: [BEER, DEER, SEED, FEED]
8 letters: [B, D, E, E, F, I, K, R], created words: [BIKE, BIKER, BEER, DEER, FEED]

Maybe it can be considered useful, possibly as a starting point or building block for others.

import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.TreeSet;

public class MaximizeWords
{
    public static void main(String[] args)
    {
        List<String> words = Arrays.asList(
            "BIKE",
            "BIKER",
            "TRIKE",

            "BEER",
            "DEER",
            "SEED",
            "FEED"
        );

        List<Character> allLetters = 
            new ArrayList<Character>(allLettersOf(words));
        for (int n=0; n<=8; n++)
        {
            CombinationIterable<Character> combinations =
                new CombinationIterable<Character>(n, allLetters);

            List<Solution> solutions = new ArrayList<Solution>();
            for (List<Character> combination : combinations)
            {
                Collections.sort(combination);
                Solution solution = new Solution(words, combination);
                solutions.add(solution);
            }
            Solution bestSolution = Collections.max(solutions, 
                new Comparator<Solution>()
            {
                @Override
                public int compare(Solution s0, Solution s1)
                {
                    return Integer.compare(
                        s0.createdWords.size(), s1.createdWords.size());
                }
            });
            System.out.println(bestSolution);
        }
    }

    static class Solution
    {
        List<Character> letters;
        List<String> createdWords;

        public Solution(List<String> words, List<Character> letters)
        {
            this.letters = letters;
            this.createdWords = computeCreatedWords(words, letters);
        }

        @Override
        public String toString()
        {
            return letters.size() + " letters: " + letters
                + ", created words: " + createdWords;
        }
    }

    private static List<String> computeCreatedWords(
        List<String> words, List<Character> letters)
    {
        List<String> createdWords = new ArrayList<String>();
        for (String word : words)
        {
            if (creates(letters, word))
            {
                createdWords.add(word);
            }
        }
        return createdWords;
    }

    private static boolean creates(List<Character> letters, String word)
    {
        List<Character> copy = new ArrayList<Character>(letters);
        for (int i=0; i<word.length(); i++)
        {
            Character c = Character.valueOf(word.charAt(i));
            if (!copy.remove(c))
            {
                return false;
            }
        }
        return true;
    }


    private static List<Character> lettersOf(String word)
    {
        List<Character> letters = new ArrayList<Character>();
        for (int i=0; i<word.length(); i++)
        {
            letters.add(Character.valueOf(word.charAt(i)));
        }
        return letters;
    }

    private static Set<Character> allLettersOf(Iterable<String> words)
    {
        Set<Character> letters = new TreeSet<Character>();
        for (String word : words)
        {
            letters.addAll(lettersOf(word));
        }
        return letters;
    }
}







//=============================================================================
// These classes are taken from https://github.com/javagl/Combinatorics


/**
 * A class providing an iterator over all combinations of a certain number
 * of elements of a given set. For a set S with n = |S|, there are n^k 
 * combinations of k elements of the set. This is the number of possible
 * samples when doing sampling with replacement. Example:<br />
 * <pre>
 * S = { A,B,C }, n = |S| = 3
 * k = 2 
 * m = n^k = 9
 * 
 * Combinations:
 * [A, A]
 * [A, B]
 * [A, C]
 * [B, A]
 * [B, B]
 * [B, C]
 * [C, A]
 * [C, B]
 * [C, C]
 * </pre>
 *  
 * @param <T> The type of the elements
 */
final class CombinationIterable<T> implements Iterable<List<T>>
{
    /**
     * The input elements
     */
    private final List<T> input;

    /**
     * The sample size
     */
    private final int sampleSize;

    /**
     * The total number of elements that the iterator will provide
     */
    private final int numElements;

    /**
     * Creates an iterable over all multisets of 
     * 'sampleSize' elements of the given array.
     *  
     * @param sampleSize The sample size
     * @param input The input elements
     */
    public CombinationIterable(int sampleSize, List<T> input)
    {
        this.sampleSize = sampleSize;
        this.input = input;
        numElements = (int) Math.pow(input.size(), sampleSize);
    }

    @Override
    public Iterator<List<T>> iterator()
    {
        return new Iterator<List<T>>()
        {
            /**
             * The element counter
             */
            private int current = 0;

            /**
             * The indices of the elements that are currently chosen
             */
            private final int chosen[] = new int[sampleSize];

            @Override
            public boolean hasNext()
            {
                return current < numElements;
            }

            @Override
            public List<T> next()
            {
                if (!hasNext())
                {
                    throw new NoSuchElementException("No more elements");
                }

                List<T> result = new ArrayList<T>(sampleSize);
                for (int i = 0; i < sampleSize; i++)
                {
                    result.add(input.get(chosen[i]));
                }
                increase();
                current++;
                return result;
            }

            /**
             * Increases the k-ary representation of the selection of 
             * elements by one.
             */
            private void increase()
            {
                // The array of 'chosen' elements for a set of size n 
                // effectively is a number represented in k-ary form, 
                // and thus, this method does nothing else than count. 
                // For example, when choosing 2 elements of a set with 
                // n=10, the contents of 'chosen' would represent all
                // values 
                // 00, 01, 02,... 09,
                // 10, 11, 12,... 19,
                // ...
                // 90, 91, 92, ...99
                // with each digit indicating the index of the element
                // of the input array that should be placed at the
                // respective position of the output array.
                int index = chosen.length - 1;
                while (index >= 0)
                {
                    if (chosen[index] < input.size() - 1)
                    {
                        chosen[index]++;
                        return;
                    }
                    chosen[index] = 0;
                    index--;
                }
            }

            @Override
            public void remove()
            {
                throw new UnsupportedOperationException(
                    "May not remove elements from a combination");
            }
        };
    }
}

/**
 * Utility methods used in the combinatorics package
 */
class Utils
{
    /**
     * Utility method for computing the factorial n! of a number n.
     * The factorial of a number n is n*(n-1)*(n-2)*...*1, or more
     * formally:<br />
     * 0! = 1 <br />
     * 1! = 1 <br />
     * n! = n*(n-1)!<br />
     *
     * @param n The number of which the factorial should be computed
     * @return The factorial, i.e. n!
     */
    public static BigInteger factorial(int n)
    {
        BigInteger f = BigInteger.ONE;
        for (int i = 2; i <= n; i++)
        {
            f = f.multiply(BigInteger.valueOf(i));
        }
        return f;
    }    
    /**
     * A magic utility method that happens to return the number of
     * bits that are set to '1' in the given number.
     *  
     * @param n The number whose bits should be counted
     * @return The number of bits that are '1' in n
     */
    public static int countBits(int n)
    {
        int m = n - ((n >> 1) & 033333333333) - ((n >> 2) & 011111111111);
        return ((m + (m >> 3)) & 030707070707) % 63;
    }

    /**
     * Add all elements from the given iterable into the given collection
     * 
     * @param <T> A type that is related to the elements 
     * @param iterable The iterable
     * @param collection The collection
     */
    public static <T> void addAll(
        Iterable<? extends T> iterable, Collection<? super T> collection)
    {
        for (T t : iterable)
        {
            collection.add(t);
        }
    }

    /**
     * Returns all elements from the given iterable as a list
     * 
     * @param <T> A type that is related to the elements 
     * @param iterable The iterable
     * @return The list
     */
    public static <T> List<T> asList(Iterable<? extends T> iterable)
    {
        List<T> list = new ArrayList<T>();
        addAll(iterable, list);
        return list;
    }

    /**
     * Returns all elements from the given iterable as a set
     * 
     * @param <T> A type that is related to the elements 
     * @param iterable The iterable
     * @return The set
     */
    public static <T> Set<T> asSet(Iterable<? extends T> iterable)
    {
        Set<T> set = new LinkedHashSet<T>();
        addAll(iterable, set);
        return set;
    }

    /**
     * Private constructor to prevent instantiation
     */
    private Utils()
    {

    }
}

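(Note that the code has not changed much compared to the initial version: basically, it now uses a CombinationIterable instead of a ChoiceIterable. However, the number of combinations is much larger than the number of choices, so this is only feasible for much smaller inputs than the initial solution.)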

Answer 1: (score: 2)

Finally had time to look into this:

This is a variant of the set cover problem; it is actually the maximum coverage problem. And, as I suspected, it is NP-hard.

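So, in summary, the answer given by @Marco13 is (asymptotically) the best you can do. It could be optimized, and there may be other tricks, but that is basically as good as it gets.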
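As one illustration of such a trick, the search can run over subsets of words rather than over letter selections. A group of words can be formed from n letters exactly when the per-letter maximum of their letter counts sums to at most n. The sketch below is an assumption-laden alternative, not the method of either answer, and the function name `best_word_subset` is hypothetical. It is still exponential, this time in the number of words, which is consistent with the problem being NP-hard.

```python
from collections import Counter
from itertools import combinations

def best_word_subset(words, n):
    """Return (chosen_words, letters) for a largest subset of `words`
    whose combined letter requirement fits into n letters."""
    counters = [Counter(w.upper()) for w in words]
    # Try larger subsets first, so the first feasible subset is optimal.
    for size in range(len(words), 0, -1):
        for idx in combinations(range(len(words)), size):
            needed = Counter()
            for i in idx:
                needed |= counters[i]  # element-wise maximum of letter counts
            if sum(needed.values()) <= n:
                return [words[i] for i in idx], sorted(needed.elements())
    return [], []

words = ["Bike", "Tire", "Fuel", "Biker", "Filter", "Trike"]
chosen, letters = best_word_subset(words, 9)
print(chosen)   # all six words fit into nine letters
print(letters)
```

With few words and a large alphabet this enumerates far fewer candidates than iterating over letter selections; with many words the opposite holds.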

Answer 2: (score: 0)

In a simple way, you could do the following (I am using C#):

var output = string.Join("",words.Select(t=>t.ToUpper())).ToCharArray().Distinct();

Result

B,I,K,E,T,R,F,U,L => n=9

If the input is

words = {"Stupid","Stubborn","sun","safe"}, 
then the result would be S,T,U,P,I,D,B,O,R,N,A,F,E and count is 13

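In other words: your problem is to find the minimum set of letters needed to form the set of words, which means removing all duplicate characters from the words.

Here is a working sample

Answer 3: (score: 0)

Here is another version in Python that uses only ten lines of code for the core algorithm. It shows all possible letter combinations that yield the maximum number of complete words. It also handles repeated letters (as in FOO).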

import itertools

n = 9
words = ['bike', 'tire', 'fuel', 'biker', 'filter', 'trike']

# preparation: get the union of all letters in all words, including duplicate letters
all_letters = ''
for word in words:
    a = all_letters[:]
    for letter in word:
        if letter in a:
            a = a.replace(letter, '', 1)
        else:
            all_letters += letter

# helper function: find if a word with duplicate letters in a combination
def word_in_combo(word, combo):
    letters = list(combo)
    for letter in word:
        if letter not in letters:
            return False
        letters.remove(letter)
    return True

# algorithm: find all words for each combination of n letters
matches = {}
max_matched = 0
for combo in itertools.combinations(all_letters, n):
    matched = 0
    for word in words:
        if word_in_combo(word, combo):
            matched += 1
    matches[combo] = matched
    if matched > max_matched:
        max_matched = matched

# print the best combinations and the matching words
if max_matched == 0:
    print("No combinations for %d letters" % n)
else:
    for combo in matches:
        if matches[combo] == max_matched:
            print(combo, ':', end=' ')
            for word in words:
                if word_in_combo(word, combo):
                    print(word, end=' ')
            print()

For n=4, the output is:

('e', 'f', 'u', 'l') : fuel
('i', 'e', 't', 'r') : tire
('b', 'i', 'k', 'e') : bike

For n=5, the output is:

('i', 'k', 'e', 't', 'r') : tire trike
('b', 'i', 'k', 'e', 'r') : bike biker

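Answer 4: (score: -1)

Here is a version written in Python. The basic algorithm is: find the most commonly used letter and use it; remove that letter from each word and repeat. Along the way, if a word can be completed with just one more letter, use that letter first.

The advantage of this approach is that it "looks ahead" to maximize the number of words as n increases.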

words = ['bike', 'tire', 'fuel', 'biker', 'filter', 'trike']


def next_letter(words):
    """ find the next letter from the most common letter in the words """
    num_words = {}
    max_words = 0
    next = None
    for word in words:
        if len(word) == 1:        # if we can complete this word 
            return word[0]        # with one letter, do it!
        for letter in word:
            n = num_words.get(letter, 0) + 1    # tally the number of words
            num_words[letter] = n               # that use this letter
            if n > max_words:                   # a new maximum?
                max_words = n                   # use it
                next = letter
    return next


output = ''
while words:
    letter = next_letter(words)   # get the next letter
    if not letter: break          # reached the end? exit
    output += letter              # keep track of the letters
    # remove the selected letter from every word and try again
    words = [word.replace(letter, '', 1) if letter in word else word for word in words]

print('{' + ','.join(output) + '}')

The output of this program is:

  

{e,i,r,t,k,b,f,l,u}