Using Python, find anagrams in a list of words

Time: 2011-11-27 15:17:39

Tags: python anagram

If I have a list of strings, for example:

["car", "tree", "boy", "girl", "arc"...]

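What should I do to find the anagrams in that list, for example (car, arc)? I tried a for loop over each string, with an if to skip strings of different lengths, but I can't get the right result.

How can I look at each letter in a string and compare it with the letters of the other strings in the list, in a different order?

I have read several similar questions, but the answers were too complicated. I can't import anything and can only use basic functions.

23 answers:

Answer 0 (score: 24)

To do this for two strings, you can do the following: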

def isAnagram(str1, str2):
    str1_list = list(str1)
    str1_list.sort()
    str2_list = list(str2)
    str2_list.sort()

    return (str1_list == str2_list)

Iterating over the list from there is straightforward.
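
One way that iteration might look, as a rough sketch rather than the answerer's own code. The helper name `find_anagram_pairs` is made up for illustration, and `isAnagram` is restated in a shorter form so the block runs on its own. It uses no imports.

```python
def isAnagram(str1, str2):
    # Same check as above: two words are anagrams when their sorted letters match.
    return sorted(str1) == sorted(str2)

def find_anagram_pairs(words):
    # Hypothetical helper: compare every word with every later word.
    pairs = []
    for i in range(len(words)):
        for j in range(i + 1, len(words)):
            # Skip words of different lengths early, as the question suggests.
            if len(words[i]) != len(words[j]):
                continue
            if isAnagram(words[i], words[j]):
                pairs.append((words[i], words[j]))
    return pairs

print(find_anagram_pairs(["car", "tree", "boy", "girl", "arc"]))
# prints [('car', 'arc')]
```

Comparing every pair is quadratic in the number of words, which is fine for short lists.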

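Answer 1 (score: 16)

Build a dictionary mapping each sorted word to the list of words that sort to it. All the words in the same list are anagrams of each other.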

from collections import defaultdict

def load_words(filename='/usr/share/dict/american-english'):
    with open(filename) as f:
        for word in f:
            yield word.rstrip()

def get_anagrams(source):
    d = defaultdict(list)
    for word in source:
        key = "".join(sorted(word))
        d[key].append(word)
    return d

def print_anagrams(word_source):
    d = get_anagrams(word_source)
    for key, anagrams in d.items():
        if len(anagrams) > 1:
            print(key, anagrams)

word_source = load_words()
print_anagrams(word_source)

Or:

word_source = ["car", "tree", "boy", "girl", "arc"]
print_anagrams(word_source)

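Answer 2 (score: 7)

One solution is to sort the word you are searching anagrams for (for example with sorted), sort each alternative, and compare the two.

So if you were searching the list ['car', 'girl', 'tofu', 'rca'] for anagrams of 'rac', your code could look like this: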

word = sorted('rac')
alternatives = ['car', 'girl', 'tofu', 'rca']

for alt in alternatives:
    if word == sorted(alt):
        print(alt)

Answer 3 (score: 4)

Sort each element, then look for duplicates. There is a built-in sort function, so you don't need to import anything.
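
A minimal sketch of that idea, using only built-ins. The function name `group_by_sorted_letters` is an assumption made for illustration, not something from the answer.

```python
def group_by_sorted_letters(words):
    # Key each word by its letters in sorted order; anagrams share a key.
    groups = {}
    for word in words:
        key = "".join(sorted(word))
        if key not in groups:
            groups[key] = []
        groups[key].append(word)
    # A "duplicate" is a key that collected more than one word.
    return [group for group in groups.values() if len(group) > 1]

print(group_by_sorted_letters(["car", "tree", "boy", "girl", "arc"]))
# prints [['car', 'arc']]
```

Each word is sorted once and looked up once, so this avoids comparing every pair of words.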

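Answer 4 (score: 2)

There are several solutions to this problem:

  1. The classic approach

    First, consider the definition of an anagram: two words are anagrams of each other if they consist of the same set of letters and each letter appears exactly the same number of times in both words. This is essentially a histogram of letter counts for each word. It is a perfect use case for the collections.Counter data structure (see docs). The algorithm is as follows:

    • Build a dictionary whose keys are histograms and whose values are lists of the words with that histogram.
    • For each word, build its histogram and add the word to the list for that histogram.
    • Output the lists stored as the dictionary's values.

    Here is the code: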

    from collections import Counter, defaultdict
    
    def anagram(words):
        anagrams = defaultdict(list)
        for word in words:
            histogram = frozenset(Counter(word).items()) # build a hashable, order-independent histogram
            anagrams[histogram].append(word)
        return list(anagrams.values())
    
    keywords = ("hi", "hello", "bye", "helol", "abc", "cab", 
                    "bac", "silenced", "licensed", "declines")
    
    print(anagram(keywords))
    

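    Note that building a Counter is O(l), while sorting a word is O(l*log(l)), where l is the length of the word.

  2. Solving anagrams with prime numbers

    This is a more advanced solution that relies on the "multiplicative uniqueness" of prime numbers. You can refer to this SO post: Comparing anagrams using prime numbers, and here is a sample python implementation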
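
The linked implementation is not reproduced here. What follows is only a rough sketch of the prime-number idea, assuming words made of the letters a-z: each letter gets its own prime, and because prime factorizations are unique, two words have the same product exactly when they use the same letters the same number of times. The names `first_primes`, `PRIMES`, `prime_signature` and `is_anagram` are made up for illustration.

```python
def first_primes(count):
    # Trial division is enough to generate 26 primes.
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p != 0 for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

# One prime per letter: a -> 2, b -> 3, c -> 5, ...
PRIMES = first_primes(26)

def prime_signature(word):
    # Assumption: the word contains only the letters a-z (case is ignored).
    product = 1
    for ch in word.lower():
        product *= PRIMES[ord(ch) - ord("a")]
    return product

def is_anagram(a, b):
    # Equal products mean equal letter multisets.
    return prime_signature(a) == prime_signature(b)

print(is_anagram("silenced", "licensed"))  # True
print(is_anagram("hello", "world"))        # False
```

Python integers do not overflow, so long words only make the product larger and slower to multiply.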

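Answer 5 (score: 1)

Since you can't import anything, here are two different approaches, including the for loop you asked for.

Approach 1: for loops and the built-in sorted function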

word_list = ["percussion", "supersonic", "car", "tree", "boy", "girl", "arc"]

# initialize a list
anagram_list = []
for word_1 in word_list: 
    for word_2 in word_list: 
        if word_1 != word_2 and (sorted(word_1)==sorted(word_2)):
            anagram_list.append(word_1)
print(anagram_list)

Approach 2: dictionaries

def freq(word):
    freq_dict = {}
    for char in word:
        freq_dict[char] = freq_dict.get(char, 0) + 1
    return freq_dict

# initialize a list
anagram_list = []
for word_1 in word_list: 
    for word_2 in word_list: 
        if word_1 != word_2 and (freq(word_1) == freq(word_2)):
            anagram_list.append(word_1)
print(anagram_list)

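For a more detailed explanation of these approaches, see the article

Answer 6 (score: 1)

Most of the previous answers are correct; here is another way to compare two strings. The main benefit of this strategy over sorting is time complexity: sorting costs O(n log n), while building a frequency dictionary is O(n).

1. Check the lengths of the strings.

2. Build the frequency dictionaries and compare them. If they match, the two words are anagrams.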

def char_frequency(word):
    frequency  = {}
    for char in word:
        #if character  is in frequency then increment the value
        if char in frequency:
            frequency[char] += 1
        #else add character and set it to 1
        else:
            frequency[char] = 1
    return frequency 


a_word ='google'
b_word ='ooggle'
#check length of the words
if len(a_word) != len(b_word):
    print("not anagram")
else:
    #here we check the frequency to see if we get the same
    if ( char_frequency(a_word) == char_frequency(b_word)):
        print("found anagram")
    else:
        print("no anagram")

Answer 7 (score: 1)

def findanagranfromlistofwords(li):
    # Map each word to a later word in the list that is its anagram.
    # A word with several anagrams keeps only the last one found.
    anagrams = {}
    for i in range(len(li)):
        sortedfirst = ''.join(sorted(str(li[i])))
        for j in range(i + 1, len(li)):
            if sortedfirst == ''.join(sorted(str(li[j]))):
                anagrams[li[i]] = li[j]

    print(anagrams)

findanagranfromlistofwords(["car", "tree", "boy", "girl", "arc"])

Answer 8 (score: 0)

# list of words
words = ["ROOPA","TABU","OOPAR","BUTA","BUAT" , "PAROO","Soudipta",
        "Kheyali Park", "Tollygaunge", "AROOP","Love","AOORP",
         "Protijayi","Paikpara","dipSouta","Shyambazaar",
        "jayiProti", "North Calcutta", "Sovabazaar"]

#Method 1
A = [''.join(sorted(word)) for word in words]

dict ={}

for indexofsamewords,samewords in enumerate(A):
    dict.setdefault(samewords, []).append(indexofsamewords)
    
print(dict)
#{'AOOPR': [0, 2, 5, 9, 11], 'ABTU': [1, 3, 4], 'Sadioptu': [6, 14], ' KPaaehiklry': [7], 'Taeggllnouy': [8], 'Leov': [10], 'Paiijorty': [12, 16], 'Paaaikpr': [13], 'Saaaabhmryz': [15], ' CNaachlortttu': [17], 'Saaaaborvz': [18]}

for index in dict.values(): 
    print( [words[i] for i in index ] )
    

Output:

['ROOPA', 'OOPAR', 'PAROO', 'AROOP', 'AOORP']
['TABU', 'BUTA', 'BUAT']
['Soudipta', 'dipSouta']
['Kheyali Park']
['Tollygaunge']
['Love']
['Protijayi', 'jayiProti']
['Paikpara']
['Shyambazaar']
['North Calcutta']
['Sovabazaar']

Answer 9 (score: 0)

def all_anagrams(words: [str]) -> [str]:
    word_dict = {}
    for word in words:
        sorted_word  = "".join(sorted(word))
        if sorted_word in word_dict:
            word_dict[sorted_word].append(word)
        else:
            word_dict[sorted_word] = [word]
    return list(word_dict.values())  

Answer 10 (score: 0)

Simply use the Counter class from the Python 3 collections package

from collections import Counter

str1="abc"
str2="cab"

Counter(str1)==Counter(str2)
# returns True i.e both Strings are anagrams of each other.

Answer 11 (score: 0)

This works well:


def find_ana(l):
    a=[]
    for i in range(len(l)):
        for j in range(len(l)): 
            if (l[i]!=l[j]) and (sorted(l[i])==sorted(l[j])):
                a.append(l[i])
                a.append(l[j])

    return list(set(a))

Answer 12 (score: 0)

This one will help you:

Assume the input is a comma-separated string

Console input: abc,bac,car,rac,pqr,acb,acr,abc

in_list = input("Enter strings separated by comma: ").split(',')
list_anagram = list()

for i in range(0, len(in_list) - 1):
    # Skip words whose sorted form has already been reported.
    if sorted(in_list[i]) not in list_anagram:
        for j in range(i + 1, len(in_list)):
            isanagram = (sorted(in_list[i]) == sorted(in_list[j]))
            if isanagram:
                list_anagram.append(sorted(in_list[i]))
                print(in_list[i], 'isanagram')
                break

Answer 13 (score: 0)

A simple solution in Python

def anagram(s1,s2):

    # Remove spaces and lowercase letters
    s1 = s1.replace(' ','').lower()
    s2 = s2.replace(' ','').lower()

    # Return sorted match.
    return sorted(s1) == sorted(s2)

Answer 14 (score: 0)

import collections

def find_anagrams(x):
    anagrams = [''.join(sorted(list(i))) for i in x]
    anagrams_counts = [item for item, count in collections.Counter(anagrams).items() if count > 1]
    return [i for i in x if ''.join(sorted(list(i))) in anagrams_counts]

Answer 15 (score: 0)

I use a dictionary to store the count of each character of the first string. Then I iterate over the second string and look each character up in the dictionary; if it exists, I decrement the count of the corresponding key.

class Anagram:

    def __init__(self):
        self.counts = {}

    def is_anagram(self, s1, s2):
        # Compare case-insensitively and start from a clean table.
        s1 = s1.lower()
        s2 = s2.lower()
        self.counts = {}

        # Count every character of the first string.
        for ch in s1:
            self.counts[ch] = self.counts.get(ch, 0) + 1

        # Subtract the characters of the second string.
        for ch in s2:
            if ch not in self.counts:
                return False
            self.counts[ch] -= 1

        # The strings are anagrams only if every count is back at zero.
        return all(count == 0 for count in self.counts.values())

Answer 16 (score: 0)

A solution in Python is as follows:

class Word:
    def __init__(self, data, index):
        self.data = data
        self.index = index

def printAnagrams(arr):
    dupArray = []
    size = len(arr)

    for i in range(size):
        dupArray.append(Word(arr[i], i))

    for i in range(size):
        dupArray[i].data = ''.join(sorted(dupArray[i].data))

    dupArray = sorted(dupArray, key=lambda x: x.data)

    for i in range(size):
        print(arr[dupArray[i].index])

def main():
    arr = ["dog", "act", "cat", "god", "tac"]

    printAnagrams(arr)

if __name__== '__main__':
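    main()

  1. First, create a duplicate list of the same words, each paired with an index recording its position.
  2. Then sort the characters of each string in the duplicate list.
  3. Then sort the duplicate list by those strings.
  4. Finally, print the original list using the indices stored in the duplicate array.
  5. The time complexity of the above is O(NMLogN + NMLogM) = O(NMlogN)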

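Answer 17 (score: -1)

A set is the appropriate data structure for the output, since you presumably don't want redundancy in the output. A dictionary is ideal for looking up whether a particular letter sequence has been seen before and which word it originally came from. Taking advantage of the fact that we can add the same item to a set multiple times without growing the set, we can get away with a single for loop.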

import numpy as np

def vector_anagram(l):
    d, out = dict(), set()
    for word in l:
        s = np.zeros(26, dtype=int)
        for c in word:
            s[ord(c)-97] += 1
        s = tuple(s)
        try:
            out.add(d[s])
            out.add(word)
        except KeyError:
            d[s] = word
    return out

A faster approach is to exploit the commutative property of addition.

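Answer 18 (score: -2)

  1. Compute the length of each word.
  2. Compute the ASCII character sum of each word.
  3. Sort the characters of each word by ASCII value and store the ordered word.
  4. Group the words by length.
  5. Within each group, regroup the words by their ASCII character sum.
  6. For each small list, compare only the ordered words. If two ordered words are equal, the words are anagrams.
  7. Here we use a list of 1,000,000 words.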

        namespace WindowsFormsApplication2
        {
            public class WordDef
            {
                public string Word { get; set; }
                public int WordSum { get; set; }
                public int Length { get; set; }       
                public string AnagramWord { get; set; }
                public string Ordered { get; set; }
                public int GetAsciiSum(string word)
                {
                    int sum = 0;
                    foreach (char c in word)
                    {
                        sum += (int)c;
                    }
                    return sum;
                }
            }
        }
    
        using System;
        using System.Collections.Concurrent;
        using System.Collections.Generic;
        using System.Diagnostics;
        using System.Linq;
        using System.Net;
        using System.Text;
        using System.Threading.Tasks;
        using System.Windows.Forms;
    
        namespace WindowsFormsApplication2
        {
            public partial class AngramTestForm : Form
            {
                private ConcurrentBag<string> m_Words;
    
                private ConcurrentBag<string> m_CacheWords;
    
                private ConcurrentBag<WordDef> m_Anagramlist;
                public AngramTestForm()
                {
                    InitializeComponent();
                    m_CacheWords = new ConcurrentBag<string>();
                }
    
                private void button1_Click(object sender, EventArgs e)
                {
                    m_Words = null;
                    m_Anagramlist = null;
    
                    m_Words = new ConcurrentBag<string>();
                    m_Anagramlist = new ConcurrentBag<WordDef>();
    
                    if (m_CacheWords.Count == 0)
                    {
                        foreach (var s in GetWords())
                        {
                            m_CacheWords.Add(s);
                        }
                    }
    
                    m_Words = m_CacheWords;
    
                    Stopwatch sw = new Stopwatch();
    
                    sw.Start();
    
                    //DirectCalculation();
    
                    AsciiCalculation();
    
                    sw.Stop();
    
                    Console.WriteLine("The End! {0}", sw.ElapsedMilliseconds);
    
                    this.Invoke((MethodInvoker)delegate
                    {
                        lbResult.Text = string.Concat(sw.ElapsedMilliseconds.ToString(), " Miliseconds");
                    });
    
                    StringBuilder sb = new StringBuilder();
                    foreach (var w in m_Anagramlist)
                    {
                        if (w != null)
                        {
                            sb.Append(string.Concat(w.Word, " - ", w.AnagramWord, Environment.NewLine));
                        }
                    }
    
                    txResult.Text = sb.ToString();
                }
    
                private void DirectCalculation()
                {
                    List<WordDef> wordDef = new List<WordDef>();
    
                    foreach (var w in m_Words)
                    {
                        WordDef wd = new WordDef();
                        wd.Word = w;
                        wd.WordSum = wd.GetAsciiSum(w);
                        wd.Length = w.Length;
                        wd.Ordered = String.Concat(w.OrderBy(c => c));
    
                        wordDef.Add(wd);
                    }
    
                    foreach (var w in wordDef)
                    {
                        foreach (var t in wordDef)
                        {
                            if (w.Word != t.Word)
                            {
                                if (w.Ordered == t.Ordered)
                                {
                                    t.AnagramWord = w.Word;
                                    m_Anagramlist.Add(new WordDef() { Word = w.Word, AnagramWord = t.Word });
                                }
                            }
                        }
                    }
                }
    
                private void AsciiCalculation()
                {
                    ConcurrentBag<WordDef> wordDef = new ConcurrentBag<WordDef>();
    
                    Parallel.ForEach(m_Words, w =>
                        {
                            WordDef wd = new WordDef();
                            wd.Word = w;
                            wd.WordSum = wd.GetAsciiSum(w);
                            wd.Length = w.Length;
                            wd.Ordered = String.Concat(w.OrderBy(c => c));
    
                            wordDef.Add(wd);                    
                        });
    
                    var tempWordByLength = from w in wordDef
                                           group w by w.Length into newGroup
                                           orderby newGroup.Key
                                           select newGroup;
    
                    foreach (var wList in tempWordByLength)
                    {
                        List<WordDef> wd = wList.ToList<WordDef>();
    
                        var tempWordsBySum = from w in wd
                                             group w by w.WordSum into newGroup
                                             orderby newGroup.Key
                                             select newGroup;
    
                        Parallel.ForEach(tempWordsBySum, ws =>
                            {
                                List<WordDef> we = ws.ToList<WordDef>();
    
                                if (we.Count > 1)
                                {
                                    CheckCandidates(we);
                                }
                            });
                    }
                }
    
                private void CheckCandidates(List<WordDef> we)
                {
                    for (int i = 0; i < we.Count; i++)
                    {
                        for (int j = i + 1; j < we.Count; j++)
                        {
                            if (we[i].Word != we[j].Word)
                            {
                                if (we[i].Ordered == we[j].Ordered)
                                {
                                    we[j].AnagramWord = we[i].Word;
                                    m_Anagramlist.Add(new WordDef() { Word = we[i].Word, AnagramWord = we[j].Word });
                                }
                            }
                        }
                    }
                }
    
                private static string[] GetWords()
                {
                    string htmlCode = string.Empty;
    
                    using (WebClient client = new WebClient())
                    {
                        htmlCode = client.DownloadString("https://raw.githubusercontent.com/danielmiessler/SecLists/master/Passwords/10_million_password_list_top_1000000.txt");
                    }
    
                    string[] words = htmlCode.Split(new string[] { "\n" }, StringSplitOptions.RemoveEmptyEntries);
    
                    return words;
                }
            }
        }
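
Since the question asks for Python, the numbered steps of this answer could be sketched roughly as follows. This is not a port of the C# code above: it is single-threaded, the function name `find_pairs_by_ascii_buckets` is made up for illustration, and the word list is a small made-up sample rather than the 1,000,000-word file.

```python
def find_pairs_by_ascii_buckets(words):
    # Steps 1-3: record length, ASCII sum and sorted letters for each word.
    # Steps 4-5: bucket the words by (length, ASCII sum).
    buckets = {}
    for word in words:
        ascii_sum = sum(ord(ch) for ch in word)
        ordered = "".join(sorted(word))
        buckets.setdefault((len(word), ascii_sum), []).append((word, ordered))

    # Step 6: inside each small bucket, compare only the sorted forms.
    pairs = []
    for candidates in buckets.values():
        for i in range(len(candidates)):
            for j in range(i + 1, len(candidates)):
                different_words = candidates[i][0] != candidates[j][0]
                same_letters = candidates[i][1] == candidates[j][1]
                if different_words and same_letters:
                    pairs.append((candidates[i][0], candidates[j][0]))
    return pairs

print(find_pairs_by_ascii_buckets(["car", "tree", "boy", "girl", "arc", "ad", "bc"]))
# prints [('car', 'arc')]
```

"ad" and "bc" land in the same bucket because their lengths and ASCII sums match, which is why the final comparison of the sorted forms is still needed.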
    

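Answer 19 (score: -3)

Here is an impressive solution.

Function alphabet_count_mapper:

For each word in the file/list

1. Create a dictionary of letters/characters with an initial count of 0.

2. Count every letter in the word and increment the matching count in the dictionary above.

3. Create the letter-count dict and return the dict's values as a tuple.

Function anagram_counter:

1. Create a dictionary keyed by the letter-count tuple, with a count of how many times it occurs.

2. Iterate over that dictionary and, if a value is > 1, add the value to the anagram count.

Run it with the file path as a command-line argument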
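
The answer's code is not shown, so the following is only a guess at what the two functions described above might look like. It assumes lowercase a-z as the alphabet, ignores any other character, and takes a list of words instead of a file path to stay self-contained.

```python
import string

def alphabet_count_mapper(word):
    # Steps 1-2: start every letter at 0, then count the letters in the word.
    counts = {letter: 0 for letter in string.ascii_lowercase}
    for ch in word.lower():
        if ch in counts:  # assumption: characters outside a-z are ignored
            counts[ch] += 1
    # Step 3: return the counts as a tuple so they can serve as a dictionary key.
    return tuple(counts[letter] for letter in string.ascii_lowercase)

def anagram_counter(words):
    # Step 1: count how often each letter-count tuple occurs.
    occurrences = {}
    for word in words:
        key = alphabet_count_mapper(word)
        occurrences[key] = occurrences.get(key, 0) + 1
    # Step 2: add up the counts of every tuple that occurs more than once.
    return sum(count for count in occurrences.values() if count > 1)

print(anagram_counter(["car", "tree", "boy", "girl", "arc"]))
# prints 2
```

The result is the number of words that have at least one anagram in the list, not the words themselves.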

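Answer 20 (score: -4)

Convert every character in a word to a number (with the ord() function) and add the numbers up for the word. If two words have the same sum, they are treated as anagrams. Then filter the list for sums that occur more than once. Note that equal sums do not guarantee anagrams: 'ad' and 'bc' have the same sum, so this check can report false positives.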

def sumLet(w):
    return sum([ord(c) for c in w])

def find_anagrams(l):
    num_l = list(map(sumLet,l))
    return [l[i] for i,num in enumerate(num_l) if num_l.count(num) > 1]

Answer 21 (score: -5)

>>> words = ["car", "race", "rac", "ecar", "me", "em"]
>>> anagrams = {}
>>> for word in words:
...     reverse_word=word[::-1]
...     if reverse_word in words:
...         anagrams[word] = (words.pop(words.index(reverse_word)))
>>> anagrams
20: {'car': 'rac', 'me': 'em', 'race': 'ecar'}

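Logic:

  1. Start with the first word and reverse it.
  2. Check whether the reversed word exists in the list.
  3. If it does, find its index, pop the item, and store it in the dictionary with the word as the key and the reversed word as the value.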

Answer 22 (score: -6)

If you want a solution in Java,

public List<String> findAnagrams(List<String> dictionary) {

    // TODO do null check and other basic validations.
    Map<String, List<String>> wordMap = new HashMap<String, List<String>>();

    for(String word : dictionary) {

        // ignore if word is null
        char[] tempWord = word.toCharArray();
        Arrays.sort(tempWord);
        String newWord = new String(tempWord);

        if(wordMap.containsKey(newWord)) {
            wordMap.get(newWord).add(word);
        } else {
            List<String> group = new ArrayList<>();
            group.add(word);
            wordMap.put(newWord, group);
        }

    }

    List<String> anagrams = new ArrayList<>();

    for(String key : wordMap.keySet()) {

        if(wordMap.get(key).size() > 1) {
            anagrams.addAll(wordMap.get(key));
        }

    }

    return anagrams;
}