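Given a list of words, how do I sort them into "families"?

Time: 2013-04-10 21:40:02

Tags: python python-3.x

I am writing an Evil Hangman game in Python and I'm stuck. I'm trying to work out how to sort words into families. For example, suppose I have the list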

ALLY BETA COOL DEAL ELSE FLEW GOOD HOPE IBEX 

Each word falls into one of a few families, depending on where its E's are:

- - - -, containing ALLY, COOL, GOOD
- E - -, containing BETA and DEAL
- - E -, containing FLEW and IBEX
E - - E, containing ELSE
- - - E, containing HOPE.

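Is there a way to use a dictionary to help determine which words belong to which family? We haven't covered dictionaries in class yet, but I read ahead and believe it is possible. The file I'm working with has about 170,000 words; the list above is just a simple example.

4 answers:

Answer 0 (score: 3)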

from itertools import groupby

words = ['ALLY', 'BETA', 'COOL', 'DEAL', 'ELSE', 'FLEW', 'GOOD', 'HOPE', 'IBEX']
e_locs = sorted(([c == 'E' for c in w], i) for i, w in enumerate(words))
result = [[words[i] for x, i in g] for k, g in groupby(e_locs, lambda x: x[0])]

Result:

>>> result
[['ALLY', 'COOL', 'GOOD'], ['HOPE'], ['FLEW', 'IBEX'], ['BETA', 'DEAL'], ['ELSE']]

Here is a version that keeps track of where the E's are:

words = ['ALLY', 'BETA', 'COOL', 'DEAL', 'ELSE', 'FLEW', 'GOOD', 'HOPE', 'IBEX']
result = {}
for word in words:
    key = ' '.join('E' if c == 'E' else '-' for c in word)
    if key not in result:
        result[key] = []
    result[key].append(word)

Result:

>>> import pprint
>>> pprint.pprint(result)
{'- - - -': ['ALLY', 'COOL', 'GOOD'],
 '- - - E': ['HOPE'],
 '- - E -': ['FLEW', 'IBEX'],
 '- E - -': ['BETA', 'DEAL'],
 'E - - E': ['ELSE']}

To pick the largest family (using the first version, where result is a list of lists):

>>> max(result, key=len)
['ALLY', 'COOL', 'GOOD']

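To pick the largest family with the second version, you can use result.values() in place of result. To get a tuple containing both the E positions and the family, you can use the following: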

>>> max(result.items(), key=lambda k_v: len(k_v[1]))
('- - - -', ['ALLY', 'COOL', 'GOOD'])
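For the hangman game itself, the same dictionary idea extends to any guessed letter, not just E. The following is a minimal sketch rather than part of the original answer; the names family_key, partition and largest_family are hypothetical helpers introduced only for illustration.

```python
def family_key(word, letter):
    """Return the pattern of `letter` in `word`, e.g. '-E--' for BETA and 'E'."""
    return ''.join(c if c == letter else '-' for c in word)


def partition(words, letter):
    """Group words into families keyed by where `letter` occurs."""
    families = {}
    for word in words:
        # setdefault creates the empty list the first time a key is seen
        families.setdefault(family_key(word, letter), []).append(word)
    return families


def largest_family(words, letter):
    """Return (pattern, words) for the family with the most words."""
    return max(partition(words, letter).items(), key=lambda kv: len(kv[1]))


words = ['ALLY', 'BETA', 'COOL', 'DEAL', 'ELSE', 'FLEW', 'GOOD', 'HOPE', 'IBEX']
pattern, remaining = largest_family(words, 'E')
print(pattern, remaining)  # ---- ['ALLY', 'COOL', 'GOOD']
```

After each guess the game could replace its word list with remaining and reveal pattern to the player. If several families tie for largest, max returns the first one encountered.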

Answer 1 (score: 1)

In [1]: from itertools import groupby

In [2]: import string

In [3]: words = "ALLY BETA COOL DEAL ELSE FLEW GOOD HOPE IBEX".split()

In [4]: table = str.maketrans('ABCDEFGHIJKLMNOPQRSTUVWXYZ',
   ...:                       '????E?????????????????????')

In [5]: f = lambda w: w.translate(table)

In [6]: for k,g in groupby(sorted(words, key=f), f):
   ...:     print(k, list(g))
   ...:     
???? ['ALLY', 'COOL', 'GOOD']
???E ['HOPE']
??E? ['FLEW', 'IBEX']
?E?? ['BETA', 'DEAL']
E??E ['ELSE']

# to get the biggest group
In [7]: max((list(g) for _,g in groupby(sorted(words, key=f), f)), key=len)
Out[7]: ['ALLY', 'COOL', 'GOOD']
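
The sorted(...) call in this answer matters: itertools.groupby only merges items that sit next to each other, so the words have to be ordered by the same key first. A small sketch of the difference, using a hypothetical helper e_pattern in place of the translation table:

```python
from itertools import groupby

words = "ALLY BETA COOL DEAL ELSE FLEW GOOD HOPE IBEX".split()


def e_pattern(word):
    """Replace every letter except E with '?', e.g. 'BETA' -> '?E??'."""
    return ''.join('E' if c == 'E' else '?' for c in word)


# Without sorting, no two neighbouring words in this list share a pattern,
# so every word ends up in its own group.
unsorted_groups = [list(g) for _, g in groupby(words, e_pattern)]

# Sorting by the same key first puts equal patterns next to each other.
sorted_groups = [list(g) for _, g in groupby(sorted(words, key=e_pattern), e_pattern)]

print(len(unsorted_groups), len(sorted_groups))  # 9 5
```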

Answer 2 (score: 0)

With regular expressions you could do the following:

import re

def into_families(words):
    # here you could add as many families as you want
    families = {
                '....': re.compile('[^E]{4}'),
                '...E': re.compile('[^E]{3}E'),
                '..E.': re.compile('[^E]{2}E[^E]'),
                '.E..': re.compile('[^E]E[^E]{2}'),
                'E..E': re.compile('E[^E]{2}E'),
    }
    return dict((k, [w for w in words if r.match(w)]) for k, r in families.items())

Or, if you want to build the regular expressions dynamically:

def into_families(words):
    family_names = set(''.join('E' if x == 'E' else '.' for x in w) for w in words)
    families = dict((x, re.compile(x.replace('.', '[^E]'))) for x in family_names)
    return dict((k, [w for w in words if r.match(w)]) for k, r in families.items())
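
One caveat, offered as an assumption about the real word file rather than something the answer states: match only anchors at the start of the string, so if the list mixes word lengths, a five-letter word such as ALLOY would also land in the four-character family '....'. A sketch of the dynamic version using fullmatch (available on compiled patterns since Python 3.4) avoids that:

```python
import re


def into_families(words):
    # Family names use '.' for "any letter except E", as in the answer above.
    family_names = {''.join('E' if c == 'E' else '.' for c in w) for w in words}
    families = {name: re.compile(name.replace('.', '[^E]')) for name in family_names}
    # fullmatch requires the whole word to fit the pattern, not just a prefix.
    return {name: [w for w in words if r.fullmatch(w)] for name, r in families.items()}


mixed = ['ALLY', 'ALLOY', 'BETA', 'BETAS']
print(into_families(mixed))
```

Each family here contains exactly one word; with match instead of fullmatch, '....' would hold both ALLY and ALLOY.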

Answer 3 (score: 0)

from collections import defaultdict
import re

words = 'ALLY BETA COOL DEAL ELSE FLEW GOOD HOPE IBEX'.split()

groups = defaultdict(list)

for word in words:
    indices = tuple(m.start() for m in re.finditer('E', word))
    groups[indices].append(word)

for k, v in sorted(groups.items()):
    tpl = ['E' if i in k else '-' for i in range(4)]
    print(' '.join(tpl), ' '.join(v))