Question

for wclass in word_class_dict[most_ambigious_word]:
    for sent in brown_sents:
        if (most_ambigious_word.capitalize(), wclass) in sent or (most_ambigious_word.upper(), wclass) in sent or (most_ambigious_word.lower(), wclass) in sent:
            print most_ambigious_word,"-",wclass
            print " ".join(tuple[0] for tuple in sent)
            break

To clarify: brown_sents is a list of tuples that I cannot change. As for the simplification, I find the three separate checks clumsy to write. Any ideas?

Edit (for those interested in the homework task): brown_sents is a list of tuples containing elements like these:

[('word1' , 'wordclass1') , ('word2' , 'wordclass2') , ('word3' , 'wordclass2') ....]

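So I am looking for, say, word1, but case does not matter: word1 is the same as Word1 and wOrd1. wclass is the word class, so I only want to print sentences that contain a distinct (word1, wclass) pair. (Obviously, if word1 has several word classes, I want to loop over them and print one example for each class; that is the outermost loop.)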

Answer 1

If you are searching for multiple words, it makes sense to create a set:

print(set(brown_sents).intersection(zip(repeat(most_ambiguous_word),
                                        word_class_dict[most_ambiguous_word])))

Example

#!/usr/bin/env python3
from itertools import repeat

word_class_dict = dict(word2=['wordclass1', 'wordclass2', 'wordclass3', 'wc5'])
brown_sents = [
    ('word1', 'wordclass1'),
    ('word2', 'wordclass2'),
    ('word3', 'wordclass2'),
    ('word2', 'wordclass3'),
    ('word2', 'wordclass4'),
]

most_ambiguous_word = 'Word2'

# search in `brown_sents` for `most_ambiguous_word`
# ignoring Unicode case-folding
most_ambiguous_word = most_ambiguous_word.lower()
print(set(brown_sents).intersection(zip(repeat(most_ambiguous_word),
                                        word_class_dict[most_ambiguous_word])))

Output

{('word2', 'wordclass2'), ('word2', 'wordclass3')}
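
Note that the example above lowercases only the search word, so it finds matches only when the tokens in brown_sents are already lowercase. The following is a minimal sketch, assuming Python 3 and the same flat toy data, that normalizes both sides with str.casefold(). The helper name normalize and the mixed-case sample tokens are illustrative and not part of the original answer.

```python
from itertools import repeat

def normalize(pair):
    """Return the (word, wordclass) pair with the word case-folded."""
    word, wclass = pair
    return (word.casefold(), wclass)

word_class_dict = {'word2': ['wordclass1', 'wordclass2', 'wordclass3', 'wc5']}
# Hypothetical mixed-case variant of the toy data used above.
brown_sents = [
    ('Word1', 'wordclass1'),
    ('WORD2', 'wordclass2'),
    ('word3', 'wordclass2'),
    ('wOrd2', 'wordclass3'),
    ('word2', 'wordclass4'),
]

most_ambiguous_word = 'Word2'.casefold()
# All (word, wordclass) pairs we are looking for.
wanted = set(zip(repeat(most_ambiguous_word),
                 word_class_dict[most_ambiguous_word]))
# Case-fold every corpus pair, then intersect with the wanted pairs.
found = {normalize(pair) for pair in brown_sents} & wanted
print(sorted(found))
```

Building a normalized set costs one pass over the data, after which each lookup is a constant-time set operation. If brown_sents is really a list of sentences rather than a flat list of pairs, the pairs would need to be flattened first.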

To understand what it does, save the script to a file, e.g. search-word.py, and run:

$ python -i search-word.py

It shows the Python prompt:

>>>

You can try individual expressions to see what they do, for example:

>>> zip(repeat('a'), [1,2,3])
[('a', 1), ('a', 2), ('a', 3)]
>>> set('abcaadeff')
set(['a', 'c', 'b', 'e', 'd', 'f'])
>>> set('abcaadeff').intersection('abc')
set(['a', 'c', 'b'])
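
The session above appears to come from a Python 2 interpreter. Under Python 3, zip() returns a lazy iterator and sets are displayed with braces, so the same experiments look slightly different. A small sketch, assuming Python 3:

```python
from itertools import repeat

# zip() is lazy in Python 3; wrap it in list() to see the pairs.
pairs = list(zip(repeat('a'), [1, 2, 3]))
print(pairs)

# Sets are unordered, so sort before printing for a stable display.
letters = set('abcaadeff')
print(sorted(letters))

# intersection() accepts any iterable, here the characters of a string.
common = letters.intersection('abc')
print(sorted(common))
```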

To view the help:

>>> help(zip)
Help on built-in function zip in module __builtin__:

zip(...)
    zip(seq1 [, seq2 [...]]) -> [(seq1[0], seq2[0] ...), (...)]

    Return a list of tuples, where each tuple contains the i-th element
    from each of the argument sequences.  The returned list is truncated
    in length to the length of the shortest argument sequence.

Press q to exit. If an individual help message is unclear:

>>> help(repeat)
Help on class repeat in module itertools:

class repeat(__builtin__.object)
 |  repeat(element [,times]) -> create an iterator which returns the element
 |  for the specified number of times.  If not specified, returns the element
 |  endlessly.
...[snip]...

try looking at the module's online documentation:

>>> module = 'itertools'
>>> import webbrowser
>>> webbrowser.open('http://docs.python.org/library/' + module)

and find the itertools.repeat() function.

In short: read the docs, try some code at the prompt, repeat. If you get stuck, ask a question.

Answer 2

for wclass in word_class_dict[most_ambigious_word]:
    for sent in brown_sents:
        if (most_ambigious_word.lower(), wclass) in ((word[0].lower(),word[1]) for word in sent) :
            print most_ambigious_word,"-",wclass
            print " ".join(tuple[0] for tuple in sent)
            break

That is really all that is needed.
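
For readers on Python 3, the same idea can be sketched as a self-contained script. This assumes each sentence is a list of (word, wordclass) tuples, which is what the loop over sent implies; the toy sentences, tags, and the examples dictionary are made up for illustration, and the variable is spelled most_ambiguous_word here.

```python
# Hypothetical toy data: each sentence is a list of (word, wordclass) tuples.
brown_sents = [
    [('The', 'AT'), ('run', 'NN'), ('ended', 'VBD')],
    [('Run', 'VB'), ('home', 'NR')],
    [('They', 'PPSS'), ('RUN', 'VB'), ('daily', 'RB')],
]
word_class_dict = {'run': ['NN', 'VB']}
most_ambiguous_word = 'run'

examples = {}  # wordclass -> first sentence containing the word in that class
target = most_ambiguous_word.lower()
for wclass in word_class_dict[most_ambiguous_word]:
    for sent in brown_sents:
        # Lowercase each token, then compare whole (word, wordclass) pairs.
        if (target, wclass) in ((word.lower(), tag) for word, tag in sent):
            examples[wclass] = " ".join(word for word, _ in sent)
            print(most_ambiguous_word, "-", wclass)
            print(examples[wclass])
            break  # one example sentence per word class is enough
```

Lowercasing the search word once outside the loops avoids repeating the call for every sentence, and unpacking each tuple as word, tag avoids shadowing the built-in name tuple.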

Answer 3

This should work:

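# Compare against every (word, wordclass) pair in the sentence, lowercasing the word.
if (most_ambigious_word.lower(), wclass) in ((word.lower(), tag) for word, tag in sent):
    # ...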

Simpler way to ignore case in Python?
