Simpler way to ignore case in Python?

Posted: 2012-02-29 12:04:33

Tags: python

for wclass in word_class_dict[most_ambigious_word]:
    for sent in brown_sents:
        if (most_ambigious_word.capitalize(), wclass) in sent or (most_ambigious_word.upper(), wclass) in sent or (most_ambigious_word.lower(), wclass) in sent:
            print most_ambigious_word,"-",wclass
            print " ".join(tuple[0] for tuple in sent)
            break

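To clarify, brown_sents is a list of tagged sentences (lists of tuples) that I can't change. As for the simplification part, I find the three separate checks clumsy to write. Any ideas?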

Edit (for those interested in the homework task): each element of brown_sents is a list of tuples like this:

[('word1' , 'wordclass1') , ('word2' , 'wordclass2') , ('word3' , 'wordclass2') ....]

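So I'm looking for, say, word1, but case doesn't matter: word1, Word1 and wOrd1 should all count as the same word. wclass is the word class, so I only want to print out a sentence containing each distinct (word1, wclass) pair. If word1 has several word classes, I loop over them and print one example for each; that's the outermost loop.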
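A minimal Python 3 sketch of the same loop, assuming each element of brown_sents is a list of (word, tag) tuples as described above. The helper name `find_examples` and the toy sentences and tags are hypothetical. Both sides are normalized with str.casefold() (Python 3.3+), so one comparison replaces the three capitalize/upper/lower checks and also matches mixed-case forms such as wOrd1, which those three checks miss.

```python
def find_examples(word, word_classes, sents):
    """Return {wclass: first sentence containing (word, wclass)}, ignoring case of word."""
    target = word.casefold()
    examples = {}
    for wclass in word_classes:
        for sent in sents:
            # one case-insensitive test instead of three separate checks
            if any(w.casefold() == target and tag == wclass for w, tag in sent):
                examples[wclass] = " ".join(w for w, _ in sent)
                break  # one example per word class is enough
    return examples


# hypothetical toy data in the same shape as brown_sents
brown_sents = [
    [("The", "AT"), ("Run", "NN"), ("ended", "VBD")],
    [("They", "PPSS"), ("rUN", "VB"), ("fast", "RB")],
    [("A", "AT"), ("run", "NN"), ("home", "NR")],
]
word_class_dict = {"run": ["NN", "VB"]}

for wclass, sentence in find_examples("run", word_class_dict["run"], brown_sents).items():
    print("run", "-", wclass)
    print(sentence)
```

Returning a dict rather than printing inside the loop keeps the search testable; the printing stays in the caller.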

3 Answers:

Answer 0 (score: 2)

If you search for many words, it makes sense to build a set:

print(set(brown_sents).intersection(zip(repeat(most_ambiguous_word),
                                        word_class_dict[most_ambiguous_word])))

Example

#!/usr/bin/env python3
from itertools import repeat

word_class_dict = dict(word2=['wordclass1', 'wordclass2', 'wordclass3', 'wc5'])
brown_sents = [
    ('word1', 'wordclass1'),
    ('word2', 'wordclass2'),
    ('word3', 'wordclass2'),
    ('word2', 'wordclass3'),
    ('word2', 'wordclass4'),
]

most_ambiguous_word = 'Word2'

# search in `brown_sents` for `most_ambiguous_word`, ignoring case via .lower()
# (not full Unicode case folding; only the query is lowercased here)
most_ambiguous_word = most_ambiguous_word.lower()
print(set(brown_sents).intersection(zip(repeat(most_ambiguous_word),
                                        word_class_dict[most_ambiguous_word])))

Output

{('word2', 'wordclass2'), ('word2', 'wordclass3')}
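The example above lowercases only the query, so it finds a match only if the words stored in brown_sents are already lowercase. A minimal sketch that normalizes the stored words as well, building the set once so it can be reused for every lookup (assuming, as in this answer, a flat list of (word, tag) pairs; the mixed-case data is made up for illustration):

```python
from itertools import repeat

word_class_dict = dict(word2=['wordclass1', 'wordclass2', 'wordclass3', 'wc5'])
brown_sents = [
    ('word1', 'wordclass1'),
    ('Word2', 'wordclass2'),   # mixed case in the data this time
    ('word3', 'wordclass2'),
    ('WORD2', 'wordclass3'),
    ('word2', 'wordclass4'),
]

# Build the case-normalized set once; reuse it for every word you look up.
normalized = {(word.lower(), wclass) for word, wclass in brown_sents}

most_ambiguous_word = 'Word2'.lower()
wanted = zip(repeat(most_ambiguous_word), word_class_dict[most_ambiguous_word])
matches = sorted(normalized.intersection(wanted))
print(matches)  # [('word2', 'wordclass2'), ('word2', 'wordclass3')]
```

The matched pairs come back lowercased; if you need the original spelling, keep a dict from the normalized pair to the original tuple instead of a plain set.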

To see how it works, save the script to a file, e.g. search-word.py, and run:

$ python -i search-word.py

This leaves you at the Python prompt:

>>>

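You can try individual expressions to see what they do, e.g. (this session is from Python 2; in Python 3, zip() returns an iterator and sets are displayed as {...}):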

>>> zip(repeat('a'), [1,2,3])
[('a', 1), ('a', 2), ('a', 3)]
>>> set('abcaadeff')
set(['a', 'c', 'b', 'e', 'd', 'f'])
>>> set('abcaadeff').intersection('abc')
set(['a', 'c', 'b'])
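For comparison, a short sketch of the same experiments under Python 3, where zip() is lazy and has to be wrapped in list() to see its contents. repeat('a') is endless on its own; zip() stops at the shortest input, so the result has three pairs.

```python
from itertools import repeat

pairs = zip(repeat('a'), [1, 2, 3])
print(pairs)           # a zip object, not a list, in Python 3
print(list(pairs))     # [('a', 1), ('a', 2), ('a', 3)]

letters = set('abcaadeff')
print(sorted(letters))                        # ['a', 'b', 'c', 'd', 'e', 'f']
print(sorted(letters.intersection('abc')))    # ['a', 'b', 'c']
```

sorted() is used only to get a stable display order; sets themselves are unordered.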

To see the help:

>>> help(zip)
Help on built-in function zip in module __builtin__:

zip(...)
    zip(seq1 [, seq2 [...]]) -> [(seq1[0], seq2[0] ...), (...)]

    Return a list of tuples, where each tuple contains the i-th element
    from each of the argument sequences.  The returned list is truncated
    in length to the length of the shortest argument sequence.

Press q to quit. Another example:

>>> help(repeat)
Help on class repeat in module itertools:

class repeat(__builtin__.object)
 |  repeat(element [,times]) -> create an iterator which returns the element
 |  for the specified number of times.  If not specified, returns the element
 |  endlessly.
...[snip]...

If an individual help message isn't clear, try the module's online documentation:

>>> module = 'itertools'
>>> import webbrowser
>>> webbrowser.open('http://docs.python.org/library/' + module)

and look up the itertools.repeat() function there.
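A quick check of what the help text for repeat() describes: with a times argument it yields the element that many times, and without one it never stops, so it has to be bounded by something like zip() or itertools.islice().

```python
from itertools import islice, repeat

print(list(repeat('x', 3)))            # ['x', 'x', 'x']
# Without `times`, repeat() is endless; zip() or islice() bounds it.
print(list(zip(repeat('x'), 'ab')))    # [('x', 'a'), ('x', 'b')]
print(list(islice(repeat('x'), 2)))    # ['x', 'x']
```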

In short: read the docs, try some code at the prompt, repeat. If you get stuck, ask a question.

Answer 1 (score: 0)

for wclass in word_class_dict[most_ambigious_word]:
    for sent in brown_sents:
        # lowercase every word in the sentence (keeping its tag) and look for the lowercased query
        if (most_ambigious_word.lower(), wclass) in ((word.lower(), tag) for word, tag in sent):
            print most_ambigious_word, "-", wclass
            print " ".join(word for word, tag in sent)
            break

That's really all you need.
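The loop above rebuilds the lowercased pairs of every sentence once per word class. A minimal Python 3 sketch, under the same data-shape assumption, that lowercases each sentence only once up front and then uses set lookups; the helper names `index_sentences` and `first_examples` and the toy data are hypothetical.

```python
def index_sentences(sents):
    """Pair each sentence with a set of its (lowercased word, tag) tuples, built once."""
    return [(sent, {(w.lower(), tag) for w, tag in sent}) for sent in sents]


def first_examples(word, word_classes, indexed):
    """Return {wclass: first sentence text containing (word, wclass)}, ignoring case."""
    target = word.lower()
    found = {}
    for wclass in word_classes:
        for sent, pairs in indexed:
            if (target, wclass) in pairs:  # constant-time set lookup per sentence
                found[wclass] = " ".join(w for w, _ in sent)
                break
    return found


brown_sents = [
    [("Time", "NN"), ("flies", "VBZ")],
    [("Fruit", "NN"), ("FLIES", "NNS"), ("like", "CS"), ("bananas", "NNS")],
]
indexed = index_sentences(brown_sents)  # do this once, reuse for every word
print(first_examples("flies", ["VBZ", "NNS", "VB"], indexed))
```

This trades some memory (one set per sentence) for not redoing the lowercasing on every pass; for a single lookup the original generator version is just as good.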

Answer 2 (score: -1)

This should work:

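# assumes `sent` is a single (word, tag) tuple; with `in`, the query pair would be
# compared against each element of the 2-tuple and never match, so use ==
if (most_ambigious_word.lower(), wclass) == (sent[0].lower(), sent[1]):
    # ...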
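A quick check of why membership (`in`) against a two-element tuple is not the same as tuple equality (`==`), assuming here that the value being tested is a single (word, tag) pair; the pair itself is made up. Note that with the question's data each sent is a list of pairs, so sent[0] would be a tuple and sent[0].lower() would raise AttributeError.

```python
pair = ('Bank', 'NN')           # one (word, tag) tuple
query = ('bank', 'NN')

# `in` checks whether query equals one of the tuple's *elements*:
# ('bank', 'NN') == 'bank'? no; ('bank', 'NN') == 'NN'? no. So this is False:
print(query in (pair[0].lower(), pair[1]))    # False

# comparing the tuples themselves is what the check needs:
print(query == (pair[0].lower(), pair[1]))    # True
```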