for wclass in word_class_dict[most_ambigious_word]:
    for sent in brown_sents:
        if ((most_ambigious_word.capitalize(), wclass) in sent or
                (most_ambigious_word.upper(), wclass) in sent or
                (most_ambigious_word.lower(), wclass) in sent):
            print most_ambigious_word, "-", wclass
            print " ".join(word for word, tag in sent)
            break
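Clarification: brown_sents is a list of tuples and cannot be changed. As for the simplification part, I find the three separate checks a bit clumsy to write. Any ideas?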
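The three checks also leave a gap. They cover only the capitalized, upper-case and lower-case spellings, so a mixed-case token such as `wOrd1` is never matched. Here is a small Python 3 illustration (the token values are made up):

```python
# Hypothetical target word and token, for illustration only.
word = "word1"

# The three spellings the question's checks cover.
variants = {word.capitalize(), word.upper(), word.lower()}
print(sorted(variants))  # ['WORD1', 'Word1', 'word1']

# A mixed-case token slips through all three checks...
token = "wOrd1"
print(token in variants)  # False

# ...whereas normalizing both sides once covers every spelling.
print(token.lower() == word.lower())  # True
```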
Edit (for anyone interested in the homework task): brown_sents is a list of tuples, with elements like:
[('word1' , 'wordclass1') , ('word2' , 'wordclass2') , ('word3' , 'wordclass2') ....]
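So I'm looking for, say, word1, but case shouldn't matter: word1, Word1 and wOrd1 all count as the same word. wclass is the word class, so I only want to print a sentence containing each distinct (word1, wclass) pair. (If word1 has several word classes, I loop over them and print one example for each; that is the outermost loop.)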
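Here is a minimal Python 3 sketch of the whole task. It assumes each element of `brown_sents` is a sentence given as a list of `(word, tag)` pairs. The corpus and tags below are invented for illustration, and `first_example_per_class` is a hypothetical helper name. The sketch normalizes the target word once with `str.casefold()` and collects one example per class in a single pass over the corpus:

```python
# Hypothetical tagged corpus standing in for brown_sents: a list of
# sentences, each a list of (word, tag) pairs. Tags are made up.
brown_sents = [
    [("The", "AT"), ("Word1", "NN"), ("ends", "VBZ")],
    [("They", "PPSS"), ("wOrd1", "VB"), ("it", "PPO")],
    [("word1", "NN"), ("again", "RB")],
]


def first_example_per_class(word, word_classes, sents):
    """Map each word class to the first sentence containing
    (word, word_class), comparing words case-insensitively."""
    target = word.casefold()
    wanted = set(word_classes)
    examples = {}
    for sent in sents:
        for token, tag in sent:
            if token.casefold() == target and tag in wanted and tag not in examples:
                examples[tag] = sent
        if len(examples) == len(wanted):
            break  # every requested class already has an example
    return examples


for wclass, sent in first_example_per_class("word1", ["NN", "VB"], brown_sents).items():
    print("word1", "-", wclass)
    print(" ".join(token for token, _ in sent))
```

Scanning the corpus once, instead of once per word class as in the nested loops, matters more as the corpus grows. For plain ASCII words, `str.lower()` behaves the same as `casefold()`.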
Answer 0 (score: 2)
If you search for multiple words, it makes sense to create a set:
print(set(brown_sents).intersection(zip(repeat(most_ambiguous_word),
                                        word_class_dict[most_ambiguous_word])))
#!/usr/bin/env python3
from itertools import repeat
word_class_dict = dict(word2=['wordclass1', 'wordclass2', 'wordclass3', 'wc5'])
brown_sents = [
    ('word1', 'wordclass1'),
    ('word2', 'wordclass2'),
    ('word3', 'wordclass2'),
    ('word2', 'wordclass3'),
    ('word2', 'wordclass4'),
]
most_ambiguous_word = 'Word2'
# search in `brown_sents` for `most_ambiguous_word`
# (plain .lower(); full Unicode case folding is not handled)
most_ambiguous_word = most_ambiguous_word.lower()
print(set(brown_sents).intersection(zip(repeat(most_ambiguous_word),
                                        word_class_dict[most_ambiguous_word])))
{('word2', 'wordclass2'), ('word2', 'wordclass3')}
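The script above treats `brown_sents` as a flat list of pairs. It only matches words that are already lower-case in the corpus. The question describes each element as a sentence (a list of pairs). In that case, the same set idea can be applied per sentence after lower-casing the words. Here is a sketch with made-up data; `matching_pairs` is a hypothetical helper:

```python
from itertools import repeat

# Hypothetical nested corpus: a list of sentences, each a list of
# (word, tag) pairs, rather than the flat list used in the script above.
brown_sents = [
    [("Word2", "wordclass2"), ("word1", "wordclass1")],
    [("word3", "wordclass2"), ("WORD2", "wordclass3")],
    [("word2", "wordclass4")],
]
word_class_dict = {"word2": ["wordclass1", "wordclass2", "wordclass3", "wc5"]}

most_ambiguous_word = "Word2".lower()
wanted = set(zip(repeat(most_ambiguous_word), word_class_dict[most_ambiguous_word]))


def matching_pairs(sent):
    """Return the wanted (word, class) pairs found in one sentence,
    lower-casing each word before comparing."""
    return wanted.intersection((w.lower(), tag) for w, tag in sent)


for sent in brown_sents:
    found = matching_pairs(sent)
    if found:
        print(sorted(found), " ".join(w for w, _ in sent))
```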
To see what the script does, save it to a file, e.g. search-word.py, and run:
$ python -i search-word.py
This leaves you at the Python prompt:
>>>
You can try individual expressions to see what they do, for example:
>>> zip(repeat('a'), [1,2,3])
[('a', 1), ('a', 2), ('a', 3)]
>>> set('abcaadeff')
set(['a', 'c', 'b', 'e', 'd', 'f'])
>>> set('abcaadeff').intersection('abc')
set(['a', 'c', 'b'])
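Those transcripts come from Python 2: `zip()` returns a list, and sets print as `set([...])`. The script itself is written for Python 3. Roughly the same experiments in Python 3:

```python
from itertools import repeat

# In Python 3, zip() returns a lazy iterator; wrap it in list() to see it.
pairs = list(zip(repeat('a'), [1, 2, 3]))
print(pairs)  # [('a', 1), ('a', 2), ('a', 3)]

# Sets are unordered, so sorted() gives a stable display.
letters = set('abcaadeff')
print(sorted(letters))  # ['a', 'b', 'c', 'd', 'e', 'f']
print(sorted(letters.intersection('abc')))  # ['a', 'b', 'c']
```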
To view the help:
>>> help(zip)
Help on built-in function zip in module __builtin__:
zip(...)
zip(seq1 [, seq2 [...]]) -> [(seq1[0], seq2[0] ...), (...)]
Return a list of tuples, where each tuple contains the i-th element
from each of the argument sequences. The returned list is truncated
in length to the length of the shortest argument sequence.
Press q to exit. If a particular help message is not clear:
>>> help(repeat)
Help on class repeat in module itertools:
class repeat(__builtin__.object)
| repeat(element [,times]) -> create an iterator which returns the element
| for the specified number of times. If not specified, returns the element
| endlessly.
...[snip]...
try the module's online documentation:
>>> module = 'itertools'
>>> import webbrowser
>>> webbrowser.open('http://docs.python.org/library/' + module)
and look for the itertools.repeat() function.
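Two behaviours from those help pages make `zip(repeat(word), classes)` in the script work. `zip()` stops at its shortest input, so the endless `repeat()` is cut off at the length of the class list. Here is a small Python 3 check (the sample values are made up):

```python
from itertools import islice, repeat

# zip() stops as soon as its shortest input is exhausted...
print(list(zip([1, 2, 3], "ab")))  # [(1, 'a'), (2, 'b')]

# ...so pairing an endless repeat() with a finite list of classes is safe.
classes = ["wordclass1", "wordclass2"]
print(list(zip(repeat("word2"), classes)))
# [('word2', 'wordclass1'), ('word2', 'wordclass2')]

# Given a count, repeat() yields the element exactly that many times.
print(list(repeat("x", 3)))  # ['x', 'x', 'x']

# Without a count it never ends, so take a bounded slice instead of list().
print(list(islice(repeat("x"), 2)))  # ['x', 'x']
```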
In short: read the docs, try some code at the prompt, repeat. If you get stuck, ask a question.
Answer 1 (score: 0)
for wclass in word_class_dict[most_ambigious_word]:
    for sent in brown_sents:
        if (most_ambigious_word.lower(), wclass) in ((word[0].lower(), word[1]) for word in sent):
            print most_ambigious_word, "-", wclass
            print " ".join(word for word, tag in sent)
            break
That's really all you need.
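Here is the same generator-expression idea in Python 3, as a sketch. `print` becomes a function, and `any()` with tuple unpacking reads a bit more directly than `word[0]`/`word[1]`. The corpus, tags, and the `contains` helper are hypothetical:

```python
# Hypothetical nested corpus standing in for brown_sents.
brown_sents = [
    [("I", "PPSS"), ("Saw", "VBD"), ("it", "PPO")],
    [("a", "AT"), ("SAW", "NN")],
]
word_class_dict = {"saw": ["VBD", "NN"]}
most_ambigious_word = "saw"


def contains(sent, word, wclass):
    """True if sent has (word, wclass), ignoring the case of the word."""
    target = (word.lower(), wclass)
    return any((w.lower(), tag) == target for w, tag in sent)


for wclass in word_class_dict[most_ambigious_word]:
    for sent in brown_sents:
        if contains(sent, most_ambigious_word, wclass):
            print(most_ambigious_word, "-", wclass)
            print(" ".join(w for w, _ in sent))
            break  # one example per word class
```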
Answer 2 (score: -1)
This should work:
# assumes each `sent` is a single (word, tag) pair; compare pairs with ==, not `in`
if (most_ambigious_word.lower(), wclass) == (sent[0].lower(), sent[1]):
    # ...
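Two pitfalls are worth checking when comparing `(word, tag)` pairs like this. First, `in` against a 2-tuple tests element membership rather than equality. Second, if `sent` is a whole sentence (a list of pairs), `sent[0]` is a pair, not a word. Here is a quick Python 3 demonstration with made-up values:

```python
pair = ("word1", "wordclass1")

# `x in t` asks whether x is one of t's elements, so a pair is never
# "in" another pair of strings; == compares the pairs themselves.
print(pair in ("word1", "wordclass1"))  # False
print(pair == ("word1", "wordclass1"))  # True

# In a sentence (a list of pairs), sent[0] is a whole pair, not a word,
# so calling .lower() on it fails.
sent = [("Word1", "wordclass1"), ("ends", "VBZ")]
try:
    sent[0].lower()
except AttributeError as exc:
    print(exc)  # 'tuple' object has no attribute 'lower'
```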