I have a dictionary containing sentences keyed by book and page:
from collections import defaultdict

# lists to build dictionary - for reproducibility
pages = [12, 41, 50, 111, 1021, 121]
bookCodes = ['M', 'P', 'A', 'C', 'A', 'M']
sentences = ['THISISASENTANCE',
             'ANDHEREISONEMOREEXAMP',
             'ALLFROMDIFFERENTBOOKS',
             'ANDFROMDIFFERENTPAGES',
             'MOSLTYTHESAMELENGTHSS',
             'BUTSOMEWILLBABITSHORT'
             ]

# Make dictionary: coordinates[book][page] holds the sentence
coordinates = defaultdict(dict)
for i in range(len(pages)):
    book = bookCodes[i]
    page = pages[i]
    sentence = sentences[i]
    coordinates[book][page] = sentence

print(coordinates)
defaultdict(<type 'dict'>, {'A': {50: 'ALLFROMDIFFERENTBOOKS', 1021: 'MOSLTYTHESAMELENGTHSS'}, 'P': {41: 'ANDHEREISONEMOREEXAMP'}, 'C': {111: 'ANDFROMDIFFERENTPAGES'}, 'M': {121: 'BUTSOMEWILLBABITSHORT', 12: 'THISISASENTANCE'}})
I also have a pool of vowels stored as a dictionary, in which each vowel starts with a count of 10:
vowels = dict.fromkeys(['A', 'E', 'I', 'O', 'U'], 10)
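I want to iterate over the same element of each sentence (sentence[0][0], ..., sentence[n][0], then the next element of each sentence, and so on) and, each time I see a vowel (A, E, I, O or U), reduce the count of that vowel in the vowels dictionary.
Once the pool for any vowel reaches 0, I return the letter, its position in the sentence, and the sentence, and break out of the loop.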
from collections import defaultdict
import random

def wordStopper(sentences):
    # Shuffle in place so that no sentence is favoured by its list position
    random.shuffle(sentences)
    vowels = dict.fromkeys(['A', 'E', 'I', 'O', 'U'], 10)
    # Visit position 0 of every sentence, then position 1, and so on,
    # up to the length of the longest sentence
    for i in range(max(len(s) for s in sentences)):
        for s in sentences:
            try:
                l = s[i:i + 1]
            except IndexError:
                continue
            if l in vowels:
                vowels[l] -= 1
                print("Pos: %s, Letter: %s, Sentence: %s" % (i, l, s))
                print("As = %s, Es = %s, Is = %s, Os = %s, Us = %s" % (vowels['A'], vowels['E'], vowels['I'], vowels['O'], vowels['U']))
                if vowels[l] == 0:
                    return (l, i, s)

letter, location, sentence = wordStopper(sentences)
print("Vowel %s exhausted here %s in sentence: %s" % (letter, location, sentence))
Importantly, the sentences list should be shuffled, and element 0 should be visited across all sentences before element 1, so that I am not biased towards earlier entries in the sentences list.
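This works as I expect, but I now want to retrieve the book and page numbers from which the sentence was drawn, which are stored in coordinates.
I can do this crudely by iterating through coordinates and looking for the sentence that wordStopper returned, but this strikes me as a rather poor way to achieve the goal.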
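A minimal sketch of such a crude lookup, assuming every sentence is unique across books and pages. The helper name find_coordinates is hypothetical and the dictionary is a shortened copy of the sample data:

```python
from collections import defaultdict

# Shortened copy of the sample data, for illustration only
coordinates = defaultdict(dict)
coordinates['M'][12] = 'THISISASENTANCE'
coordinates['P'][41] = 'ANDHEREISONEMOREEXAMP'
coordinates['A'][50] = 'ALLFROMDIFFERENTBOOKS'

def find_coordinates(coordinates, target):
    """Return (book, page) of the first stored sentence equal to target, else None."""
    for book, pages_in_book in coordinates.items():
        for page, sentence in pages_in_book.items():
            if sentence == target:
                return book, page
    return None

print(find_coordinates(coordinates, 'ANDHEREISONEMOREEXAMP'))  # ('P', 41)
```

The search scans every book and page on each call, and it would return the wrong location if two pages held the same sentence.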
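Normally, I would iterate over the keys of coordinates before the sentences, but I can't see a way to do that without biasing the results towards the first keys iterated over.
Any suggestions are very welcome. Note: this is a toy example, so I'm not looking to use any corpus-parsing tools.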
Answer 0 (score: 1)
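I think what you need is a better data structure, one that lets you retrieve the book and page from the sentence. There are many possible designs. Here is what I would do:
First, create a data structure that holds a sentence together with its book and page: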
class SentenceWithMeta(object):
    def __init__(self, sentence):
        self.sentence = sentence
        self.book = None
        self.page = None
Then, keep all of the sentences in a list. For example:
sentences_with_meta = [SentenceWithMeta(sentence) for sentence in sentences]
At this point, initialise the metadata fields of each sentence, book and page:
# Attach book and page to each sentence
sentences_with_meta = [SentenceWithMeta(sentence) for sentence in sentences]
for i in range(len(pages)):
    book = bookCodes[i]
    page = pages[i]
    sentence_with_meta = sentences_with_meta[i]
    sentence_with_meta.book = book
    sentence_with_meta.page = page
Finally, in the wordStopper function, you can use the sentences_with_meta list like this:
def wordStopper(sentences_with_meta):
    random.shuffle(sentences_with_meta)
    vowels = dict.fromkeys(['A', 'E', 'I', 'O', 'U'], 10)
    # Cover every position up to the length of the longest sentence
    for i in range(max(len(swm.sentence) for swm in sentences_with_meta)):
        for swm in sentences_with_meta:
            try:
                l = swm.sentence[i:i + 1]
                ...
# the rest of the code is the same. You return swm, which has the book
# and page already in the structure.
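For reference, the pieces above can be assembled into one runnable sketch. This is one possible reading of the elided part, not necessarily the exact code intended: the progress prints are dropped, the loop bound uses the longest sentence, and the function returns None if no vowel runs out (an assumption added here).

```python
import random

class SentenceWithMeta(object):
    def __init__(self, sentence):
        self.sentence = sentence
        self.book = None
        self.page = None

pages = [12, 41, 50, 111, 1021, 121]
bookCodes = ['M', 'P', 'A', 'C', 'A', 'M']
sentences = ['THISISASENTANCE', 'ANDHEREISONEMOREEXAMP',
             'ALLFROMDIFFERENTBOOKS', 'ANDFROMDIFFERENTPAGES',
             'MOSLTYTHESAMELENGTHSS', 'BUTSOMEWILLBABITSHORT']

# Attach book and page to each sentence
sentences_with_meta = [SentenceWithMeta(s) for s in sentences]
for swm, book, page in zip(sentences_with_meta, bookCodes, pages):
    swm.book = book
    swm.page = page

def wordStopper(sentences_with_meta):
    # Shuffle in place so list position does not favour any sentence
    random.shuffle(sentences_with_meta)
    vowels = dict.fromkeys(['A', 'E', 'I', 'O', 'U'], 10)
    longest = max(len(swm.sentence) for swm in sentences_with_meta)
    for i in range(longest):
        for swm in sentences_with_meta:
            try:
                l = swm.sentence[i]
            except IndexError:
                continue  # this sentence is shorter than position i
            if l in vowels:
                vowels[l] -= 1
                if vowels[l] == 0:
                    return l, i, swm
    return None  # assumption: no vowel was exhausted

result = wordStopper(sentences_with_meta)
if result is not None:
    letter, location, swm = result
    print("Vowel %s exhausted at position %s in sentence %s (book %s, page %s)"
          % (letter, location, swm.sentence, swm.book, swm.page))
```

Because the returned object carries book and page, no second search through coordinates is needed.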
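Side note: to get the letter at index i from a string, you do not need a slice. Just use an index, which raises IndexError past the end of the string (the case the try/except handles):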
l = swm.sentence[i]
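The difference matters at the end of a short sentence. A small sketch, using one of the sample sentences, of how slicing and indexing behave past the last character:

```python
s = 'THISISASENTANCE'  # 15 characters, valid indexes 0 to 14

# Inside the string, both forms give the same one-character result
print(s[2], s[2:3])  # I I

# Past the end, a slice quietly returns an empty string ...
past_end_slice = s[20:21]
print(repr(past_end_slice))  # ''

# ... while an index raises IndexError
try:
    past_end_index = s[20]
except IndexError:
    past_end_index = None
    print("index 20 is out of range")
```

With the slice form the except IndexError branch never runs, since the empty string is simply not a key in vowels. With the index form the branch is what skips the shorter sentences.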
Many other designs would work as well.
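One such alternative, sketched here on the assumption that the nested coordinates dictionary from the question is kept as it is: flatten it into (book, page, sentence) records and shuffle the records, so the order in which the keys are stored no longer matters. The name records is hypothetical and the data is a shortened copy of the sample.

```python
import random
from collections import defaultdict

# Shortened copy of the sample data, for illustration only
coordinates = defaultdict(dict)
coordinates['M'][12] = 'THISISASENTANCE'
coordinates['P'][41] = 'ANDHEREISONEMOREEXAMP'
coordinates['A'][50] = 'ALLFROMDIFFERENTBOOKS'

# Flatten the nested dictionary into one list of records
records = [(book, page, sentence)
           for book, pages_in_book in coordinates.items()
           for page, sentence in pages_in_book.items()]

# Shuffle the records so no book or page is favoured by key order
random.shuffle(records)

for book, page, sentence in records:
    print(book, page, sentence)
```

A wordStopper written against records would loop over the tuples and return the whole tuple, which already contains the book and page.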