Question

I have a dictionary of sentences, keyed by the book and the page each sentence came from:

# lists to build dictionary - for reproducibility  
pages     = [12, 41, 50, 111, 1021, 121]
bookCodes = ['M', 'P', 'A', 'C', 'A', 'M']

sentences = ['THISISASENTANCE',
             'ANDHEREISONEMOREEXAMP',
             'ALLFROMDIFFERENTBOOKS',
             'ANDFROMDIFFERENTPAGES',
             'MOSLTYTHESAMELENGTHSS',
             'BUTSOMEWILLBABITSHORT'
             ]

# Make dictionary
from collections import defaultdict

coordinates = defaultdict(dict)
for i in range(len(pages)):
    book = bookCodes[i]
    page = pages[i]
    sentence = sentences[i]
    coordinates[book][page] = sentence

print(coordinates)

defaultdict(<type 'dict'>, {'A': {50: 'ALLFROMDIFFERENTBOOKS', 1021: 'MOSLTYTHESAMELENGTHSS'}, 'P': {41: 'ANDHEREISONEMOREEXAMP'}, 'C': {111: 'ANDFROMDIFFERENTPAGES'}, 'M': {121: 'BUTSOMEWILLBABITSHORT', 12: 'THISISASENTANCE'}})

I also have a pool of vowels stored as a dictionary, so that each vowel starts out at 10:

vowels = dict.fromkeys(['A', 'E', 'I', 'O', 'U'], 10)

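I want to iterate through the same element of each sentence (sentence[0][0], sentence[n][0], ...) and, each time I see a vowel (A, E, I, O, or U), reduce the count of that vowel in the vowels dictionary.

Once the pool for a vowel hits 0, I return the letter, the position in the sentence, and the sentence, and break out of the loop.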

from collections import defaultdict
import random

def wordStopper(sentences):
    random.shuffle(sentences)
    vowels = dict.fromkeys(['A', 'E', 'I', 'O', 'U'], 10)
    for i in range(len(sentences[1])):
        for s in sentences:
            try:
                l = s[i:i + 1]
            except IndexError:
                continue
            if l in vowels:
                vowels[l] -= 1
                print("Pos: %s, Letter: %s, Sentence: %s" % (i, l, s))
                print("As = %s, Es = %s, Is = %s, Os = %s, Us = %s" % (vowels['A'], vowels['E'], vowels['I'], vowels['O'], vowels['U']))
                if vowels[l] == 0:
                    return (l, i, s)

letter, location, sentence = wordStopper(sentences)
print("Vowel %s exhausted here %s in sentence: %s" % (letter, location, sentence))

Importantly, the sentences list should be shuffled (and I iterate through element 0 in all sentences, then element 1, and so on) so that I don't bias the result towards entries that appear earlier in the sentences list.

This works as I expect, but I now want to retrieve the book and page number that the sentence was extracted from, which are stored in coordinates.

I can crudely achieve this by iterating through coordinates and finding the sentence that is returned from wordStopper.
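A brute-force lookup of this kind might look roughly like the sketch below. The helper name find_coordinates is an assumption made for illustration, and the sketch assumes that no sentence appears under more than one book/page.

```python
def find_coordinates(coordinates, target):
    """Return (book, page) of the first entry whose sentence equals target, else None."""
    for book, pages_in_book in coordinates.items():
        for page, sentence in pages_in_book.items():
            if sentence == target:
                return book, page
    return None

# Small stand-in for the coordinates dictionary built above
example = {'A': {50: 'ALLFROMDIFFERENTBOOKS'}, 'M': {12: 'THISISASENTANCE'}}
print(find_coordinates(example, 'THISISASENTANCE'))
```

Every lookup walks the whole nested dictionary, which is why it feels crude.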

But this strikes me as a rather poor way of achieving it.

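Normally, I would probably iterate over the keys of coordinates before the sentences, but I can't see a way to do that without biasing the results towards the first keys that are iterated over.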

Any suggestions are very welcome. Note: this is a toy example, so I'm not looking to use any corpus parsing tools.

Answer 1

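I think what you need is a better data structure, one that lets you retrieve the book/page from the sentence. There are many possible designs. Here is what I would do:

First, create a data structure that holds a sentence together with its book/page: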

class SentenceWithMeta(object):
    def __init__(self, sentence):
        self.sentence = sentence
        self.book = None
        self.page = None

Then, keep all of the sentences in that form. For example:

sentences_with_meta = [SentenceWithMeta(sentence) for sentence in sentences]

At this point, initialize the metadata fields of each entry in sentences_with_meta, book and page:

# Attach the book and page to each sentence
sentences_with_meta = [SentenceWithMeta(sentence) for sentence in sentences]
for i in range(len(pages)):
    book = bookCodes[i]
    page = pages[i]
    sentence_with_meta = sentences_with_meta[i]
    sentence_with_meta.book = book
    sentence_with_meta.page = page

Finally, in the wordStopper method, you can use the sentences_with_meta array like this:

def wordStopper(sentences_with_meta):
    random.shuffle(sentences_with_meta)
    vowels = dict.fromkeys(['A', 'E', 'I', 'O', 'U'], 10)
    for i in range(len(sentences_with_meta[1].sentence)):
        for swm in sentences_with_meta:
            try:
                l = swm.sentence[i:i + 1]
    ...
    # the rest of the code is the same. You return swm, which has the book
    # and page already in the structure.
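Putting the pieces together, a complete version could look something like the sketch below. It is only one way to fill in the elided part. The name word_stopper and the pool_size parameter are assumptions for illustration, and two details differ from the question's code: it loops up to the length of the longest sentence rather than len(sentences[1]), and it checks the length explicitly instead of relying on try/except.

```python
import random

class SentenceWithMeta(object):
    def __init__(self, sentence):
        self.sentence = sentence
        self.book = None
        self.page = None

def word_stopper(sentences_with_meta, pool_size=10):
    """Return (letter, position, swm) for the first vowel whose pool runs out, or None."""
    random.shuffle(sentences_with_meta)
    vowels = dict.fromkeys('AEIOU', pool_size)
    longest = max(len(swm.sentence) for swm in sentences_with_meta)
    for i in range(longest):
        for swm in sentences_with_meta:
            if i >= len(swm.sentence):
                continue  # this sentence is shorter than position i
            letter = swm.sentence[i]
            if letter in vowels:
                vowels[letter] -= 1
                if vowels[letter] == 0:
                    return letter, i, swm
    return None

pages = [12, 41, 50, 111, 1021, 121]
bookCodes = ['M', 'P', 'A', 'C', 'A', 'M']
sentences = ['THISISASENTANCE', 'ANDHEREISONEMOREEXAMP', 'ALLFROMDIFFERENTBOOKS',
             'ANDFROMDIFFERENTPAGES', 'MOSLTYTHESAMELENGTHSS', 'BUTSOMEWILLBABITSHORT']

# Build the list of sentences, each carrying its own book and page
sentences_with_meta = []
for sentence, book, page in zip(sentences, bookCodes, pages):
    swm = SentenceWithMeta(sentence)
    swm.book, swm.page = book, page
    sentences_with_meta.append(swm)

result = word_stopper(sentences_with_meta)
if result is not None:
    letter, position, swm = result
    print("Vowel %s exhausted at position %s in %s (book %s, page %s)"
          % (letter, position, swm.sentence, swm.book, swm.page))
```

Because the book and page travel with the sentence, no second pass over coordinates is needed after the function returns.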

Side note: to get the letter at position i of a string, you don't need a slice. Just use an index:

l = swm.sentence[i]
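One difference is worth keeping in mind when switching: a slice past the end of a string quietly returns an empty string, while an index past the end raises IndexError. So the try/except IndexError only ever triggers with the index form. A small sketch of the difference, where char_at is a hypothetical helper name:

```python
def char_at(text, i):
    """Return the character at index i, or None when text is too short."""
    try:
        return text[i]
    except IndexError:
        return None

short = 'ABC'
print(repr(short[5:6]))    # slicing past the end gives an empty string
print(char_at(short, 5))   # indexing past the end raised IndexError, so None
print(char_at(short, 1))
```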

There are many other designs that would work as well.
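As one example of such an alternative (a sketch only, not something taken from the question), the nested coordinates dictionary could be flattened into a list of lightweight records and that list shuffled. The names Entry and flatten are assumptions for illustration.

```python
import random
from collections import namedtuple

# One record per sentence, carrying its book and page
Entry = namedtuple('Entry', ['book', 'page', 'sentence'])

def flatten(coordinates):
    """Turn {book: {page: sentence}} into a flat list of Entry records."""
    return [Entry(book, page, sentence)
            for book, by_page in coordinates.items()
            for page, sentence in by_page.items()]

coordinates = {'A': {50: 'ALLFROMDIFFERENTBOOKS', 1021: 'MOSLTYTHESAMELENGTHSS'},
               'M': {12: 'THISISASENTANCE'}}

entries = flatten(coordinates)
random.shuffle(entries)  # shuffle whole records so no book is favoured
for entry in entries:
    print(entry.book, entry.page, entry.sentence)
```

Shuffling the flat list keeps the iteration order independent of the dictionary's key order, which addresses the bias concern raised in the question.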
