Question

I need to iterate over all lines of two files (at the same time) and get the index of a word from one of them.

Example:

small_wordlist:

Book
Woman
Child

big_wordlist:

Book
Man
Dog
Cat
Child
Dinosaur
Woman

and so on. The desired result would be:

1
7
5

(or one less for each, counting from 0; that doesn't really matter), saved in another file.

I can't get it to work like this:

g = open('big_wordlist', 'r')
i = open('index_list', 'w')

with open('small_wordlist', 'r') as h:
    for line in h:
        p = h.readline()
        for num, line in enumerate(g):          # num is my found index
            if (line.startswith(p + "\n")):     # need that to make sure we only get the correct word and nothing before / after it
                i.write("%s" % (num) + "\n")

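So I need to go through the small word list, look up each word in the big word list, get the index at which it was found, and write that index to my index list.

Right now I get "Mixing iteration and read methods would lose data". I don't care about that: once I have written the number to my index list, p (the current word) will change anyway (and should) with the next line of small_wordlist.

The problem is with iterating over the small word list. When I replace p with "Book" it does work; now I need to use a variable holding the word from each line of my small word list.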

Answer 1

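You don't need to process both files at the same time. Instead, build an index of one file (the big word list), then process the other file (the small word list), looking up each of its words in that index.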

#!python3

small_wordlist = """
    Book
    Woman
    Child
""".strip()

big_wordlist = """
    Book
    Man
    Dog
    Cat
    Child
    Dinosaur
    Woman
""".strip()

import io

# Read the words from the big wordlist into word_index

#with open('big_wordlist.txt') as big:
with io.StringIO(big_wordlist) as big:
    ix = 0
    word_index = {}

    for line in big:
        word = line.strip()
        if word not in word_index:
            word_index[word] = ix
        ix += 1

#with open('small_wordlist.txt') as small:
with io.StringIO(small_wordlist) as small:
    for line in small:
        word = line.strip()
        if word not in word_index:
            print('-1')  # Or print('not found') or raise exception or...
        else:
            print(word_index[word])
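
The same idea can be sketched with real files and with the results written to an output file, as the question asks. This is only a sketch under some assumptions: the function names `build_word_index` and `write_indices`, the `start` and `missing` parameters, and the UTF-8 encoding are made up for illustration and are not part of the original code. Building the dictionary first also sidesteps two problems in the question's loop: `for line in h` is mixed with `h.readline()`, and the file object `g` is exhausted after the first pass through the inner loop.

```python
def build_word_index(lines, start=0):
    """Map each word to the position of the first line it appears on."""
    word_index = {}
    for position, line in enumerate(lines, start=start):
        word = line.strip()
        # Blank lines are counted as positions but never stored;
        # only the first occurrence of a word is kept.
        if word and word not in word_index:
            word_index[word] = position
    return word_index


def write_indices(big_path, small_path, out_path, start=0, missing=-1):
    """Write one index per non-blank line of small_path to out_path."""
    # Pass 1: read the big word list once and remember each word's position.
    with open(big_path, encoding="utf-8") as big:
        word_index = build_word_index(big, start=start)

    # Pass 2: look up every word of the small word list in the dictionary.
    with open(small_path, encoding="utf-8") as small, \
            open(out_path, "w", encoding="utf-8") as out:
        for line in small:
            word = line.strip()
            if word:
                # `missing` (assumed default -1) marks words not in the big list.
                out.write(f"{word_index.get(word, missing)}\n")
```

With the example lists from the question, `write_indices('big_wordlist', 'small_wordlist', 'index_list', start=1)` should write 1, 7 and 5 on separate lines; leaving `start` at 0 gives the zero-based 0, 6 and 4 that the code above prints.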
