Question

I'm sorry to be asking for the solution to a "homework" problem here, but I have already spent 4 hours on it and can't keep going like this.

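Assignment: count the occurrences of a specific string in a Lorem Ipsum text (provided). A helper function, tokenize, is provided; it splits the given text and returns a list of tokens.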

def tokenize(text):
    return text.split()

for token in tokenize(text):
    print(token)
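Since tokenize() is just str.split() with no arguments, it splits on any run of whitespace (spaces, tabs, newlines) and leaves punctuation attached to the words. A small sketch on a made-up sample string (not the actual assignment file) shows what the tokens look like:

```python
def tokenize(text):
    # str.split() with no argument splits on runs of whitespace
    # and ignores leading/trailing whitespace
    return text.split()

sample = "Lorem ipsum dolor sit amet,\n  consectetur ipsum."  # hypothetical sample text
tokens = tokenize(sample)
print(tokens)
# ['Lorem', 'ipsum', 'dolor', 'sit', 'amet,', 'consectetur', 'ipsum.']
```

Note that 'ipsum.' is a different token from 'ipsum', so an exact comparison with the query will not count it.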

Task: write a function search_text() that takes two arguments, in this order: filename and query.

The function should return the number of occurrences of query in the file filename.

query = 'ipsum'
search_text('lorem-ipsum.txt', query) # returns 24

My code:

def tokenize(text):
    return text.split()

def search_text(filename, query):
    with open("lorem-ipsum.txt", "r") as filename:
      wordlist = filename.split()
      count = 0
   for query in wordlist:
      count = count + 1
   return count

query = "lorem"
search_text('lorem-ipsum.txt', query)

It doesn't work, and it looks a bit messy. To be honest, I don't understand how the function tokenize() is supposed to work here.

Can someone give me a hint?

Answer 1

If you want to use the function tokenize(), you actually have to call it; your code never does.

This version works:

def tokenize(text):
    return text.split()

def search_text(filename, query):
    word_list = []
    with open(filename, 'r') as f:
        for line in f:
            line = line.strip()
            if len(line) > 0:
                # add tokens to the list, only if line is not empty
                word_list.extend(tokenize(line))

    count = 0
    for word in word_list:
        if word == query:
            count += 1

    return count

query = "lorem"
search_text('lorem-ipsum.txt', query)
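Instead of tokenizing line by line, you can also read the whole file into one string and call tokenize() once. This is a minimal sketch; the sample file contents below are made up for illustration and are not the assignment's lorem-ipsum.txt:

```python
import os
import tempfile

def tokenize(text):
    return text.split()

def search_text(filename, query):
    # Read the whole file into a single string, then split it into tokens
    with open(filename, "r") as f:
        words = tokenize(f.read())
    # Count only tokens that are exactly equal to query
    count = 0
    for word in words:
        if word == query:
            count += 1
    return count

# Usage: write a small hypothetical sample file, search it, then clean up
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("lorem ipsum dolor\nipsum lorem ipsum\n")
    sample_path = tmp.name

ipsum_count = search_text(sample_path, "ipsum")
lorem_count = search_text(sample_path, "lorem")
os.remove(sample_path)
print(ipsum_count, lorem_count)  # 3 2
```

Reading the whole file at once is simpler; reading line by line, as in the version above, uses less memory for very large files.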

You can also count in other ways, as shown in this question. Here is a solution that uses the sequence method .count():

return word_list.count(query)
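Put into the full function, the .count() variant might look like the sketch below. Because tokenize("") and tokenize("\n") both return an empty list, the blank-line check from the longer version can be dropped. The sample file is hypothetical; note that the comparison is case-sensitive, so "Lorem" does not match "lorem":

```python
import os
import tempfile

def tokenize(text):
    return text.split()

def search_text(filename, query):
    word_list = []
    with open(filename, "r") as f:
        for line in f:
            # Blank lines yield no tokens, so no length check is needed
            word_list.extend(tokenize(line))
    # list.count() returns how many elements compare equal to query
    return word_list.count(query)

# Usage with a small hypothetical sample file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Lorem ipsum\n\nipsum dolor ipsum\n")
    sample_path = tmp.name

ipsum_hits = search_text(sample_path, "ipsum")
lower_hits = search_text(sample_path, "lorem")  # case-sensitive: no match
upper_hits = search_text(sample_path, "Lorem")
os.remove(sample_path)
print(ipsum_hits, lower_hits, upper_hits)  # 3 0 1
```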

Why use "def tokenize(text)"? What does it have to do with counting the occurrences of a string in the text?
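One reason to tokenize first: str.count() on the raw text counts substrings, so it also matches "ipsum" inside longer words, whereas counting tokens only matches whole whitespace-separated words. A small comparison on a made-up string (which of the two counts the assignment's expected 24 corresponds to can't be checked from here):

```python
def tokenize(text):
    return text.split()

text = "ipsum ipsums lorem-ipsum ipsum"  # hypothetical sample text

# str.count() counts non-overlapping substring matches anywhere in the text
substring_hits = text.count("ipsum")
# Counting tokens only matches whole whitespace-separated words
token_hits = tokenize(text).count("ipsum")

print(substring_hits, token_hits)  # 4 2
```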
