Question

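I have the following list: t = ['one', 'two', 'three']

I want to read a file and add one point for each word in the list that appears in it. For example, if "one" and "two" are present in CV.txt, then points = 2. If all of them are present, then points = 3.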

import nltk
from nltk import word_tokenize

t = ['one', 'two', 'three']
CV = open("cv.txt","r").read().lower()

points = 0

for words in t:
    if words in CV:
        #print(words)
        words = nltk.word_tokenize(words)
        print(words)
        li = len(words)
        print(li)
        points = li
        print(points)

Assuming 'CV.txt' contains the words "one" and "two" and is split into words (tokenized), 2 points should be added to the variable "points".

However, this code returns:

points

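As you can see, the length is only 1, but it should be 2. I'm sure there is a more efficient way to do this, such as counting inside a loop rather than using len. Any help would be appreciated.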

Answer 1

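I don't think you need to tokenize inside the loop, so a simpler approach might be the following:

First, tokenize the words in the txt file.
Then check each token against t and collect the ones that match as common_words.

Finally, the points will be the number of words in common_words.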

import nltk
from nltk import word_tokenize

t = ['one', 'two', 'three']
CV = open("untitled.txt","r").read().lower()

points = 0

words = nltk.word_tokenize(CV)
common_words = [word for word in words if word in t]
points = len(common_words)
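
To try the idea without an input file or NLTK's tokenizer data, here is a minimal sketch. The helper name `count_points` and the sample text are assumptions made for illustration, and `re.findall` is only a rough stand-in for `nltk.word_tokenize`, not an equivalent. Like the list comprehension above, it counts every occurrence, so repeated words score more than once.

```python
import re

def count_points(text, targets):
    """Count every token in `text` that is one of `targets` (duplicates included)."""
    # Rough stand-in for nltk.word_tokenize: lowercase runs of letters/apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    target_set = set(targets)  # set membership is faster than scanning a list
    return sum(1 for token in tokens if token in target_set)

t = ['one', 'two', 'three']
sample = "One day, two people met. One of them left."  # hypothetical sample text
points = count_points(sample, t)
print(points)
```

Because whole tokens are compared, a word such as "someone" does not count as a match for "one", whereas the substring test `words in CV` from the question would count it.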

Note: if you want to avoid counting duplicates, build common_words as a set in the code above:

common_words = set(word for word in words if word in t)
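
With a set, the score becomes the number of distinct list words found, which seems to match the "points = 2 when one and two are present" expectation in the question. A sketch under stated assumptions: `count_distinct_points` is a hypothetical helper name, and the regular expression is again a rough substitute for `nltk.word_tokenize`.

```python
import re

def count_distinct_points(text, targets):
    """Count how many distinct words from `targets` occur in `text`."""
    # Rough stand-in for nltk.word_tokenize; a set drops repeated tokens.
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    # Set intersection keeps only the target words that actually appear.
    return len(tokens & set(targets))

t = ['one', 'two', 'three']
sample = "One day, two people met. One of them left."  # hypothetical sample text
print(count_distinct_points(sample, t))
```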
