The code below just builds a term-document matrix. Can it be made more efficient?
PREPROCESSED = ['He is a good boy','he loves studying']
DICTIONARY = ['He', 'is', 'a', 'good', 'boy', 'loves', 'studying']
MATRIX = []
for sent in PREPROCESSED:
    temp = []
    for i in DICTIONARY:
        count = 0
        for words in sent.split():
            if i == words:
                count = count + 1
        temp.append(count)
    test = 0
    for i in temp:
        if i != 0:
            test = 1
    if test == 1:
        MATRIX.append(temp)
    del temp
Answer 0 (score: 1)
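I tried reworking the algorithm, but there really isn't much more you can do with it.
Here it is with a few minor code changes (minor, but worthwhile if the lists grow a lot):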
PREPROCESSED = ['He is a good boy','he loves studying']
DICTIONARY = ['He', 'is', 'a', 'good', 'boy', 'loves', 'studying']
MATRIX = []
for sent in PREPROCESSED:
    temp = []
    tmpSent = sent.split()  # runs once instead of len(DICTIONARY) times
    for i in DICTIONARY:
        count = 0
        for word in tmpSent:
            if i == word:
                count += 1
        temp.append(count)
    for i in temp:
        if i != 0:
            # removes an extra test
            MATRIX.append(temp)
            break
    del temp
print(MATRIX)
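
If the lists do grow large, one further option (not part of the code above, just a sketch) is to count each sentence's words once with `collections.Counter` and then look up each dictionary term, instead of rescanning the sentence for every term. That takes the work per sentence from roughly len(DICTIONARY) * len(words) comparisons down to about len(DICTIONARY) + len(words). The function name `build_matrix` is made up for this illustration; matching stays case-sensitive, as in the original, so 'he' is not counted as 'He'.

```python
from collections import Counter

PREPROCESSED = ['He is a good boy', 'he loves studying']
DICTIONARY = ['He', 'is', 'a', 'good', 'boy', 'loves', 'studying']


def build_matrix(sentences, dictionary):
    """Hypothetical helper: one row of term counts per sentence."""
    matrix = []
    for sent in sentences:
        # Count every word of the sentence in a single pass.
        counts = Counter(sent.split())
        # A Counter returns 0 for missing keys, so no membership test is needed.
        row = [counts[term] for term in dictionary]
        # Keep the row only if at least one dictionary term occurred,
        # which mirrors the "test" flag / break logic above.
        if any(row):
            matrix.append(row)
    return matrix


MATRIX = build_matrix(PREPROCESSED, DICTIONARY)
print(MATRIX)
# [[1, 1, 1, 1, 1, 0, 0], [0, 0, 0, 0, 0, 1, 1]]
```

For the two sample sentences this gives the same matrix as the loops above; any gain would only show up with much longer sentences or a much larger dictionary.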