Manual implementation of a term-document matrix. Can we make it more efficient?

Time: 2019-01-31 17:01:02

Tags: python-3.x

The code below simply builds a term-document matrix. Can we make it more efficient?

PREPROCESSED = ['He is a good boy','he loves studying']
DICTIONARY = ['He', 'is', 'a', 'good', 'boy', 'loves', 'studying']
MATRIX = []
for sent in PREPROCESSED:
    temp = []
    for i in DICTIONARY:
        count = 0
        for words in sent.split():
            if i == words:
                count = count + 1
        temp.append(count)
    test = 0
    for i in temp:
        if i != 0:
            test = 1
    if test == 1:
        MATRIX.append(temp)
    del temp

1 answer:

Answer 0 (score: 1)

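I tried reworking the algorithm, but you really can't get below O(n * m).

Here it is with a few minor code changes (minor, but worthwhile if the lists grow a lot):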

PREPROCESSED = ['He is a good boy','he loves studying']
DICTIONARY = ['He', 'is', 'a', 'good', 'boy', 'loves', 'studying']
MATRIX = []
for sent in PREPROCESSED:
    temp = []
    tmpSent = sent.split() #runs once instead of len(DICTIONARY) times
    for i in DICTIONARY:
        count = 0
        for word in tmpSent:
            if i == word:
                count += 1
        temp.append(count)
    for i in temp:
        if i != 0:
            # removes an extra test
            MATRIX.append(temp)
            break
    del temp

print(MATRIX)
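
As a possible further step, here is a minimal sketch that counts each sentence's words once with `collections.Counter` and then looks up every dictionary term, instead of rescanning the sentence for each term. The function name `term_document_matrix` is made up for illustration. Like the code above, it matches case-sensitively and skips rows that are all zeros. The matrix itself still has one cell per (sentence, term) pair, so this does not beat O(n * m) for producing the output; it only removes the innermost loop over the words.

```python
from collections import Counter


def term_document_matrix(sentences, dictionary):
    """Build a dense term-document matrix, skipping all-zero rows.

    Hypothetical helper for illustration; same matching rules as the
    loops above (whitespace split, case-sensitive comparison).
    """
    matrix = []
    for sent in sentences:
        # Count every word of the sentence in a single pass.
        counts = Counter(sent.split())
        # A Counter returns 0 for missing keys, so no membership test is needed.
        row = [counts[term] for term in dictionary]
        # Keep the row only if at least one dictionary term occurs.
        if any(row):
            matrix.append(row)
    return matrix


PREPROCESSED = ['He is a good boy', 'he loves studying']
DICTIONARY = ['He', 'is', 'a', 'good', 'boy', 'loves', 'studying']
MATRIX = term_document_matrix(PREPROCESSED, DICTIONARY)
print(MATRIX)  # [[1, 1, 1, 1, 1, 0, 0], [0, 0, 0, 0, 0, 1, 1]]
```

Note that the lowercase 'he' in the second sentence does not match 'He' in the dictionary. If that is not intended, lowercasing both the sentences and the dictionary before counting would be one way to handle it.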