sample_collection = [
    ['', 'test1.txt', 'test2.txt', 'test3.txt', 'test4.txt'],
    ['apple', 1, 1, 1, 1],
    ['banana', 1, 1, 0, 0],
    ['lemon', 1, 0, 2, 1],
]
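Before doing the tf-idf calculation, I want to collect the number of "hits" for each row, that is, how many of its values are non-zero.
I tried the following, but got stuck: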
def n_calc(frequency_matrix):
    non_zero_term_frequencies = []
    for list_ in sample_collection:
        for value in list_:
            if value != 0.0:
                non_zero_term_frequencies += 1
            else:
                pass

idf_calc(sample_collection)
Desired output:
non_zero_term_frequencies: [4, 2, 3]
Answer 0 (score: 4)
Using a list comprehension, this should work:
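# Skip the header row and the label column, then count the truthy (non-zero) values per row.
counts = [
    sum(1 if value else 0 for value in line[1:])
    for line in sample_collection[1:]
]
print(counts)
>>> [4, 2, 3]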
Answer 1 (score: 1)
It can be done like this:
non_zero_term_frequencies = []
for list_ in frequency_matrix[1:]:
    non_zero = len(list_[1:]) \
        - len([x for x in list_ if x == 0])
    non_zero_term_frequencies.append(non_zero)
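
For completeness, here is a minimal sketch of how the loop from the question might be repaired while keeping its shape. The original attempt seems to fail for three reasons: it does `+= 1` on a list, it iterates over the header row and the label column, and it never returns anything. The names `n_calc` and `sample_collection` come from the question; `row` and `count` are placeholders introduced here, and this is just one way to write it.

```python
sample_collection = [
    ['', 'test1.txt', 'test2.txt', 'test3.txt', 'test4.txt'],
    ['apple', 1, 1, 1, 1],
    ['banana', 1, 1, 0, 0],
    ['lemon', 1, 0, 2, 1],
]


def n_calc(frequency_matrix):
    """Return, for each term row, how many documents have a non-zero count."""
    non_zero_term_frequencies = []
    # Skip the first row, which only holds the file names.
    for row in frequency_matrix[1:]:
        count = 0
        # Skip the first column, which holds the term label.
        for value in row[1:]:
            if value != 0:
                count += 1
        non_zero_term_frequencies.append(count)
    return non_zero_term_frequencies


print(n_calc(sample_collection))  # [4, 2, 3]
```

Using a per-row integer counter and appending it once the inner loop finishes yields one entry per term, which matches the desired output in the question.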