Question

f1  f2  f3  f4  ...  f277436   (column headers)
0   9   1   4        0
56  2   66  8        0
(3227 rows...)

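I want to find the number of occurrences of non-zero values in each column. For the example above, the result would be [1, 2, 2, 2, ..., 0]. How can I find this using Python?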

import csv

for k in range(1, 7):
    path = "Dataset/Cross/N_grams_recored/" + str(k) + "_gram.csv"
    with open(path) as infile:
        csvreader = csv.reader(infile)
        tags = next(csvreader)  # header row
        sums = [0] * len(tags)
        # start=1 so that count ends up as the number of data rows
        for count, row in enumerate(csvreader, start=1):
            sums = [int(x) + int(y) for x, y in zip(sums, row)]  # column sums
    avgs = [x / float(count) for x in sums]
    print(count)
    # keep the columns whose average exceeds 0.3
    result_tags = [h for (h, a) in zip(tags, avgs) if a > 0.3]
    # write one output file per k
    filename = "Dataset/Cross/N_gram_Features_Pruned/" + str(k) + "_gram.txt"
    with open(filename, "w") as filewrite:
        filewrite.write(str(result_tags))

Answer 1

The simplest approach is probably to use numpy:

In [137]:
import numpy as np
d = np.genfromtxt('temp1.txt', skip_header=1)
d
Out[137]:
array([[ 2.,  2.,  7.,  7.,  8.,  9.,  0.,  2.,  3.,  5.],
       [ 4.,  9.,  4.,  0.,  2.,  3.,  4.,  3.,  0.,  4.],
       [ 6.,  9.,  3.,  8.,  7.,  2.,  5.,  8.,  7.,  8.],
       [ 8.,  1.,  8.,  7.,  2.,  3.,  8.,  3.,  2.,  6.],
       [ 5.,  9.,  5.,  6.,  9.,  2.,  9.,  5.,  8.,  1.],
       [ 0.,  1.,  0.,  2.,  0.,  9.,  7.,  4.,  5.,  3.],
       [ 2.,  1.,  9.,  9.,  4.,  0.,  1.,  4.,  0.,  5.],
       [ 2.,  9.,  0.,  2.,  3.,  6.,  1.,  5.,  2.,  6.],
       [ 7.,  0.,  8.,  4.,  2.,  3.,  7.,  9.,  3.,  9.],
       [ 2.,  2.,  9.,  8.,  8.,  0.,  6.,  3.,  8.,  6.]])
In [138]: np.sum(d != 0, axis=0)
Out[138]:
array([ 9,  9,  8,  9,  9,  8,  9, 10,  8, 10])
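
If the file is comma-separated, as the csv.reader call in the question suggests, the same idea might look like the sketch below. The in-memory csv_text is a made-up stand-in built from the two sample rows in the question, cut down to five columns, and np.count_nonzero is swapped in for np.sum(d != 0, axis=0); both return one count per column.

```python
import io

import numpy as np

# Hypothetical stand-in for a comma-separated file with a header row.
# Only the two sample rows from the question are used, with five columns.
csv_text = io.StringIO(
    "f1,f2,f3,f4,f5\n"
    "0,9,1,4,0\n"
    "56,2,66,8,0\n"
)

# delimiter="," splits on commas; skip_header=1 drops the column names.
d = np.genfromtxt(csv_text, delimiter=",", skip_header=1)

# Counting along axis 0 collapses the rows, leaving one count per column.
counts = np.count_nonzero(d, axis=0)
print(counts.tolist())  # [1, 2, 2, 2, 0]
```

For a real file, pass the path instead of csv_text.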

The data file looks like this:

f0 f1 f2 f3 f4 f5 f6 f7 f8 f9
2 2 7 7 8 9 0 2 3 5
4 9 4 0 2 3 4 3 0 4
6 9 3 8 7 2 5 8 7 8
8 1 8 7 2 3 8 3 2 6
5 9 5 6 9 2 9 5 8 1
0 1 0 2 0 9 7 4 5 3
2 1 9 9 4 0 1 4 0 5
2 9 0 2 3 6 1 5 2 6
7 0 8 4 2 3 7 9 3 9
2 2 9 8 8 0 6 3 8 6
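
Loading everything into one array may not be practical for the file in the question: 3227 rows by 277436 columns is about 895 million values, or roughly 7 GB as float64. A streaming variant with the standard csv module keeps only one row in memory at a time. This is a rough sketch rather than a tuned solution; count_nonzero_columns is a name made up for illustration, and the space delimiter matches the sample file above (a comma-separated file would use delimiter=",").

```python
import csv
import io


def count_nonzero_columns(handle, delimiter=" "):
    """Return (header, counts); counts[i] is the number of non-zero values in column i."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)  # first line holds the column names
    counts = [0] * len(header)
    for row in reader:
        for i, value in enumerate(row):
            if float(value) != 0:
                counts[i] += 1
    return header, counts


# The sample data file from this answer, held in memory for the demo.
sample = io.StringIO(
    "f0 f1 f2 f3 f4 f5 f6 f7 f8 f9\n"
    "2 2 7 7 8 9 0 2 3 5\n"
    "4 9 4 0 2 3 4 3 0 4\n"
    "6 9 3 8 7 2 5 8 7 8\n"
    "8 1 8 7 2 3 8 3 2 6\n"
    "5 9 5 6 9 2 9 5 8 1\n"
    "0 1 0 2 0 9 7 4 5 3\n"
    "2 1 9 9 4 0 1 4 0 5\n"
    "2 9 0 2 3 6 1 5 2 6\n"
    "7 0 8 4 2 3 7 9 3 9\n"
    "2 2 9 8 8 0 6 3 8 6\n"
)

header, counts = count_nonzero_columns(sample)
print(counts)  # [9, 9, 8, 9, 9, 8, 9, 10, 8, 10]
```

With a file on disk, open it and pass the file object in place of sample.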
