Question

I checked similar topics, but the results were poor.

I have a file like this:

S1_22   45317082    31  0   9   22  1543
S1_23   3859606 40  3   3   34  2111
S1_24   48088383    49  6   1   42  2400
S1_25   43387855    39  1   7   31  2425
S1_26   39016907    39  2   7   30  1977
S1_27   57612149    23  0   0   23  1843
S1_28   42505824    23  1   1   21  1092
S1_29   54856684    18  0   2   16  1018
S1_29   54856684    18  0   2   16  1018
S1_29   54856684    18  0   2   16  1018
S1_29   54856684    18  0   2   16  1018

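I want to count the occurrences of the words in the first column and, based on that, write an output file with an additional field stating uniq if count == 1 and multi if count > 0.

I wrote this code: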

import csv
import collections

infile = 'Results'

names = collections.Counter()

with open(infile) as input_file:
    for row in csv.reader(input_file, delimiter='\t'):
        names[row[0]] += 1
    print names[row[0]],row[0]

But it does not work properly.

I can't put everything into a list because the file is too big.

Answer 1

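The final print statement doesn't look like what you want. Because of its indentation, it sits outside the for loop and is executed only once, after the loop has finished. It will print the count for S1_29, because that is the value of row[0] in the last iteration of the loop.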
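A small sketch of why this happens, written for Python 3 (the snippets in this thread use the Python 2 print statement). The sample names and the variable last_seen are made up for illustration and are not from the original file-reading code:

```python
import collections

names = collections.Counter()
sample = ["S1_28", "S1_29", "S1_29"]  # hypothetical first-column values

for name in sample:
    names[name] += 1

# This statement is outside the loop body, so it runs once, after the loop
# ends, and `name` still holds the value from the final iteration.
last_seen = (names[name], name)
print(*last_seen)
```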

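You're on the right track. Instead of that print statement, just iterate over the keys and values of the counter and check whether each value is equal to 1 or greater than 1.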
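That check could look roughly like the following Python 3 sketch. It assumes that "multi" is meant for any name seen more than once; the helper name label_counts and the sample values are hypothetical:

```python
import collections


def label_counts(counts):
    """Map each name to 'uniq' if it occurs once, otherwise 'multi'.

    Assumption: 'multi' means a count greater than 1.
    """
    return {
        name: ("uniq" if count == 1 else "multi")
        for name, count in counts.items()
    }


# Hypothetical first-column values standing in for the real file.
counts = collections.Counter(["S1_28", "S1_29", "S1_29", "S1_29", "S1_29"])
labels = label_counts(counts)

for name, label in labels.items():
    print(name, counts[name], label)
```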

Answer 2

If you want this code to work, you should indent the print statement:

    names[row[0]] += 1
    print names[row[0]],row[0]

But what you really want is:

import csv
import collections

infile = 'Results'

names = collections.Counter()

with open(infile) as input_file:
    for row in csv.reader(input_file, delimiter='\t'):
        names[row[0]] += 1

for name, count in names.iteritems():
    print name, count

Edit: To show the rest of the row, you can use a second dictionary, like:

names = collections.Counter()
rows = {}

with open(infile) as input_file:
    for row in csv.reader(input_file, delimiter='\t'):
        rows[row[0]] = row
        names[row[0]] += 1

for name, count in names.iteritems():
    print rows[name], count
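Since the question mentions that the file is too big to hold in a list and that the result should go to an output file, one possible alternative is to read the file twice: once to count, once to write each row with its label. This is only a sketch in Python 3, not the approach given in the answers above. The function name annotate, the output path, and the reading of "multi" as a count greater than 1 are all assumptions:

```python
import collections
import csv


def annotate(infile, outfile):
    """Append 'uniq' or 'multi' to every row, based on first-column counts."""
    # First pass: keep only one counter entry per distinct name in memory.
    counts = collections.Counter()
    with open(infile, newline="") as src:
        for row in csv.reader(src, delimiter="\t"):
            if row:  # skip blank lines
                counts[row[0]] += 1

    # Second pass: stream the rows again and write them out with the label.
    with open(infile, newline="") as src, open(outfile, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        for row in csv.reader(src, delimiter="\t"):
            if row:
                # Assumption: 'multi' means the name occurs more than once.
                label = "uniq" if counts[row[0]] == 1 else "multi"
                writer.writerow(row + [label])
    return counts
```

Calling annotate('Results', 'Results.annotated') (the second filename is hypothetical) would keep every input row, duplicates included, whereas the rows dictionary above keeps only the last row seen for each name.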
