Question

I am learning Python and wrote this program, but it doesn't work. I'm hoping someone can spot the mistake!

I have a file with entries like this:

0 Kurthia sibirica Planococcaceae   
1593 Lactobacillus hordei Lactobacillaceae   
1121 Lactobacillus coleohominis Lactobacillaceae   
614 Lactobacillus coryniformis Lactobacillaceae   
57 Lactobacillus kitasatonis Lactobacillaceae   
3909 Lactobacillus malefermentans Lactobacillaceae

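My goal is to remove every line that starts with a number occurring only once in the whole file (a unique number), and to save every line that starts with a number occurring two or more times to a new file. Here is my attempt. It doesn't work yet (when I enabled the print line, a single line from the file was printed 3 times and that was all):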

#!/usr/bin/env python

infilename = 'v35.clusternum.species.txt'
outfilename = 'v13clusters.no.singletons.txt'

#remove extra letters and spaces
x = 0
with open(infilename, 'r') as infile, open(outfilename, 'w') as outfile:
        for line in infile:
                clu, gen, spec, fam = line.split()
        for clu in line:
                if clu.count > 1:
                        #print line
                        outfile.write(line)
                else:
                    x += 1
print("Number of Singletons:")
print(x)

Thanks for your help!

Answer 1

Okay, your code is headed in the right direction, but you have a few things decidedly confused.

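You need to separate what your script does into two logical steps: one, aggregate (count) all of the clu fields; two, write out each line whose clu count is > 1. You tried to do both steps at the same time and, well, it didn't work. Technically you can do it that way, but your syntax is wrong: `for clu in line` iterates over the characters of the last line read, and `clu.count` is a method, not a number. Repeatedly searching the file for each value is also very inefficient. It is best to read it only once or twice.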

So, let's separate the steps. First, count your clu fields. The collections module has a Counter you can use.

from collections import Counter
with open(infilename, 'r') as infile:
    c = Counter(line.split()[0] for line in infile)

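c is now a Counter that you can use to look up the count for a given clu. Then make a second pass over the file and write out only the lines whose clu appears more than once: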
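To see how the lookup behaves, here is a minimal sketch that uses a few in-memory lines in place of the file. The sample lines are made up for illustration (the name `sample_lines` is not from the original script). Note that the keys are strings, because `split()` returns strings.

```python
from collections import Counter

# Hypothetical sample lines standing in for the input file.
sample_lines = [
    "0 Kurthia sibirica Planococcaceae\n",
    "57 Lactobacillus kitasatonis Lactobacillaceae\n",
    "57 Lactobacillus hordei Lactobacillaceae\n",
]

# Count the first whitespace-separated field of each line.
counts = Counter(line.split()[0] for line in sample_lines)

print(counts["57"])   # 2
print(counts["0"])    # 1
print(counts["999"])  # 0, a Counter returns 0 for keys it has not seen
```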

with open(infilename, 'r') as infile, open(outfilename, 'w') as outfile:
    for line in infile:
        clu, gen, spec, fam = line.split()
        if c[clu] > 1:
            outfile.write(line)
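If you also want the singleton count your original script printed, the two passes can be wrapped in one function. This is only a sketch under a few assumptions: the function name `remove_singletons` is made up, blank lines are skipped, and only the first field of each line is inspected, so lines with more or fewer than four fields do not raise an error.

```python
from collections import Counter


def remove_singletons(infilename, outfilename):
    """Copy lines whose leading cluster number occurs more than once.

    Returns the number of singleton lines that were left out.
    """
    # Pass 1: count how often each cluster number appears.
    with open(infilename, 'r') as infile:
        counts = Counter(line.split()[0] for line in infile if line.strip())

    # Pass 2: write the non-singleton lines and tally the singletons.
    singletons = 0
    with open(infilename, 'r') as infile, open(outfilename, 'w') as outfile:
        for line in infile:
            if not line.strip():
                continue  # skip blank lines
            if counts[line.split()[0]] > 1:
                outfile.write(line)
            else:
                singletons += 1
    return singletons
```

Calling `remove_singletons(infilename, outfilename)` with the two filenames from your script and printing the return value gives you the "Number of Singletons" figure.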

Delete lines that start with a unique number
