Question

I have a data file with three tab-separated data columns (plus some repeated header lines):

Sequence ../Output/yy\Programs\NP_416485.4 alignment. Using default output format...

#jjhgjg
0   0.89    u-p
1   -5.79   --- 
2   0.85    yui
3   0.51    uio
4   0.66    -08
Sequence ../Output/yy\Programs\YP_986467.7 alignment. Using default output format...

#gfhhjgjhg
0   0.001   -s-
1   0.984   ---
2   0.564   -fg
3   0.897   -sr

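For the values in the second column that are greater than 0.5, I want to extract the corresponding numbers (or ranges) from the first column.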

For the input above, the output would be:

NP_416485.4: 1, 3-5
YP_986467.7: 2-4

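Here, "NP_416485.4" and "YP_986467.7" come from the header descriptor (the part after \Programs\). (Note that the actual values for "NP_416485.4", for example, would be "NP_416485.4: 0, 2-4", but I increase every value by 1 because I don't want the numbering to start at 0.) As suggested, I first tried the Python csv module: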

import csv

with open('test.txt','rb') as tsvin, open('new.csv', 'wb') as csvout:
    tsvin = csv.reader(tsvin, delimiter='\t')
    csvout = csv.writer(csvout)

    for row in tsvin:
        count = float(row[1])
        if count > 0.5:
            csvout.writerow(row)

But it gives:

count = float(row[1])
ValueError: could not convert string to float:

Please help. Thanks.
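For reference, the header, comment, and blank lines have no number in the second field, so float(row[1]) cannot succeed on every row. Below is a minimal sketch of the extraction described above, not a definitive solution. It assumes Python 3, that fields are separated by any whitespace, and that the ID is the run of non-whitespace characters right after "\Programs\". The names HEADER_RE, collapse, and extract are hypothetical and exist only for illustration.

```python
import re
from itertools import groupby

# Assumption: the ID is the non-whitespace text right after "\Programs\".
HEADER_RE = re.compile(r"Programs\\(\S+)")


def collapse(numbers):
    """Turn sorted integers into a string such as '1, 3-5'."""
    parts = []
    # Consecutive integers share the same (value - index) difference,
    # so groupby on that difference yields one group per run.
    for _, group in groupby(enumerate(numbers), key=lambda pair: pair[1] - pair[0]):
        run = [value for _, value in group]
        parts.append(str(run[0]) if len(run) == 1 else f"{run[0]}-{run[-1]}")
    return ", ".join(parts)


def extract(lines, threshold=0.5):
    """Map each sequence ID to the 1-based positions scoring above threshold."""
    results = {}
    current = None
    for line in lines:
        match = HEADER_RE.search(line)
        if match:
            # A new "Sequence ..." header starts a new section.
            current = match.group(1)
            results[current] = []
            continue
        fields = line.split()
        if current is None or len(fields) < 2:
            continue  # blank lines and "#..." comment lines
        try:
            position, score = int(fields[0]), float(fields[1])
        except ValueError:
            continue  # any other non-data line
        if score > threshold:
            results[current].append(position + 1)  # shift to 1-based
    return {name: collapse(positions) for name, positions in results.items()}
```

The function accepts any iterable of lines, so an open file object such as open('test.txt') can be passed directly, and each resulting name and range string can then be printed as "name: ranges". Skipping rows that fail to parse, rather than letting float() raise, is what lets the repeated header lines pass through harmlessly.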

Using Python

0 answers: