Question

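I'm creating a script that reads a CSV file into a set of named tuples whose field names come from the column headers. I will then use these namedtuples to pull out the rows of data that meet certain criteria.

I've worked out the input (shown below), but I'm having trouble filtering the data before writing it to another file.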

import csv
from collections import namedtuple

with open('test_data.csv') as f:
    f_csv = csv.reader(f) #read using csv.reader()
    Base = namedtuple('Base', next(f_csv)) #create namedtuple keys from header row
    for r in f_csv: #for each row in the file
        row = Base(*r) 
        # Process row
        print(row) #print data

The contents of my input file look like this:

Locus           Total_Depth     Average_Depth_sample    Depth_for_17
chr1:6484996    1030            1030                    1030
chr1:6484997    14              14                      14
chr1:6484998    0               0                       0

My code prints them as follows:

Base(Locus='chr1:6484996', Total_Depth='1030', Average_Depth_sample='1030', Depth_for_17='1030')
Base(Locus='chr1:6484997', Total_Depth='14', Average_Depth_sample='14', Depth_for_17='14')
Base(Locus='chr1:6484998', Total_Depth='0', Average_Depth_sample='0', Depth_for_17='0')

I want to be able to pull out only those records with a Total_Depth greater than 15.

Intuitively, I tried the following:

if Base.Total_Depth >= 15 :
    print row

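However, this only prints the last line of data (from the output above). I presume the problem is twofold. As far as I can tell, I'm not storing my named tuples anywhere so that they can be referenced later. Secondly, the numbers are being read as strings rather than as integers.

Firstly, could someone correct me if I do need to store my namedtuples somewhere?

Secondly, how do I convert the string values to integers? Or is this not possible because namedtuples are immutable?

Thanks!

I previously asked a similar question about dictionaries, but now I would like to use namedtuples instead. :)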

Answer 1

Map your values to int when creating the named tuple instances:

row = Base(r[0], *map(int, r[1:]))

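This keeps the r[0] value as a string and maps the remaining values to int().

This does require knowledge of the CSV columns, since which columns can be converted to integers is hardcoded here.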

Demo:

>>> from collections import namedtuple
>>> Base = namedtuple('Base', ['Locus', 'Total_Depth', 'Average_Depth_sample', 'Depth_for_17'])
>>> r = ['chr1:6484996', '1030', '1030', '1030']
>>> Base(r[0], *map(int, r[1:]))
Base(Locus='chr1:6484996', Total_Depth=1030, Average_Depth_sample=1030, Depth_for_17=1030)
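
If hardcoding the column positions is undesirable, one possible alternative is to attempt the conversion on every value and fall back to the original string when it fails. The sketch below is only an illustration under that assumption; the helper name to_int_if_possible is hypothetical and not part of the original answer.

```python
from collections import namedtuple


def to_int_if_possible(value):
    """Return int(value) if the string parses as an integer, else the string unchanged."""
    try:
        return int(value)
    except ValueError:
        return value


Base = namedtuple('Base', ['Locus', 'Total_Depth', 'Average_Depth_sample', 'Depth_for_17'])
r = ['chr1:6484996', '1030', '1030', '1030']

# Locus fails int() and stays a string; the depth columns become integers.
row = Base(*map(to_int_if_possible, r))
print(row)
```

The trade-off is that a column's type now depends on its contents, so a stray non-numeric entry in a depth column would silently remain a string and break a later numeric comparison.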

Note that you should test against your row instances, not the Base class:

if row.Total_Depth >= 15:

either inside the loop, or in a new loop over the collected rows.
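
Putting the pieces together, the following is a minimal sketch of the whole read, convert, filter, and write flow. It assumes the file is comma-separated (the question does not state the delimiter) and uses in-memory io.StringIO objects as stand-ins for test_data.csv and the output file, so the names sample, rows, kept, and out are illustrative only.

```python
import csv
import io
from collections import namedtuple

# Stand-in for test_data.csv; the comma-separated layout is an assumption.
sample = io.StringIO(
    "Locus,Total_Depth,Average_Depth_sample,Depth_for_17\n"
    "chr1:6484996,1030,1030,1030\n"
    "chr1:6484997,14,14,14\n"
    "chr1:6484998,0,0,0\n"
)

f_csv = csv.reader(sample)
Base = namedtuple('Base', next(f_csv))  # field names from the header row

# Store the converted rows in a list so they can be referenced after the loop.
rows = [Base(r[0], *map(int, r[1:])) for r in f_csv]

# Test each row instance, not the Base class. Use > 15 for a strict comparison.
kept = [row for row in rows if row.Total_Depth >= 15]

# Stand-in for an output file; a real file would be opened with newline=''.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(Base._fields)  # header row
writer.writerows(kept)         # namedtuples are tuples, so they write directly
print(out.getvalue())
```

On the immutability concern: a namedtuple instance cannot be modified in place, which is why the conversion happens when the instance is constructed. If a field must change afterwards, the _replace() method returns a new instance with the updated value.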
