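I have several tab-delimited files that I want to read into dicts with csv.DictReader. Before the actual data starts, each file contains several comment lines that begin with '#' or '\t'. The number of comment lines varies from file to file. I have been trying the approach outlined in this post, but I can't seem to get it working.
Here is my code so far: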
def load_database_snps(inputFile):
    '''This function takes a txt tab delimited input file (in house database) and returns a list of dictionaries for each variant'''
    idStore = [] #empty list for storing variant records
    with open(inputFile, 'r+') as varin:
        idStoreDictgroup = csv.DictReader((row for row in varin if row.startswith('hr', 1, 2)),delimiter='\t') #create a generator; dictionary per snp (row) in the file
        idStoreDictgroup.fieldnames = [field.strip() for field in idStoreDictgroup.fieldnames] #strip whitespace from field names
        print(type(idStoreDictgroup))
        for d in idStoreDictgroup: #iterate over dictionaries in varin_dictgroup
            print(d)
            idStore.append(d) #attach to var_list
    return idStore
Here is a sample of the input file:
## SM=Sample,AD=Total Allele Depth, DP=Total Depth
## het;;; and homo;;; are breakdowns of variant read counts per sample - chr1:10002921 T>G AD=34 het:4;11;7;12 (sum=34)
Hetereozygous Homozygous
Chr Start End ref |A| |C| |G| |T| HetCount |A| |C| |G| |T| HomCount TotalCount SampleCount
chr1 10001102 10001102 T 0 0 SM=1;AD=22;DP=38 0 1 0 0 0 0 0 1 138 het:22; homo:-
chr1 10002921 10002921 T 0 0 SM=4;AD=34;DP=63 0 4 0 0 0 0 0 4 138 het:4;11;7;12; homo:-
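All of the lines I want to read start with 'Chr' or 'chr'. I think it isn't working because I have to iterate over the generator to reformat the field names, which exhausts it before the rows are read into dictionaries.
The error message I get is: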
Traceback (most recent call last):
  File "snp_freq_V1-1_export.py", line 99, in <module>
    snp_check_wrapper(inputargs.snpstocheck, inputargs.snp_database_location)
  File "snp_freq_V1-1_export.py", line 92, in snp_check_wrapper
    snpDatabase = load_database_snps(databaseInputFile) #store database variants in snp_database (a dictionary)
  File "snp_freq_V1-1_export.py", line 53, in load_database_snps
    idStoreDictgroup.fieldnames = [field.strip() for field in idStoreDictgroup.fieldnames] #strip whitespace from field names
TypeError: 'NoneType' object is not iterable
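I have also tried inverting the current code so that it explicitly excludes lines starting with '#' and '\t'. That didn't work either and just gave me an empty dictionary.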
Answer 0 (score: 1)
What you should do is skip all of the leading lines until you reach the one that starts with chr, for example:
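import csv
from itertools import dropwhile

with open('somefile') as fin:
    # skip every line before the first one that starts with 'chr' (case-insensitive)
    start = dropwhile(lambda L: not L.lower().lstrip().startswith('chr'), fin)
    for row in csv.DictReader(start, delimiter='\t'):
        print(row)  # do something with each row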
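
Applied to the function from the question, the same idea could look roughly like the sketch below. It is only an illustration under a few assumptions: the function takes any iterable of lines (such as an open file) rather than a file name, and the sample text is a shortened, made-up version of the input shown in the question. One thing worth noting about the original filter: `row.startswith('hr', 1, 2)` compares only the single character `row[1:2]` against the two-character prefix `'hr'`, so it can never be true. The generator therefore yields nothing and `fieldnames` comes back as `None`, which matches the `TypeError` in the traceback.

```python
import csv
import io
from itertools import dropwhile


def load_database_snps(lines):
    """Return a list with one dict per variant row.

    `lines` is assumed to be an iterable of text lines, e.g. an open file.
    """
    # Skip every line before the first one that starts with 'chr'/'Chr'.
    start = dropwhile(
        lambda line: not line.lower().lstrip().startswith('chr'), lines
    )
    reader = csv.DictReader(start, delimiter='\t')
    if reader.fieldnames is None:
        # No header line was found, so there is nothing to read.
        return []
    # Strip stray whitespace from the column names.
    reader.fieldnames = [field.strip() for field in reader.fieldnames]
    return list(reader)


# Shortened, hypothetical sample modelled on the file in the question.
sample = (
    "## SM=Sample,AD=Total Allele Depth, DP=Total Depth\n"
    "\tHeterozygous\tHomozygous\n"
    "Chr\tStart \tEnd\tref\n"
    "chr1\t10001102\t10001102\tT\n"
    "chr1\t10002921\t10002921\tT\n"
)
records = load_database_snps(io.StringIO(sample))
print(records[0]["Start"])
```

With a real file you would call it as `with open(path, newline='') as fin: records = load_database_snps(fin)`. Be aware that the header in the question repeats the column names |A|, |C|, |G| and |T|; with csv.DictReader, repeated names share one dictionary key and the later column's value wins, so you may want to rename those columns before relying on them.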