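I want to read several files in Python so that I can do some mapping between them.
I'm new to this, so I got the code from someone else. Now I want to edit it, but I can't fully follow the Python constructs it uses.
Here is the code: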
import csv

def getDataFromFile(infile):
    '''
    Opens a tab-delimited file, replaces every empty field (\t\t)
    with 'n/a', and returns to the user the header of the file
    and a list of genes.
    '''
    with open(infile, 'r') as f:
        reader = csv.reader(f, delimiter='\t')  # Parse the file as tab-separated values.
        header = f.readline()  # Store the header line in a variable.
        # Replace empty fields so we have one universal marker: 'n/a' means a non-existent value.
        # Most databases don't insert a special character for non-existent
        # values, they just write \t\t, so be careful with that!
        # We end up with a list of lists: every column holds either the value
        # provided by the file or our special marker for missing attributes, 'n/a'.
        geneList = [[x if x else 'n/a' for x in line] for line in reader]
    header = header.split()  # header should be a list of strings.
    return header, geneList
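How can I modify the list comprehension [[x if x else 'n/a' for x in line] for line in reader] so that it not only catches the empty fields produced by '\t\t' and replaces them with 'n/a', but also looks for other forms of "non-existent" values, such as 'NA' (used in R)?
I know this is a noob question, but I started using Python two weeks ago and I'm still learning.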
Answer 0 (score: 1)
Just add another test to the list comprehension:
geneList = [[x if (x and x not in ["NA", "whatever"]) else 'n/a' for x in line] for line in reader]
This can be made clearer by inverting the logic and putting the empty string into the list of markers:
geneList = [['n/a' if x in ["", "NA", "whatever"] else x for x in line] for line in reader]
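If the list of markers keeps growing, it may be easier to pull the test out into a small helper. The following is only a sketch under assumptions: the names MISSING_MARKERS and normalize_row are made up for illustration, the extra markers ("NaN", "NULL") are guesses about what your files might contain, and the in-memory sample stands in for a real tab-delimited file.

```python
import csv
import io

# Hypothetical set of markers that mean "missing"; adjust it to your data.
MISSING_MARKERS = {"", "NA", "NaN", "NULL"}

def normalize_row(row, missing=MISSING_MARKERS, placeholder="n/a"):
    """Return a copy of row with every missing marker replaced by placeholder."""
    # strip() so that fields such as " NA " are also treated as missing.
    return [placeholder if field.strip() in missing else field for field in row]

# Small in-memory sample standing in for a real tab-delimited file.
sample = "id\tname\tscore\ng1\t\t3.5\ng2\tNA\tNULL\n"
reader = csv.reader(io.StringIO(sample), delimiter="\t")
header = next(reader)  # the first row parsed by csv.reader is the header
geneList = [normalize_row(row) for row in reader]

print(header)
print(geneList)
```

A set is used here instead of a list because membership tests on a set do not slow down as more markers are added, and reading the header with next(reader) splits it on tabs in the same way as the data rows.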