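This question is about the "Compute row means" code block below. The code was originally written with six samples in mind, and I am trying to extend it to n samples.
Each csv file is a separate patient file, in this form:
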
| gene | expression |
| --- | --- |
| A1BG | 1.444 |
| A1CF | 4.303 |
| A2BP1 | 11.117 |
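
The original file list has been changed to accept command-line arguments so that it can scale, but I don't know how to proceed from here. I need to extract each sample name and use it in that code block, while also correctly incrementing the slice notation in each individual list comprehension. Any ideas?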

    """
    This is an implementation of quantile normalization for microarray data analysis.
    """
    import csv
    import matplotlib.pyplot as plt
    import sys

    # Parse csv files for samples, creating lists of gene names and expression values.
    #file_list = ['genes1.csv', 'genes2.csv', 'genes3.csv', 'genes4.csv', 'genes5.csv',
    #             'genes6.csv']
    # Take the sample files from the command line; stop if none were given.
    file_list = sys.argv[1:]
    if not file_list:
        print("Not enough arguments given.")
        sys.exit(1)
    print(file_list)

    set_dict = {}
    for path in file_list:
        with open(path) as stream:
            data = list(csv.reader(stream, delimiter = '\t'))
        # Sort each sample's (gene, value) pairs by expression value.
        data = sorted([(i, float(j)) for i, j in data], key = lambda v: v[1])
        sample_genes = [i for i, j in data]
        sample_values = [j for i, j in data]
        set_dict[path] = (sample_genes, sample_values)

    # Create sorted list of genes and values for all datasets.
    set_list = [x for x in set_dict.items()]
    set_list.sort(key = lambda item: file_list.index(item[0]))

This is the code block that needs to scale to handle an arbitrary number of samples supplied as CLI arguments:

    # Compute row means.
    mean_values = [((a + b + c + d + e + f)/len(file_list))
                   for i, (a, b, c, d, e, f) in
                   enumerate(zip([v for i, (j, k) in set_list[:1] for v in k],
                                 [v for i, (j, k) in set_list[1:2] for v in k],
                                 [v for i, (j, k) in set_list[2:3] for v in k],
                                 [v for i, (j, k) in set_list[3:4] for v in k],
                                 [v for i, (j, k) in set_list[4:5] for v in k],
                                 [v for i, (j, k) in set_list[5:6] for v in k]))]

The corrected solution, given by @Bo102010, follows:

    L = len(file_list)
    all_sets = [set_list[i - 1: i] for i in range(1, L + 1)]
    all_values = [[v for i, (j, k) in A for v in k] for A in all_sets]
    mean_values = [sum(p) / L for p in zip(*all_values)]

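Answer 0 (score: 1)
If I have understood your code block correctly, you should be able to use "star notation" to unpack the iterables. Use it in the call to zip.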

    L = len(file_list)
    all_sets = [set_list[i - 1: i] for i in range(1, L + 1)]
    all_values = [[v for i, (j, k) in A for v in k] for A in all_sets]
    mean_values = [sum(p) / L for p in zip(*all_values)]

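
To see what the star notation is doing, here is a minimal self-contained sketch on made-up data. The `set_list` contents and the helper name `row_means` are hypothetical, chosen only for illustration. It assumes `set_list` has the shape built in the question, a list of `(path, (genes, values))` pairs. Because each slice `set_list[i - 1: i]` holds exactly one element, pulling the value lists out directly should give the same `all_values` as above.

```python
def row_means(columns):
    """Return the mean of each row across equally long lists of values."""
    n = len(columns)
    # zip(*columns) yields one tuple per row, holding that row's value
    # from every column, however many columns there are.
    return [sum(row) / float(n) for row in zip(*columns)]


# Hypothetical data in the same shape as set_list from the question:
# (path, (gene names, expression values sorted by value)).
set_list = [
    ("genes1.csv", (["A1BG", "A1CF", "A2BP1"], [1.0, 4.0, 11.0])),
    ("genes2.csv", (["A1BG", "A1CF", "A2BP1"], [2.0, 5.0, 12.0])),
    ("genes3.csv", (["A1BG", "A1CF", "A2BP1"], [3.0, 6.0, 13.0])),
]

# One list of values per sample, in file order.
all_values = [values for _path, (_genes, values) in set_list]

mean_values = row_means(all_values)
print(mean_values)
```

Note that `zip` stops at the shortest input, so this only gives meaningful row means when every sample file has the same number of genes.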