I have a CSV file containing the following matrix:
A1 A2 A3 A4
B1 0.2 0.3 0.7 .5
B2 0.5 0.55 0.4 0.6
B3 0.9 0.13 0.5 0.16
B4 0.2 0.4 0.6 0.8
I want my output in the following format, for values greater than 0.5:
A1 B2 B3
A2 B2
A3 B1 B3 B4
as shown above. Please help me.
This is what I have tried:
import csv
ifile = open('gene.matrix.csv', "rb")
reader = csv.reader(ifile)
rownum = 0
for row in reader:
    # Save header row.
    if rownum == 0:
        header = row
    else:
        colnum = 0
        for col in row:
            print '%-8s: %s' % (header[colnum], col)
            colnum += 1
    rownum += 1
ifile.close()
Answer 0 (score: 1)
Alternatively, if you have pandas, the index/column labels are easy to get:
In [2]: import pandas as pd
# df = pd.read_csv('gene.matrix.csv', delimiter='\s+')
In [3]: df = pd.read_clipboard() # from your sample
# "df >= 0.5" locates the matching values; all other cells become NaN
# .T transposes the frame so the index/column order matches what you expect
# stack() pivots the column labels into an inner index level (NaN entries are dropped)
In [4]: groups = df[df >= 0.5].T.stack()
In [5]: groups
Out[5]:
A1  B2    0.50
    B3    0.90
A2  B2    0.55
A3  B1    0.70
    B3    0.50
    B4    0.60
A4  B1    0.50
    B2    0.60
    B4    0.80
dtype: float64
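To make the chain `df[df >= 0.5].T.stack()` easier to follow, here is a rough plain-Python sketch of the same three steps. It does not use pandas; the nested dict `matrix` and the names `masked`, `transposed` and `stacked` are stand-ins invented for illustration, and the sketch assumes Python 3.7+ so that dicts keep insertion order.

```python
# Hypothetical stand-in for the DataFrame: row label -> {column label -> value}.
matrix = {
    "B1": {"A1": 0.2, "A2": 0.3, "A3": 0.7, "A4": 0.5},
    "B2": {"A1": 0.5, "A2": 0.55, "A3": 0.4, "A4": 0.6},
    "B3": {"A1": 0.9, "A2": 0.13, "A3": 0.5, "A4": 0.16},
    "B4": {"A1": 0.2, "A2": 0.4, "A3": 0.6, "A4": 0.8},
}
columns = ["A1", "A2", "A3", "A4"]
threshold = 0.5

# Step 1 (df[df >= 0.5]): blank out every cell below the threshold.
masked = {
    row: {col: (val if val >= threshold else None) for col, val in cells.items()}
    for row, cells in matrix.items()
}

# Step 2 (.T): swap the axes so the outer key is the column label.
transposed = {col: {row: masked[row][col] for row in masked} for col in columns}

# Step 3 (.stack()): flatten to ((column, row), value) pairs, skipping blanks.
stacked = [
    ((col, row), val)
    for col, rows in transposed.items()
    for row, val in rows.items()
    if val is not None
]

for (col, row), val in stacked:
    print(col, row, val)
```

The pairs come out in the same order as the (column, row) index shown in `Out[5]`.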
One way to get the desired output:
# store the required output in a dict that maps each column to a list of row labels
In [6]: result = {}
In [7]: for i in groups.index:
   ...:     if i[0] in result:
   ...:         result[i[0]].append(i[1])
   ...:     else:
   ...:         result[i[0]] = [i[1]]
   ...:
In [8]: result
Out[8]:
{'A1': ['B2', 'B3'],
 'A2': ['B2'],
 'A3': ['B1', 'B3', 'B4'],
 'A4': ['B1', 'B2', 'B4']}
# print the expected output; note that the dict is unordered here (you can use OrderedDict)
In [9]: for k, v in result.items():
   ...:     print k, " ".join(v)
   ...:
A1 B2 B3
A3 B1 B3 B4
A2 B2
A4 B1 B2 B4
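As a sketch of the OrderedDict suggestion, the grouping loop could look roughly like this in Python 3. The `pairs` list is typed in by hand from the `groups.index` shown above (an assumption made so the snippet runs without pandas), and `setdefault` is swapped in for the if/else branch.

```python
from collections import OrderedDict

# (column, row) pairs in the order they appear in groups.index above.
pairs = [
    ("A1", "B2"), ("A1", "B3"),
    ("A2", "B2"),
    ("A3", "B1"), ("A3", "B3"), ("A3", "B4"),
    ("A4", "B1"), ("A4", "B2"), ("A4", "B4"),
]

# OrderedDict remembers insertion order, so the columns come out as A1, A2, A3, A4.
result = OrderedDict()
for col, row in pairs:
    # setdefault creates the list the first time a column is seen.
    result.setdefault(col, []).append(row)

lines = ["%s %s" % (col, " ".join(rows)) for col, rows in result.items()]
print("\n".join(lines))
```

On Python 3.7 and later a plain dict also preserves insertion order, so OrderedDict mainly matters on older interpreters.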
Edit:
To write the result to a text file line by line, just do the following:
with open("output.csv", "w") as f:
    for k, v in result.items():
        f.write("%s %s\n" % (k, " ".join(v)))
I may have overcomplicated things for your example, but this is certainly one way to achieve the goal.
Answer 1 (score: 0)
I would go with something like this:
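# ifile is the open file object from the question
reader = csv.DictReader(ifile)
threshold = 0.5
for r in reader:
    # r[''] is the row label (assumes the first header cell is empty);
    # collect the column names directly so that duplicate values cannot collide
    hits = [name for name, v in r.items() if name and float(v) > threshold]
    print([r['']] + hits)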
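For comparison, here is a minimal standard-library sketch in Python 3 that groups row labels by column, which is the orientation of the output requested in the question. It assumes the file is whitespace-separated exactly as in the sample (a header with one field fewer than the data rows) and uses `>=` because the expected output in the question includes cells equal to 0.5. The function name `rows_at_or_above` and the `SAMPLE` string are made up for illustration.

```python
import io

# Sample data copied from the question; assumed to be whitespace-separated.
SAMPLE = """\
A1 A2 A3 A4
B1 0.2 0.3 0.7 .5
B2 0.5 0.55 0.4 0.6
B3 0.9 0.13 0.5 0.16
B4 0.2 0.4 0.6 0.8
"""


def rows_at_or_above(lines, threshold=0.5):
    """Map each column label to the row labels whose value is >= threshold."""
    lines = iter(lines)
    header = next(lines).split()
    hits = {name: [] for name in header}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        label, values = fields[0], fields[1:]
        for name, value in zip(header, values):
            if float(value) >= threshold:
                hits[name].append(label)
    return hits


# For a real file, pass an open file object instead of the StringIO.
hits = rows_at_or_above(io.StringIO(SAMPLE))
for name, labels in hits.items():
    print(name, " ".join(labels))
```

If the real file is comma-separated, the two `split()` calls would need to become `split(",")`, or the parsing could be handed to `csv.reader`.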