I have this file:
GSENumber Species Platform Sample Age Tissue Sex Count
GSE11097 Rat GPL1355 GSM280267 4 Liver Male Count
GSE11097 Rat GPL1355 GSM280268 4 Liver Female Count
GSE11097 Rat GPL1355 GSM280269 6 Liver Male Count
GSE11097 Rat GPL1355 GSM280409 6 Liver Female Count
GSE11291 Mouse GPL1261 GSM284967 5 Heart Male Count
GSE11291 Mouse GPL1261 GSM284968 5 Heart Male Count
GSE11291 Mouse GPL1261 GSM284969 5 Heart Male Count
GSE11291 Mouse GPL1261 GSM284970 5 Heart Male Count
GSE11291 Mouse GPL1261 GSM284975 10 Heart Male Count
GSE11291 Mouse GPL1261 GSM284976 10 Heart Male Count
GSE11291 Mouse GPL1261 GSM284987 5 Muscle Male Count
GSE11291 Mouse GPL1261 GSM284988 5 Muscle Female Count
GSE11291 Mouse GPL1261 GSM284989 30 Muscle Male Count
GSE11291 Mouse GPL1261 GSM284990 30 Muscle Male Count
GSE11291 Mouse GPL1261 GSM284991 30 Muscle Male Count
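As you can see, there are two series here (GSE11097 and GSE11291), and I want a summary of each one; for each "GSE" number, the output should be a dictionary like this: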
Series Species Platform AgeRange Tissue Sex Count
GSE11097 Rat GPL1355 4-6 Liver Mixed Count
GSE11291 Mouse GPL1261 5-10 Heart Male Count
GSE11291 Mouse GPL1261 5-30 Muscle Mixed Count
So I know one way to do it would be, e.g.:
import sys

# Collect the distinct series IDs (first column), skipping the header line.
list_of_series = list(set(line.split()[0] for line in open(sys.argv[1]).readlines()[1:]))
list_of_dicts = []
for each_series in list_of_series:
    temp_dict = {"species": "", "platform": "", "age": [], "tissue": "", "sex": [], "count": ""}
    # Second read of the file: pick out the rows that belong to this series.
    for line in open(sys.argv[1]).readlines()[1:]:
        line = line.strip().split()
        if line[0] == each_series:
            temp_dict["species"] = line[1]
            temp_dict["platform"] = line[2]
            temp_dict["age"].append(line[4])
            temp_dict["tissue"] = line[5]
            temp_dict["sex"].append(line[6])
            temp_dict["count"] = line[7]
    list_of_dicts.append(temp_dict)
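I think this is messy in two ways:
I have to read through the whole file twice (in reality, the file is much larger than the example).
This approach keeps overwriting the same dictionary entries with the same values.
There is also the problem of sex: I want it to say "Mixed" if both males and females are present, and otherwise "Male" or "Female".
I can make this code work, but I would like some quick tips on making it cleaner / more Pythonic.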
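For reference, the two concerns above (reading the file twice and re-assigning the same fields) can both be avoided with a single pass that keys a dictionary on (series, tissue). What follows is a minimal sketch, not a definitive implementation. It assumes whitespace-separated columns in the order shown, and that grouping by tissue as well as series is intended, as the desired output suggests. The name `summarize_lines` and the dictionary keys are illustrative and not from the original code.

```python
def summarize_lines(lines):
    """Build one summary dict per (series, tissue) in a single pass."""
    rows = iter(lines)
    next(rows, None)  # skip the header line
    groups = {}  # plain dicts keep insertion order in Python 3.7+
    for line in rows:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        series, species, platform, _sample, age, tissue, sex, count = fields
        # Fields shared by the whole group are stored once, on first sight.
        group = groups.setdefault((series, tissue), {
            "species": species, "platform": platform,
            "ages": [], "sexes": set(), "count": count,
        })
        group["ages"].append(int(age))
        group["sexes"].add(sex)

    summaries = []
    for (series, tissue), group in groups.items():
        lo, hi = min(group["ages"]), max(group["ages"])
        sexes = group["sexes"]
        summaries.append({
            "series": series,
            "species": group["species"],
            "platform": group["platform"],
            "age_range": str(lo) if lo == hi else f"{lo}-{hi}",
            "tissue": tissue,
            # One distinct value is reported as-is; more than one is "Mixed".
            "sex": next(iter(sexes)) if len(sexes) == 1 else "Mixed",
            "count": group["count"],
        })
    return summaries
```

Because the function only iterates over lines, an open file handle can be passed directly, for example `summarize_lines(open(sys.argv[1]))`.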
Answer 0 (score: 0)
I agree with Max Paymar that this should be done in a query language. If you really want to do it in Python, the pandas module will help a lot.
Done that way, you get the result you asked for, and it is cleaner than parsing the file in pure Python.
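A minimal sketch of how pandas could produce such a summary, offered as an illustration rather than as this answer's own code. It assumes the file is whitespace-delimited and that rows are grouped by series and tissue, as the desired output in the question suggests. The names `summarize`, `age_range`, `sex_label` and `SAMPLE` are hypothetical, and the sample inlines a few rows from the question so the block runs on its own.

```python
import io

import pandas as pd

# A few rows from the question's file, inlined so the sketch is self-contained.
SAMPLE = """\
GSENumber Species Platform Sample Age Tissue Sex Count
GSE11097 Rat GPL1355 GSM280267 4 Liver Male Count
GSE11097 Rat GPL1355 GSM280409 6 Liver Female Count
GSE11291 Mouse GPL1261 GSM284967 5 Heart Male Count
GSE11291 Mouse GPL1261 GSM284975 10 Heart Male Count
GSE11291 Mouse GPL1261 GSM284988 5 Muscle Female Count
GSE11291 Mouse GPL1261 GSM284989 30 Muscle Male Count
"""


def age_range(ages):
    """Format a group's ages as 'min-max', or a single value if they agree."""
    lo, hi = ages.min(), ages.max()
    return str(lo) if lo == hi else f"{lo}-{hi}"


def sex_label(sexes):
    """Return the single sex in the group, or 'Mixed' if there are several."""
    unique = sexes.unique()
    return unique[0] if len(unique) == 1 else "Mixed"


def summarize(df):
    """Collapse sample rows into one summary row per (series, tissue)."""
    summary = (
        df.groupby(["GSENumber", "Species", "Platform", "Tissue"], sort=False)
        .agg(
            AgeRange=("Age", age_range),
            Sex=("Sex", sex_label),
            Count=("Count", "first"),
        )
        .reset_index()
        .rename(columns={"GSENumber": "Series"})
    )
    # Reorder the columns to match the layout requested in the question.
    return summary[
        ["Series", "Species", "Platform", "AgeRange", "Tissue", "Sex", "Count"]
    ]


df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")
print(summarize(df).to_string(index=False))
```

For a real file, `pd.read_csv(path, sep=r"\s+")` would replace the inline sample, so the file is read only once.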
Answer 1 (score: 0)
public Object method(boolean condition1, boolean condition2)
{
    Object[] objects = { a, b, c, d }; // Assuming objects a, b, c and d exist...
    /*
     * Truth Table
     *
     * condition1   condition2   Object
     * false        false        d
     * false        true         c
     * true         false        b
     * true         true         a
     */
    // condition1 is the high bit and condition2 the low bit, so the index matches the table.
    int selector = (condition1 ? 0 : 2) + (condition2 ? 0 : 1);
    return objects[selector];
}