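I have a dictionary like the one below, where each key is a "start position" and each value is a list of entries, with every entry holding several other values.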
dict1 = {28878779:
[[0.63078648931418,'BRCA','Primary Blood Derived Cancer','chr16'],
[0.913319324289701, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
[0.4291909025802871, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
[0.7571498628201009, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
[0.20053355013001398, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
[0.47222708511173905, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
[0.5421979810611359, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
[0.517080694962231, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
[0.354578922865826, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
[0.47933127476003706, 'BRCA', 'Primary Blood Derived Cancer', 'chr16']],
116276795:
[[0.0295335249313507,'BRCA','Primary Blood Derived Cancer','chr12'],
[0.0225709542480921, 'BRCA', 'Primary Blood Derived Cancer', 'chr12'],
[0.0230930552162406, 'BRCA', 'Primary Blood Derived Cancer', 'chr12'],
[0.0226794373583645, 'BRCA', 'Primary Blood Derived Cancer', 'chr12'],
[0.0465238706721383, 'BRCA', 'Primary Blood Derived Cancer', 'chr12'],
[0.0308525159082739, 'BRCA', 'Primary Blood Derived Cancer', 'chr12'],
[0.0280263565564701, 'BRCA', 'Primary Blood Derived Cancer', 'chr12']]
...}
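I want to convert this dictionary into a DataFrame like the one below, where each entry in a key's list becomes one row and the key is repeated in the first column.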
Start Beta_value Cancer Stage Chromosome
28878779 0.63078648931418 BRCA Primary Blood Derived Cancer chr16
28878779 0.913319324289701 BRCA Primary Blood Derived Cancer chr16
.
.
116276795 0.029533524931350 BRCA Primary Blood Derived Cancer chr12
116276795 0.0225709542480921 BRCA Primary Blood Derived Cancer chr12
.
.
This is what I tried:
dlist = [[key,value[i][0],value[i][1],value[i][2],value[i][3]]
for key,value in dict1.items()
for i in value]
beta = pd.DataFrame(dlist, columns =
['Start','Beta_value','Cancer','Stage','Chromosome'])
It raises a TypeError:
TypeError: list indices must be integers or slices, not list
What should I do?
Answer 0 (score: 1)
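The variable i is already one of the inner lists, not an integer position, so index into i directly instead of writing value[i]: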
dlist = [[key,i[0],i[1],i[2],i[3]] for key,value in dict1.items() for i in value]
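To see why the original comprehension fails, here is a minimal sketch that reproduces the error. The dictionary is a shortened, rounded stand-in for the question's data, used only for illustration.

```python
# Shortened stand-in for the question's dictionary (illustrative values only)
dict1 = {28878779: [[0.63, 'BRCA', 'Primary Blood Derived Cancer', 'chr16']]}

message = None
for key, value in dict1.items():
    for i in value:
        # i is the inner row itself (a list), not an integer position
        try:
            value[i][0]  # same indexing pattern as in the question
        except TypeError as exc:
            message = str(exc)

print(message)
```

Iterating with `for i in value` yields the rows themselves, so `i[0]` is the beta value and `value[i]` tries to use a list as an index.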
Alternatively, prepend the key to each inner list:
dlist = [[key] + i for key,value in dict1.items() for i in value]
#alternative
#dlist = [(key, *i) for key,value in dict1.items() for i in value]
beta = pd.DataFrame(dlist, columns=['Start','Beta_value','Cancer','Stage','Chromosome'])
print (beta)
Start Beta_value Cancer Stage Chromosome
0 28878779 0.630786 BRCA Primary Blood Derived Cancer chr16
1 28878779 0.913319 BRCA Primary Blood Derived Cancer chr16
2 28878779 0.429191 BRCA Primary Blood Derived Cancer chr16
3 28878779 0.757150 BRCA Primary Blood Derived Cancer chr16
4 28878779 0.200534 BRCA Primary Blood Derived Cancer chr16
5 28878779 0.472227 BRCA Primary Blood Derived Cancer chr16
6 28878779 0.542198 BRCA Primary Blood Derived Cancer chr16
7 28878779 0.517081 BRCA Primary Blood Derived Cancer chr16
8 28878779 0.354579 BRCA Primary Blood Derived Cancer chr16
9 28878779 0.479331 BRCA Primary Blood Derived Cancer chr16
10 116276795 0.029534 BRCA Primary Blood Derived Cancer chr12
11 116276795 0.022571 BRCA Primary Blood Derived Cancer chr12
12 116276795 0.023093 BRCA Primary Blood Derived Cancer chr12
13 116276795 0.022679 BRCA Primary Blood Derived Cancer chr12
14 116276795 0.046524 BRCA Primary Blood Derived Cancer chr12
15 116276795 0.030853 BRCA Primary Blood Derived Cancer chr12
16 116276795 0.028026 BRCA Primary Blood Derived Cancer chr12
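For reference, the whole approach can be run end to end as the sketch below. It assumes pandas is installed and uses a shortened copy of the question's dictionary (two rows per key); the names `columns`, `rows`, and `rows_for_key` are only illustrative.

```python
import pandas as pd

# Shortened copy of the question's dictionary (two rows per key, for illustration)
dict1 = {
    28878779: [
        [0.63078648931418, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
        [0.913319324289701, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
    ],
    116276795: [
        [0.0295335249313507, 'BRCA', 'Primary Blood Derived Cancer', 'chr12'],
        [0.0225709542480921, 'BRCA', 'Primary Blood Derived Cancer', 'chr12'],
    ],
}

columns = ['Start', 'Beta_value', 'Cancer', 'Stage', 'Chromosome']

# One output row per inner list, with the dictionary key prepended
rows = [[key, *row] for key, rows_for_key in dict1.items() for row in rows_for_key]

beta = pd.DataFrame(rows, columns=columns)
print(beta)
```

Because dictionaries keep insertion order in Python 3.7 and later, the rows appear grouped by key in the order the keys were defined.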