How do I convert a dictionary whose values are lists of lists into a DataFrame in Python?

Posted: 2018-09-05 08:21:53

Tags: python list dictionary dataframe

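I have a dictionary like the one below. Each key is a start position, and each value is a list of entries, where every entry is itself a list of several values.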

dict1 = {28878779:
  [[0.63078648931418, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
   [0.913319324289701, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
   [0.4291909025802871, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
   [0.7571498628201009, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
   [0.20053355013001398, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
   [0.47222708511173905, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
   [0.5421979810611359, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
   [0.517080694962231, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
   [0.354578922865826, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
   [0.47933127476003706, 'BRCA', 'Primary Blood Derived Cancer', 'chr16']],
 116276795:
  [[0.0295335249313507, 'BRCA', 'Primary Blood Derived Cancer', 'chr12'],
   [0.0225709542480921, 'BRCA', 'Primary Blood Derived Cancer', 'chr12'],
   [0.0230930552162406, 'BRCA', 'Primary Blood Derived Cancer', 'chr12'],
   [0.0226794373583645, 'BRCA', 'Primary Blood Derived Cancer', 'chr12'],
   [0.0465238706721383, 'BRCA', 'Primary Blood Derived Cancer', 'chr12'],
   [0.0308525159082739, 'BRCA', 'Primary Blood Derived Cancer', 'chr12'],
   [0.0280263565564701, 'BRCA', 'Primary Blood Derived Cancer', 'chr12']]
...}

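I want to convert the dictionary into a DataFrame like the one below, where each inner entry becomes one row containing the dictionary key followed by that entry's values.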

Start       Beta_value       Cancer            Stage             Chromosome
28878779  0.63078648931418   BRCA  Primary Blood Derived Cancer    chr16
28878779  0.913319324289701  BRCA  Primary Blood Derived Cancer    chr16
.
.
116276795 0.029533524931350  BRCA  Primary Blood Derived Cancer    chr12
116276795 0.0225709542480921 BRCA  Primary Blood Derived Cancer    chr12
.
.

I tried this:

import pandas as pd

dlist = [[key, value[i][0], value[i][1], value[i][2], value[i][3]]
         for key, value in dict1.items()
         for i in value]

beta = pd.DataFrame(dlist, columns=['Start', 'Beta_value', 'Cancer', 'Stage', 'Chromosome'])

It raises a TypeError:

   TypeError: list indices must be integers or slices, not list

What should I do?

1 answer:

Answer 0 (score: 1)

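The loop variable i is already one of the inner lists, not an integer position, so value[i] tries to index a list with a list. Index i directly instead: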

dlist = [[key,i[0],i[1],i[2],i[3]] for key,value in dict1.items() for i in value]
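To see where the error comes from, here is a small reproduction using two entries copied from the question's dict1 (the names kinds, message and positions are only for illustration). Iterating over a list yields its elements, so using one of them as an index fails; enumerate() supplies real integer positions if those are ever needed.

```python
# Two entries copied from dict1[28878779] in the question
value = [
    [0.63078648931418, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
    [0.913319324289701, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
]

# Iterating a list yields its elements: here each i is an inner list
kinds = [type(i).__name__ for i in value]
print(kinds)  # ['list', 'list']

# Using such an element as an index reproduces the error from the question
try:
    value[value[0]]
except TypeError as exc:
    message = str(exc)
print(message)  # list indices must be integers or slices, not list

# If the position is actually needed, enumerate() yields (index, element) pairs
positions = [pos for pos, entry in enumerate(value)]
print(positions)  # [0, 1]
```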

Or prepend the key to each inner list:

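import pandas as pd

# [key] + i builds a new list, so the inner lists in dict1 are not modified
dlist = [[key] + i for key, value in dict1.items() for i in value]
# alternative: build tuples with iterable unpacking
# dlist = [(key, *i) for key, value in dict1.items() for i in value]

beta = pd.DataFrame(dlist, columns=['Start', 'Beta_value', 'Cancer', 'Stage', 'Chromosome'])
print(beta)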
        Start  Beta_value Cancer                         Stage Chromosome
0    28878779    0.630786   BRCA  Primary Blood Derived Cancer      chr16
1    28878779    0.913319   BRCA  Primary Blood Derived Cancer      chr16
2    28878779    0.429191   BRCA  Primary Blood Derived Cancer      chr16
3    28878779    0.757150   BRCA  Primary Blood Derived Cancer      chr16
4    28878779    0.200534   BRCA  Primary Blood Derived Cancer      chr16
5    28878779    0.472227   BRCA  Primary Blood Derived Cancer      chr16
6    28878779    0.542198   BRCA  Primary Blood Derived Cancer      chr16
7    28878779    0.517081   BRCA  Primary Blood Derived Cancer      chr16
8    28878779    0.354579   BRCA  Primary Blood Derived Cancer      chr16
9    28878779    0.479331   BRCA  Primary Blood Derived Cancer      chr16
10  116276795    0.029534   BRCA  Primary Blood Derived Cancer      chr12
11  116276795    0.022571   BRCA  Primary Blood Derived Cancer      chr12
12  116276795    0.023093   BRCA  Primary Blood Derived Cancer      chr12
13  116276795    0.022679   BRCA  Primary Blood Derived Cancer      chr12
14  116276795    0.046524   BRCA  Primary Blood Derived Cancer      chr12
15  116276795    0.030853   BRCA  Primary Blood Derived Cancer      chr12
16  116276795    0.028026   BRCA  Primary Blood Derived Cancer      chr12
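If you prefer to let pandas do the flattening, a possible alternative is Series.explode(), which turns each list element into its own row and repeats the key in the index. This is a sketch assuming pandas 0.25 or newer (where explode() was added) and uses a shortened stand-in for dict1; it is not what the answer above uses.

```python
import pandas as pd

# Shortened stand-in for dict1 from the question (two keys, a few entries each)
dict1 = {
    28878779: [
        [0.63078648931418, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
        [0.913319324289701, 'BRCA', 'Primary Blood Derived Cancer', 'chr16'],
    ],
    116276795: [
        [0.0295335249313507, 'BRCA', 'Primary Blood Derived Cancer', 'chr12'],
        [0.0225709542480921, 'BRCA', 'Primary Blood Derived Cancer', 'chr12'],
        [0.0230930552162406, 'BRCA', 'Primary Blood Derived Cancer', 'chr12'],
    ],
}

# One Series element per key, holding that key's list of entries;
# explode() (pandas >= 0.25) yields one row per inner list, repeating the key
s = pd.Series(dict1).explode()

# Expand each inner list into columns, keeping the key as the index
beta = pd.DataFrame(s.tolist(), index=s.index,
                    columns=['Beta_value', 'Cancer', 'Stage', 'Chromosome'])

# Turn the index (the start positions) into a regular 'Start' column
beta = beta.rename_axis('Start').reset_index()
print(beta)
```

The list-comprehension version in the answer does the same work in plain Python and does not depend on the pandas version, so it remains the simpler choice.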