Convert a dictionary to a DataFrame

Time: 2020-06-04 22:40:01

Tags: python pandas dataframe dictionary

I have a dictionary with the following structure:

{'OPPHJFPK_00001': ['K00879', 'PF00370.22'], 
'OPPHJFPK_00002': ['', 'PF01070.19', 'COG1304'], 
'OPPHJFPK_00003': ['', 'COG3279', 'GH65'], 
'OPPHJFPK_00004': ['', 'PF13460.7', 'COG0451'], 
'OPPHJFPK_00005': ['']}

My goal is to get a DataFrame in which each feature (which always starts with K, P, C or G) sits in the correct column:

| OPPHJFPK_00001 | K00879 | PF00370.22 |          |      |
| OPPHJFPK_00002 |        | PF01070.19 | COG1304  |      |
| OPPHJFPK_00003 |        |            | COG3279  | GH65 |
| OPPHJFPK_00004 |        | PF13460.7  |          |      |
| OPPHJFPK_00005 |        |            |          | GTA  |

I have already tried:

df = pd.DataFrame.from_dict(d, orient='index')

But what I get is not aligned by feature type:

| OPPHJFPK_00001 | K00879 | PF00370.22 |          |
| OPPHJFPK_00002 |        | PF01070.19 | COG1304  |
| OPPHJFPK_00003 |        | COG3279    | GH65     |
| OPPHJFPK_00004 |        | PF13460.7  |          |
| OPPHJFPK_00005 |        | GTA        |          |

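Is there a pandas function that can solve this?

Note that the first column is always correct, because when that feature is missing the dictionary holds an empty string in its place. For the remaining features, if one is absent, the dictionary simply contains nothing for it.

Any ideas on how to solve this? I would appreciate it.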

4 Answers:

Answer 0 (score: 1)

Assuming d is your dict:

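import pandas as pd

# One row per (key, feature) pair, then drop the empty placeholders
s = pd.Series(d).explode()
s = s[s != '']
# Pivot on the first letter of each feature (K, P, C or G)
df = pd.crosstab(index=s.index, columns=s.str[0], values=s, aggfunc='first')
df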
col_0                 C     G       K           P
row_0                                            
OPPHJFPK_00001      NaN   NaN  K00879  PF00370.22
OPPHJFPK_00002  COG1304   NaN     NaN  PF01070.19
OPPHJFPK_00003  COG3279  GH65     NaN         NaN
OPPHJFPK_00004  COG0451   NaN     NaN   PF13460.7
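
Note that the result above has no row for OPPHJFPK_00005 (its only entry was the empty string), and the columns come out in alphabetical order. Here is a minimal sketch that restores both with `reindex`, assuming the K, P, C, G order from the question is the one wanted. The `reindex` step is an addition here, not part of the answer as posted:

```python
import pandas as pd

d = {'OPPHJFPK_00001': ['K00879', 'PF00370.22'],
     'OPPHJFPK_00002': ['', 'PF01070.19', 'COG1304'],
     'OPPHJFPK_00003': ['', 'COG3279', 'GH65'],
     'OPPHJFPK_00004': ['', 'PF13460.7', 'COG0451'],
     'OPPHJFPK_00005': ['']}

# Same steps as the answer: explode, drop empty strings, pivot on first letter
s = pd.Series(d).explode()
s = s[s != '']
df = pd.crosstab(index=s.index, columns=s.str[0], values=s, aggfunc='first')

# Assumption: keep every original key as a row and order columns as K, P, C, G
df = df.reindex(index=list(d), columns=list('KPCG'))
print(df)
```

Keys that had no features end up as rows filled with NaN; `df.fillna('')` would turn those into empty strings if that is preferred.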

Answer 1 (score: 0)

Try this:

data = {'OPPHJFPK_00001': ['K00879', 'PF00370.22',''], 
'OPPHJFPK_00002': ['', 'PF01070.19', 'COG1304'], 
'OPPHJFPK_00003': ['', 'COG3279', 'GH65'], 
'OPPHJFPK_00004': ['', 'PF13460.7', 'COG0451'], 
'OPPHJFPK_00005': ['','','']}
pd.DataFrame.from_dict(data)

You can then use DataFrame.transpose() to transpose the result, so that the dictionary keys become rows.
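
Here is a short sketch of the two steps together, using the padded `data` from this answer. The names `wide` and `df` are only illustrative. It also shows that padding plus transposing changes the orientation only: values stay in their list positions, so this alone does not sort features by their first letter.

```python
import pandas as pd

data = {'OPPHJFPK_00001': ['K00879', 'PF00370.22', ''],
        'OPPHJFPK_00002': ['', 'PF01070.19', 'COG1304'],
        'OPPHJFPK_00003': ['', 'COG3279', 'GH65'],
        'OPPHJFPK_00004': ['', 'PF13460.7', 'COG0451'],
        'OPPHJFPK_00005': ['', '', '']}

# With the default orient, each dictionary key becomes a column
wide = pd.DataFrame.from_dict(data)

# Transposing turns the keys into rows; columns are list positions 0, 1, 2
df = wide.transpose()
print(df)

# COG3279 is still at position 1, next to the PF entries of other rows
print(df.loc['OPPHJFPK_00003', 1])
```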

Answer 2 (score: 0)

Another solution is to reshape the dictionary:

a = {'OPPHJFPK_00001': ['K00879', 'PF00370.22'], 
'OPPHJFPK_00002': ['', 'PF01070.19', 'COG1304'], 
'OPPHJFPK_00003': ['', 'COG3279', 'GH65'], 
'OPPHJFPK_00004': ['', 'PF13460.7', 'COG0451'], 
'OPPHJFPK_00005': ['']}

# Reshape it so that each value is a dict of {letter: value}
a = {k: {x[0]: x for x in v if x} for k, v in a.items()}
# And then take care of those empty values
a = {k: v if v else {'K': float('nan')} for k, v in a.items()}
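
The two comprehensions above stop short of building the DataFrame. Here is a minimal sketch of the remaining step, assuming the reshaped dictionary is then passed to `pd.DataFrame.from_dict` with `orient='index'`. The final `reindex` is an addition here, included on the assumption that the original key order and the K, P, C, G column order are wanted:

```python
import pandas as pd

a = {'OPPHJFPK_00001': ['K00879', 'PF00370.22'],
     'OPPHJFPK_00002': ['', 'PF01070.19', 'COG1304'],
     'OPPHJFPK_00003': ['', 'COG3279', 'GH65'],
     'OPPHJFPK_00004': ['', 'PF13460.7', 'COG0451'],
     'OPPHJFPK_00005': ['']}
keys = list(a)  # remember the original row order

# Each value becomes {first letter: feature}, skipping empty strings
a = {k: {x[0]: x for x in v if x} for k, v in a.items()}
# Keys left with an empty dict get a NaN placeholder so their row survives
a = {k: v if v else {'K': float('nan')} for k, v in a.items()}

# Outer keys become rows, inner keys (the letters) become columns
df = pd.DataFrame.from_dict(a, orient='index')

# Assumption: fix the row and column order explicitly
df = df.reindex(index=keys, columns=list('KPCG'))
print(df)
```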

Answer 3 (score: 0)

To get the expected output, the dictionary must have the following format:

d = {'OPPHJFPK_00001': ['K00879', 'PF00370.22', '', ''], 
'OPPHJFPK_00002': ['', 'PF01070.19', 'COG1304', ''], 
'OPPHJFPK_00003': ['', '', 'COG3279', 'GH65'], 
'OPPHJFPK_00004': ['', 'PF13460.7', '', ''], 
'OPPHJFPK_00005': ['','','', 'GTA']}

df = pd.DataFrame.from_dict(d, orient='index')

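You are getting the unaligned output because your lists have different lengths: from_dict with orient='index' fills the columns by list position, not by feature type.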
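
The padded layout above does not have to be typed by hand. Here is a rough sketch that derives it from the original dictionary by placing each feature in the slot matching its first letter. The helper name `to_slots` and the constant `ORDER` are hypothetical, and the K, P, C, G order is assumed from the question:

```python
import pandas as pd

d = {'OPPHJFPK_00001': ['K00879', 'PF00370.22'],
     'OPPHJFPK_00002': ['', 'PF01070.19', 'COG1304'],
     'OPPHJFPK_00003': ['', 'COG3279', 'GH65'],
     'OPPHJFPK_00004': ['', 'PF13460.7', 'COG0451'],
     'OPPHJFPK_00005': ['']}

ORDER = 'KPCG'  # assumed column order, one slot per first letter


def to_slots(features):
    """Return a fixed-length list with each feature in its letter's slot."""
    slots = [''] * len(ORDER)
    for feature in features:
        if feature:  # skip the empty-string placeholders
            # str.index raises ValueError for a first letter not in ORDER
            slots[ORDER.index(feature[0])] = feature
    return slots


padded = {key: to_slots(values) for key, values in d.items()}

# Every list now has the same length, so positions line up with columns
df = pd.DataFrame.from_dict(padded, orient='index', columns=list(ORDER))
print(df)
```

If a key has two features with the same first letter, the later one overwrites the earlier one in this sketch.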