I have a Python dictionary whose values are one or more numbers stored as strings, for example:
d = {'a': ['1.20', '1', '1.10'], 'b': ['5.800', '1', '2.000'], 'c': ['9.5000', '0.9000'], 'h': ['1.90000', '6.100000'], 'l': ['1.0000', '8.00000'], 'o': '5.0000', 'p': ['3.00', '1.1000'], 'v': ['1.8', '0.0000']}
How can I convert it to a pandas DataFrame without using a pandas Series?
Expected output:
col1 col2 col3
a 1.2 1 1.1
b 5.8 1 2
c 9.5 0.9 NaN
h 1.9 6.1 NaN
l 1 8 NaN
o 5 NaN NaN
p 3 1.1 NaN
v 1.8 0 NaN
Answer 0 (score: 4)
Use helper Series objects:
df = pd.concat({k:pd.Series(v) for k, v in d.items()}).unstack().astype(float).sort_index()
df.columns = 'col1 col2 col3'.split()
Another solution is to convert each non-list value to a one-element list and then use DataFrame.from_dict:
d = {k:v if isinstance(v, list) else [v] for k, v in d.items()}
df = pd.DataFrame.from_dict(d, orient='index').astype(float).sort_index()
df.columns = 'col1 col2 col3'.split()
print (df)
col1 col2 col3
a 1.2 1.0 1.1
b 5.8 1.0 2.0
c 9.5 0.9 NaN
h 1.9 6.1 NaN
l 1.0 8.0 NaN
o 5.0 NaN NaN
p 3.0 1.1 NaN
v 1.8 0.0 NaN
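If the longest list is not always three items long, the column labels can be derived from the width of the frame instead of being hard-coded. The following is only a sketch under that assumption: it uses a shortened version of the question's `d`, and the helper name `to_frame` is made up for illustration and is not part of pandas.

```python
import pandas as pd

# shortened sample of the dictionary from the question
d = {'a': ['1.20', '1', '1.10'], 'c': ['9.5000', '0.9000'], 'o': '5.0000'}

def to_frame(data):
    # hypothetical helper, not a pandas function
    # wrap bare strings so that every value is a list
    rows = {k: v if isinstance(v, list) else [v] for k, v in data.items()}
    df = pd.DataFrame.from_dict(rows, orient='index').astype(float).sort_index()
    # label the columns col1..colN according to the actual number of columns
    df.columns = ['col{}'.format(i + 1) for i in range(df.shape[1])]
    return df

df = to_frame(d)
print(df)
```

With this variant a dictionary whose longest list has four entries would get a `col4` column without any change to the code.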
Answer 1 (score: 1)
Here is one way:
from collections import OrderedDict
import pandas as pd, numpy as np
d = {'a': ['1.20', '1', '1.10'], 'b': ['5.800', '1', '2.000'],
'c': ['9.5000', '0.9000'], 'h': ['1.90000', '6.100000'],
'l': ['1.0000', '8.00000'], 'o': '5.0000', 'p': ['3.00', '1.1000'],
'v': ['1.8', '0.0000']}
# convert to numeric
for k, v in d.items():
    lst = list(map(float, v)) if isinstance(v, list) else [float(v)]
    lst += [np.nan] * (3 - len(lst))
    d[k] = lst
# sort dictionary by key & create cols
d = OrderedDict(sorted(d.items()))
cols = list(zip(*d.values()))
# build dataframe
df = pd.DataFrame.from_dict(d).T
# 0 1 2
# a 1.2 1.0 1.1
# b 5.8 1.0 2.0
# c 9.5 0.9 NaN
# h 1.9 6.1 NaN
# l 1.0 8.0 NaN
# o 5.0 NaN NaN
# p 3.0 1.1 NaN
# v 1.8 0.0 NaN
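The padding above assumes that no list is longer than three items. As a hedged variation, the width can be taken from the longest row instead. This sketch uses a shortened sample dictionary and relies on plain dicts keeping insertion order (Python 3.7+), so no OrderedDict is needed; the names `rows`, `width` and `padded` are illustrative only.

```python
import numpy as np
import pandas as pd

# shortened, deliberately unsorted sample of the question's dictionary
d = {'b': ['5.800', '1', '2.000'], 'o': '5.0000', 'a': ['1.8', '0.0000']}

# convert every value to a list of floats, wrapping bare strings
rows = {k: [float(x) for x in v] if isinstance(v, list) else [float(v)]
        for k, v in d.items()}

# pad to the length of the longest row instead of a fixed 3
width = max(len(r) for r in rows.values())
padded = {k: r + [np.nan] * (width - len(r)) for k, r in sorted(rows.items())}

# each key becomes one row of the frame
df = pd.DataFrame.from_dict(padded, orient='index')
print(df)
```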
Answer 2 (score: 0)
Try:
df = pd.Series(d).apply(pd.Series).rename(columns=lambda col: 'col{}'.format(col+1))
The output will be:
col1 col2 col3
a 1.20 1 1.10
b 5.800 1 2.000
c 9.5000 0.9000 NaN
h 1.90000 6.100000 NaN
l 1.0000 8.00000 NaN
o 5.0000 NaN NaN
p 3.00 1.1000 NaN
v 1.8 0.0000 NaN
Without pd.Series:
rows = [v if isinstance(v, list) else [v] for v in d.values()]
df = pd.DataFrame(rows, index=list(d), columns=['col{}'.format(i + 1) for i in range(3)])
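Both variants in this answer leave the cells as strings, which is why the output keeps the trailing zeros. One possible way to get numeric columns, sketched here on a shortened sample dictionary, is to apply `pd.to_numeric` to each column after building the frame.

```python
import pandas as pd

# shortened sample of the dictionary from the question
d = {'a': ['1.20', '1', '1.10'], 'o': '5.0000', 'v': ['1.8', '0.0000']}

# wrap bare strings so that every row is a list
rows = [v if isinstance(v, list) else [v] for v in d.values()]
df = pd.DataFrame(rows, index=list(d), columns=['col1', 'col2', 'col3'])

# the cells are still strings at this point; parse each column into numbers
df = df.apply(pd.to_numeric)
print(df.dtypes)
```

Missing cells become NaN, so every column ends up with a floating-point dtype.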
Answer 3 (score: 0)
You may also want to first pad all the values of your dict to lists of length 3:
padded_d = {k : list(v) + [None] * (3 - len(v)) for k,v in d.items()}
Then use pd.DataFrame.from_dict():
>>> pd.DataFrame.from_dict(padded_d, orient="index")
0 1 2
a 1.20 1 1.10
b 5.800 1 2.000
c 9.5000 0.9000 None
h 1.90000 6.100000 None
l 1.0000 8.00000 None
p 3.00 1.1000 None
v 1.8 0.0000 None
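To handle the malformed value for the key 'o': '5.0000' in your input (we would expect 'o': ['5.0000']; not sure whether that is a typo), you should check the type, though there may be a cleaner way:
def type_check(s):
    # wrap a bare string in a list so it is not treated as a sequence of characters
    if isinstance(s, str):
        return [s]
    else:
        return s
# measure the wrapped value, not the raw string, when computing the padding
padded_d = {k : type_check(v) + [None] * (3 - len(type_check(v))) for k,v in d.items()}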
pd.DataFrame.from_dict(padded_d, orient="index")