I have a dict that holds values computed at different time lags, which means they start on different dates. For example, my data might look like this:
Date      col1  col2  col3  col4  col5
01-01-15     5    12     1   -15    10
01-02-15     7     0     9    11     7
01-03-15           6     1     2    18
01-04-15           9     8    10
01-05-15          -4           7
01-06-15         -11          -1
01-07-15           6
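Each header is a key, and each column of values is the value for that key (I'm using defaultdict(list) for this). When I try to run pd.DataFrame.from_dict(d), I understandably get an error saying that all arrays must be the same length. Is there a simple way to pad the shorter lists (with NaN) so that the output ends up as the following DataFrame?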
Date      col1  col2  col3  col4  col5
01-01-15     5    12     1   -15    10
01-02-15     7     0     9    11     7
01-03-15   NaN     6     1     2    18
01-04-15   NaN     9     8    10   NaN
01-05-15   NaN    -4   NaN     7   NaN
01-06-15   NaN   -11   NaN    -1   NaN
01-07-15   NaN     6   NaN   NaN   NaN
Or do I have to do this manually for each list?
Here is the code to recreate the dict:
import pandas as pd
from collections import defaultdict

d = defaultdict(list)
d["Date"].extend([
    "01-01-15",
    "01-02-15",
    "01-03-15",
    "01-04-15",
    "01-05-15",
    "01-06-15",
    "01-07-15",
])
d["col1"].extend([5, 7])
d["col2"].extend([12, 0, 6, 9, -4, -11, 6])
d["col3"].extend([1, 9, 1, 8])
d["col4"].extend([-15, 11, 2, 10, 7, -1])
d["col5"].extend([10, 7, 18])
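For comparison, the manual approach is short: find the longest list and pad every other list with NaN before building the frame. A minimal sketch, using a trimmed-down, hypothetical dict `data` (three of the question's columns, without `Date`) rather than the full `d`:

```python
import numpy as np
import pandas as pd

# Hypothetical ragged dict shaped like the question's d (Date omitted)
data = {
    "col1": [5, 7],
    "col2": [12, 0, 6, 9, -4, -11, 6],
    "col3": [1, 9, 1, 8],
}

# Find the longest list, then pad every shorter list with NaN up to that length
max_len = max(len(v) for v in data.values())
padded = {k: v + [np.nan] * (max_len - len(v)) for k, v in data.items()}

# All lists now have the same length, so the constructor no longer complains
df = pd.DataFrame(padded)
print(df)
```

The answers below avoid this explicit padding step by letting pandas (or `itertools`) do it.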
Answer 0 (score: 12)
Another option is to use from_dict with orient='index' and then transpose:
my_dict = {'a' : [1, 2, 3, 4, 5], 'b': [1, 2, 3]}
df = pd.DataFrame.from_dict(my_dict, orient='index').T
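Note that you may run into dtype problems if your columns have different types, e.g. floats in one column and strings in another: after the transpose, the columns can all end up with object dtype.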
Resulting output:
     a    b
0  1.0  1.0
1  2.0  2.0
2  3.0  3.0
3  4.0  NaN
4  5.0  NaN
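To illustrate the dtype caveat, here is a small sketch with a hypothetical dict mixing floats and strings. After the transpose the columns come back as object dtype; `DataFrame.infer_objects()` is one way to re-infer a better dtype per column afterwards (an assumption about what you want, not part of the original answer):

```python
import pandas as pd

# Hypothetical mixed-type dict: floats in 'a', strings in 'b'
my_dict = {"a": [1.5, 2.5, 3.5], "b": ["x", "y"]}

df = pd.DataFrame.from_dict(my_dict, orient="index").T
print(df.dtypes)  # both columns have object dtype after the transpose

# infer_objects() tries to find a more specific dtype for each object column
fixed = df.infer_objects()
print(fixed.dtypes)  # 'a' is float64 again
```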
Answer 1 (score: 5)
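import pandas as pd
# dictionary of lists with different lengths
my_dict = {'a': [1, 2, 3, 4, 5], 'b': [1, 2, 3]}
# wrapping each list in a Series lets pandas align them on the index, padding shorter ones with NaN
pd.DataFrame({col_name: pd.Series(values) for col_name, values in my_dict.items()})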
Output:
   a    b
0  1  1.0
1  2  2.0
2  3  3.0
3  4  NaN
4  5  NaN
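Applied to the question's own `d`, the same Series trick produces the requested frame once `Date` is moved into the index. A sketch, assuming Python 3.7+ so that the dict keeps its insertion order (which fixes the column order):

```python
from collections import defaultdict

import pandas as pd

# Rebuild the question's ragged dict
d = defaultdict(list)
d["Date"].extend(["01-01-15", "01-02-15", "01-03-15", "01-04-15",
                  "01-05-15", "01-06-15", "01-07-15"])
d["col1"].extend([5, 7])
d["col2"].extend([12, 0, 6, 9, -4, -11, 6])
d["col3"].extend([1, 9, 1, 8])
d["col4"].extend([-15, 11, 2, 10, 7, -1])
d["col5"].extend([10, 7, 18])

# Each Series gets a default 0..n-1 index; the constructor takes the union
# of those indexes and fills missing positions with NaN
df = pd.DataFrame({k: pd.Series(v) for k, v in d.items()}).set_index("Date")
print(df)
```

Columns that needed padding become float64 because NaN is a float; `col2` has no gaps and stays integer.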
Answer 2 (score: 5)
Here is an approach using masking:
import numpy as np
import pandas as pd
K = list(d.keys())    # list() needed on Python 3, where keys() is a non-indexable view
V = list(d.values())
mask = ~np.isin(K, ['Date'])  # True for every key except 'Date'
K1 = [k for k, m in zip(K, mask) if m]
V1 = [v for v, m in zip(V, mask) if m]
lens = np.array([len(item) for item in V1])
mask = lens[:, None] > np.arange(lens.max())  # row i is True for its first lens[i] slots
out_arr = np.full(mask.shape, np.nan)
out_arr[mask] = np.concatenate(V1)
df = pd.DataFrame(out_arr.T, columns=K1, index=d['Date'])
Sample run:
In [612]: d.keys()
Out[612]: ['col4', 'col5', 'col2', 'col3', 'col1', 'Date']
In [613]: d.values()
Out[613]:
[[-15, 11, 2, 10, 7, -1],
[10, 7, 18],
[12, 0, 6, 9, -4, -11, 6],
[1, 9, 1, 8],
[5, 7],
['01-01-15',
'01-02-15',
'01-03-15',
'01-04-15',
'01-05-15',
'01-06-15',
'01-07-15']]
In [614]: df
Out[614]:
          col4  col5  col2  col3  col1
01-01-15   -15    10    12     1     5
01-02-15    11     7     0     9     7
01-03-15     2    18     6     1   NaN
01-04-15    10   NaN     9     8   NaN
01-05-15     7   NaN    -4   NaN   NaN
01-06-15    -1   NaN   -11   NaN   NaN
01-07-15   NaN   NaN     6   NaN   NaN
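To see why the boolean-mask assignment lines up, here is the same trick on two tiny, made-up columns (the values are illustrative only). Boolean assignment fills the True slots of `mask` in row-major order, which is exactly the order `np.concatenate` lays the values out in:

```python
import numpy as np

# Two hypothetical ragged columns of lengths 2 and 3
V1 = [[5, 7], [10, 7, 18]]
lens = np.array([len(v) for v in V1])  # array([2, 3])

# Broadcasting (2, 1) against (3,) gives a (2, 3) mask;
# row i is True for the first lens[i] positions
mask = lens[:, None] > np.arange(lens.max())
# mask == [[ True,  True, False],
#          [ True,  True,  True]]

out_arr = np.full(mask.shape, np.nan)
# Fills (0,0), (0,1), (1,0), (1,1), (1,2) with 5, 7, 10, 7, 18
out_arr[mask] = np.concatenate(V1)
# out_arr == [[ 5.,  7., nan],
#             [10.,  7., 18.]]
```

Each row of `out_arr` is one column of the final frame, which is why the answer transposes it with `.T` before handing it to `pd.DataFrame`.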
Answer 3 (score: 5)
Using itertools (Python 3):
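import itertools
import pandas as pd
# zip_longest pads shorter lists with None (NaN in the frame); if d still holds 'Date', it appears as an extra column
pd.DataFrame(list(itertools.zip_longest(*d.values())), columns=list(d.keys())).sort_index(axis=1)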
Out[728]:
   col1  col2  col3  col4  col5
0   5.0    12   1.0 -15.0  10.0
1   7.0     0   9.0  11.0   7.0
2   NaN     6   1.0   2.0  18.0
3   NaN     9   8.0  10.0   NaN
4   NaN    -4   NaN   7.0   NaN
5   NaN   -11   NaN  -1.0   NaN
6   NaN     6   NaN   NaN   NaN
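`zip_longest` is what does the padding here: it walks the lists in parallel and yields one tuple per row, substituting its `fillvalue` (default `None`) once a list runs out. A small sketch with two made-up columns:

```python
import itertools

import pandas as pd

# Hypothetical ragged columns
cols = {"col1": [5, 7], "col3": [1, 9, 1, 8]}

# One tuple per row; the exhausted col1 list contributes None
rows = list(itertools.zip_longest(*cols.values()))
# rows == [(5, 1), (7, 9), (None, 1), (None, 8)]

# pandas stores the None entries in the numeric column as NaN
df = pd.DataFrame(rows, columns=list(cols.keys()))
print(df)
```

This is also why `col2` stays integer in the output above: it is the longest list, so it never receives a `None`.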