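I have a dict of dicts that I'm trying to turn into a Pandas DataFrame. The outer dict maps each row index to an inner dict, and the inner dict maps column indices to their values. I want everything else in the DataFrame to be 0. For example: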
d = {0: {0:2, 2:5},
1: {1:1, 3:2},
2: {2:5}}
So I would then want the DataFrame to look like
index c0 c1 c2 c3
0 2.0 NaN 5.0 NaN
1 NaN 1.0 NaN 2.0
2 NaN NaN 5.0 NaN
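I'm currently planning to write a function that yields a tuple for each item in d and to use it as the iterable for creating the DataFrame, but I'm interested in whether anyone else has done something similar.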
Answer 0 (score: 2)
Just call DataFrame.from_dict with orient='index', then sort the columns:
pd.DataFrame.from_dict(d, orient='index').sort_index(axis=1)
0 1 2 3
0 2.0 NaN 5.0 NaN
1 NaN 1.0 NaN 2.0
2 NaN NaN 5.0 NaN
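The question asks for the missing entries to be 0 rather than NaN. A minimal sketch of one way to get there, assuming the same d as in the question: chaining fillna(0) replaces the NaN values. The astype(int) cast and the extra row sort are optional additions, not part of the answer above.

```python
import pandas as pd

d = {0: {0: 2, 2: 5},
     1: {1: 1, 3: 2},
     2: {2: 5}}

# Build one row per outer key, then order rows and columns explicitly
df = (pd.DataFrame.from_dict(d, orient='index')
        .sort_index(axis=0)
        .sort_index(axis=1)
        .fillna(0)      # replace every missing entry with 0
        .astype(int))   # optional: cast back to integers once no NaN remains
print(df)
```

If NaN should stay as the marker for missing data, drop the last two calls.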
Answer 1 (score: 2)
Well, why not just build it the usual way and transpose?
>>> pd.DataFrame(d).T
0 1 2 3
0 2.0 NaN 5.0 NaN
1 NaN 1.0 NaN 2.0
2 NaN NaN 5.0 NaN
>>>
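The row and column order after transposing follows the order in which pandas collects the keys, which may depend on the pandas version. A small sketch, assuming the d from the question, that sorts both axes explicitly so the layout does not rely on that order:

```python
import pandas as pd

d = {0: {0: 2, 2: 5},
     1: {1: 1, 3: 2},
     2: {2: 5}}

# pd.DataFrame(d) treats the outer keys as columns, so transpose to make them rows
df = pd.DataFrame(d).T

# Sort rows and columns explicitly rather than relying on key insertion order
df = df.sort_index(axis=0).sort_index(axis=1)
print(df)
```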
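Answer 2 (score: 0)
After timing the other suggestions, I found that my original approach was much faster. I'm using the following function to build an iterator, which I pass to pd.DataFrame: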
import pandas as pd

def row_factory(index_data, row_len):
    """
    Make a generator that yields one row per entry of index_data.

    Parameters:
        index_data (dict): maps each row key to a dict of column index -> value.
            Column indexes missing from the inner dict are left as None.
        row_len (int): length of each row, including the leading key.

    Example:
        index_data = {0: {0: 2, 2: 1}, 1: {1: 1}} with row_len=4 would yield
        [0, 2, None, 1] then [1, None, 1, None]
    """
    for key, data in index_data.items():
        # Start the row with the key, then None for every column
        row = [key] + [None] * (row_len - 1)
        for index, value in data.items():
            # Only replace columns that have a value; offset by 1 to skip the key
            row[index + 1] = value
        yield row

# d has columns 0..3, so each row has 5 entries: the key plus 4 values
df = pd.DataFrame(row_factory(d, 5))
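A possible variant of the generator approach, offered as a sketch only: the helper name iter_rows and the derived n_cols are hypothetical and not taken from the answer. It works out the number of columns from the data instead of hard-coding it, and it uses the outer keys as the DataFrame index rather than as a leading column. It assumes every column key is a non-negative integer.

```python
import pandas as pd

d = {0: {0: 2, 2: 5},
     1: {1: 1, 3: 2},
     2: {2: 5}}

def iter_rows(index_data, n_cols, fill=None):
    """Yield one list of n_cols values per outer key (hypothetical helper)."""
    for data in index_data.values():
        row = [fill] * n_cols
        for col, value in data.items():
            # Only overwrite the positions that actually have a value
            row[col] = value
        yield row

# Assumption: column keys are non-negative integers, so the width is max key + 1
n_cols = max(col for data in d.values() for col in data) + 1

# Outer keys become the index; iteration order of d matches the yielded rows
df = pd.DataFrame(iter_rows(d, n_cols), index=list(d), columns=range(n_cols))
print(df)
```

Passing fill=0 to iter_rows would produce zeros for the missing entries instead of NaN.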