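I have a sparse DataFrame, sdf, that consists mostly of NaN values. When I call sdf.to_dict(), the output is a dense version of the matrix, with every null entry included. How can I omit the NaN entries so that only the entries that actually hold a value end up in the dict?
For example, sdf is: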
2018-02-02 2018-02-03
23:58:36 NaN NaN
23:58:37 1.0 NaN
23:58:40 NaN NaN
23:58:41 NaN NaN
23:58:42 NaN NaN
23:58:43 NaN NaN
23:58:48 NaN NaN
23:58:49 NaN NaN
23:58:50 NaN NaN
23:58:52 NaN 1.0
23:58:59 NaN NaN
23:59:00 NaN NaN
23:59:01 NaN NaN
23:59:05 NaN NaN
23:59:07 NaN NaN
sdf.to_dict()
gives:
{'2018-02-02': {'23:58:36': nan, '23:58:37': 1.0, '23:58:40':
nan, '23:58:41': nan, '23:58:42': nan, '23:58:43': nan,
'23:58:48': nan, '23:58:49': nan, '23:58:50': nan, '23:58:52':
nan, '23:58:59': nan, '23:59:00': nan, '23:59:01': nan,
'23:59:05': nan, '23:59:07': nan}, '2018-02-03': {'23:58:36':
nan, '23:58:37': nan, '23:58:40': nan, '23:58:41': nan,
'23:58:42': nan, '23:58:43': nan, '23:58:48': nan, '23:58:49':
nan, '23:58:50': nan, '23:58:52': 1.0, '23:58:59': nan,
'23:59:00': nan, '23:59:01': nan, '23:59:05': nan, '23:59:07':
nan}}
even though sdf is a sparse DataFrame.
Sorry for being vague. I want to keep all non-NaN entries. The desired output is:
{'2018-02-02': {'23:58:37': 1.0}, '2018-02-03': {'23:58:52': 1.0}}
Answer 0 (score: 1)
Adapting this answer does exactly what you need:
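from math import isnan
# Drop rows that are entirely NaN, then convert to a nested dict.
sdd = sdf.dropna(how='all').to_dict()
# Filter out the NaN values that remain in each column.
clean_dict = {k: {j: v for j, v in col.items() if not isnan(v)} for k, col in sdd.items()}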
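For anyone who wants to try this out, here is a minimal, self-contained sketch of the same approach. The three-row sample frame is made up for illustration and only mirrors the shape of the data in the question.

```python
from math import isnan

import pandas as pd

nan = float("nan")

# Hypothetical sample data shaped like the frame in the question.
sdf = pd.DataFrame(
    {"2018-02-02": [nan, 1.0, nan], "2018-02-03": [nan, nan, 1.0]},
    index=["23:58:36", "23:58:37", "23:58:52"],
)

# Rows that are entirely NaN are dropped first to shrink the intermediate dict.
sdd = sdf.dropna(how="all").to_dict()

# Keep only the entries whose value is not NaN.
clean_dict = {
    col: {row: val for row, val in entries.items() if not isnan(val)}
    for col, entries in sdd.items()
}

print(clean_dict)
```

The dropna(how="all") step is only an optimisation: the comprehension alone would already remove every NaN entry.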
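Answer 1 (score: 1)
Use stack to reshape the frame into a Series keyed by (row, column) pairs, then build the nested dict from it. The legacy stack implementation drops NaN values by default: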
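from collections import defaultdict
d = defaultdict(dict)
# After stack(), k1 is the row label (time) and k2 is the column label (date).
for (k1, k2), v in sdf.stack().items():
    d[k2][k1] = v
# Convert back to a plain dict.
d1 = dict(d)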
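A self-contained sketch of the stack approach on a small sample frame (the data is made up for illustration). One caveat: whether stack drops NaN on its own depends on the pandas version and on which stack implementation is in use, so this sketch calls dropna() explicitly instead of relying on that default.

```python
from collections import defaultdict

import pandas as pd

nan = float("nan")

# Hypothetical sample data shaped like the frame in the question.
sdf = pd.DataFrame(
    {"2018-02-02": [nan, 1.0, nan], "2018-02-03": [nan, nan, 1.0]},
    index=["23:58:36", "23:58:37", "23:58:52"],
)

# stack() yields a Series indexed by (row label, column label);
# dropna() removes NaN entries regardless of the stack implementation.
stacked = sdf.stack().dropna()

nested = defaultdict(dict)
for (time_key, date_key), value in stacked.items():
    nested[date_key][time_key] = value

result = dict(nested)
print(result)
```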
If the input is a Series with a DatetimeIndex:
print(s)
2018-02-02 23:58:37 1.0
2018-02-03 23:58:52 1.0
dtype: float64
from collections import defaultdict
d = defaultdict(dict)
# s is already a Series, so iterate over it directly; each key k is a Timestamp.
for k, v in s.items():
    d[k.strftime('%Y-%m-%d')][k.strftime('%H:%M:%S')] = v
d1 = dict(d)
Answer 2 (score: 0)
So far, this is the approach that has worked best for me:
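from pandas import isnull
# Builds one dict per row, keeping only the non-null values.
# Series.iteritems() was removed in pandas 2.0; items() is the replacement.
[{k: i for k, i in row.items() if not isnull(i)} for _, row in sdf.iterrows()]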
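Note that the row-wise comprehension above returns a list with one dict per row, not the nested dict keyed by column that the question asks for. A column-wise variant gives that shape directly. It is sketched here as an alternative that is not taken from any of the answers, and the sample data is made up for illustration:

```python
import pandas as pd

nan = float("nan")

# Hypothetical sample data shaped like the frame in the question.
sdf = pd.DataFrame(
    {"2018-02-02": [nan, 1.0, nan], "2018-02-03": [nan, nan, 1.0]},
    index=["23:58:36", "23:58:37", "23:58:52"],
)

# Iterating over a DataFrame yields its column labels; dropping NaN from each
# column before converting keeps only the entries that hold a value.
result = {col: sdf[col].dropna().to_dict() for col in sdf}

print(result)
```

A column that is entirely NaN would still appear in the result, mapped to an empty dict.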