I have a dictionary whose values are dictionaries, and those inner dictionaries have lists as their values. For example:
{'A' : {'a' : ['1', '2', '3'], 'b' : ['4', '5'], 'c' : ['6']},
'B' : {'a' : ['7'], 'b' : ['8', '9']}}
I want to build a pandas DataFrame with A and B as the index and a, b, c as the columns.
What I did:
df = pd.DataFrame.from_dict(dictionary, orient='index')
df.describe()
But I got an error:
TypeError Traceback (most recent call last)
<ipython-input-6-88dc07bc979e> in <module>()
6 df = pd.DataFrame.from_dict(dict_data, orient='index')
----> 7 df.describe() # print df
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in
describe(self, percentiles, include, exclude)
6825 data = self.select_dtypes(include=include, exclude=exclude)
6826
-> 6827 ldesc = [describe_1d(s) for _, s in data.iteritems()]
6828 # set a convenient order for rows
6829 names = []
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in <listcomp>(.0)
6825 data = self.select_dtypes(include=include, exclude=exclude)
6826
-> 6827 ldesc = [describe_1d(s) for _, s in data.iteritems()]
6828 # set a convenient order for rows
6829 names = []
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in describe_1d(data)
6808 return describe_numeric_1d(data)
6809 else:
-> 6810 return describe_categorical_1d(data)
6811
6812 if self.ndim == 1:
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in describe_categorical_1d(data)
6782 def describe_categorical_1d(data):
6783 names = ['count', 'unique']
-> 6784 objcounts = data.value_counts()
6785 count_unique = len(objcounts[objcounts != 0])
6786 result = [data.count(), count_unique]
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\base.py in value_counts(self, normalize, sort, ascending, bins, dropna)
869 from pandas.core.algorithms import value_counts
870 result = value_counts(self, sort=sort, ascending=ascending,
--> 871 normalize=normalize, bins=bins, dropna=dropna)
872 return result
873
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\algorithms.py in value_counts(values, sort, ascending, normalize, bins, dropna)
550
551 else:
--> 552 keys, counts = _value_counts_arraylike(values, dropna)
553
554 if not isinstance(keys, Index):
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\algorithms.py in _value_counts_arraylike(values, dropna)
595 # TODO: handle uint8
596 f = getattr(htable, "value_count_{dtype}".format(dtype=ndtype))
--> 597 keys, counts = f(values, dropna)
598
599 mask = isna(values)
pandas/_libs/hashtable_func_helper.pxi in pandas._libs.hashtable.value_count_object()
pandas/_libs/hashtable_func_helper.pxi in pandas._libs.hashtable.value_count_object()
TypeError: unhashable type: 'list'
How can I fix this?
I would like a result like this:
   a      b    c
A  1 2 3  4 5  6
B  7      8 9
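Answer 0 (score: 3)
The values will be list objects, so you can simply use the DataFrame constructor and transpose. I mention the list objects because I would normally avoid constructing and then transposing, since it can scramble the dtypes. In this case, though, the dtype ends up as object anyway.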
d = {
'A' : {'a' : ['1', '2', '3'], 'b' : ['4', '5'], 'c' : ['6']},
'B' : {'a' : ['7'], 'b' : ['8', '9']}
}
pd.DataFrame(d).T
           a       b    c
A  [1, 2, 3]  [4, 5]  [6]
B        [7]  [8, 9]  NaN
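However, the real problem is trying to describe lists. What would that even mean? I assume you want to describe the numbers inside the lists. If so, I would build the frame like this:
df = pd.DataFrame.from_dict({
    i: {(j, k): v for j, x in d_.items() for k, v in enumerate(x)}
    for i, d_ in d.items()
}, orient='index')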
df
   a            b       c
   0    1    2  0  1    0
A  1    2    3  4  5    6
B  7  NaN  NaN  8  9  NaN
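As a small illustration of why the columns end up with two levels, the inner comprehension can be run on its own for each row. The names row_a and row_b exist only for this sketch. Each list element is keyed by a (key, position) tuple, and pandas builds a two-level column index from such tuple keys.

```python
d = {
    'A': {'a': ['1', '2', '3'], 'b': ['4', '5'], 'c': ['6']},
    'B': {'a': ['7'], 'b': ['8', '9']},
}

# Flatten one inner dictionary: every list element gets a (key, position) tuple as its key.
row_a = {(j, k): v for j, x in d['A'].items() for k, v in enumerate(x)}
row_b = {(j, k): v for j, x in d['B'].items() for k, v in enumerate(x)}

print(row_a)
# {('a', 0): '1', ('a', 1): '2', ('a', 2): '3', ('b', 0): '4', ('b', 1): '5', ('c', 0): '6'}
print(row_b)
# {('a', 0): '7', ('b', 0): '8', ('b', 1): '9'}
```

Keys that are present for A but missing for B, such as ('a', 1) or ('c', 0), are what show up as NaN in row B.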
Then you can describe it:
df.describe()
        a        b     c
        0  1  2  0  1  0
count   2  1  1  2  2  1
unique  2  1  1  2  2  1
top     1  2  3  8  5  6
freq    1  1  1  1  1  1
Alternatively, you can stack the second level of the columns first and then describe:
df.stack().describe()
        a  b  c
count   4  4  1
unique  4  4  1
top     1  8  6
freq    1  1  1
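The values in these lists are strings, which is why describe reports count, unique, top and freq rather than means and quantiles. If numeric summaries are what is wanted, one possible sketch is to convert the columns with pd.to_numeric first. This is an assumption about the goal and not part of the original answer, and the names numeric and summary are illustrative.

```python
import pandas as pd

d = {
    'A': {'a': ['1', '2', '3'], 'b': ['4', '5'], 'c': ['6']},
    'B': {'a': ['7'], 'b': ['8', '9']},
}

# Same construction as above: one column per (key, position) pair.
df = pd.DataFrame.from_dict({
    i: {(j, k): v for j, x in d_.items() for k, v in enumerate(x)}
    for i, d_ in d.items()
}, orient='index')

# The cells hold strings (or NaN), so convert each column to a numeric dtype.
numeric = df.apply(pd.to_numeric)

# describe() on numeric columns reports count, mean, std, min, quartiles and max.
summary = numeric.describe()
print(summary)
```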
Answer 1 (score: 2)
Since the error is about hashability, I would first convert the inner lists to tuples:
d = {'A' : {'a' : ['1', '2', '3'], 'b' : ['4', '5'], 'c' : ['6']},
'B' : {'a' : ['7'], 'b' : ['8', '9']}}
flat = [(k, v.items()) for k, v in d.items()]
d2 = dict()
for k, kv2 in flat:
    dd_pairs = []
    for k2, v2 in kv2:
        # tuples are hashable, lists are not
        dd_pairs.append((k2, tuple(v2)))
    d2[k] = dict(dd_pairs)
That should unblock you...
A very similar problem is described here: Pandas Multiindex from array => TypeError: unhashable type: 'dict'
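For reference, here is a compact sketch of the same idea. It rewrites the loop as a nested dict comprehension and then checks that value_counts, the call where the traceback fails, works once the cells are tuples. The variable names list_is_hashable and counts are made up for this example.

```python
import pandas as pd

d = {'A': {'a': ['1', '2', '3'], 'b': ['4', '5'], 'c': ['6']},
     'B': {'a': ['7'], 'b': ['8', '9']}}

# Lists are mutable and therefore unhashable; this is what the TypeError reports.
try:
    hash(['1', '2', '3'])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# Same conversion as the loop above: every inner list becomes a tuple.
d2 = {outer: {inner: tuple(values) for inner, values in inner_dict.items()}
      for outer, inner_dict in d.items()}

df = pd.DataFrame.from_dict(d2, orient='index')

# Counting values requires hashing them, which works for tuples.
counts = df['a'].value_counts()
print(counts)
```

Note that this only makes the cells hashable. Each cell is still a whole tuple, so any summary treats ('1', '2', '3') as a single value rather than three numbers.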