Pandas unhashable type: 'list' when using describe()

Time: 2018-05-15 12:37:21

Tags: python pandas

I have a dictionary whose values are other dictionaries, and those inner dictionaries have lists as their values. For example:

{'A' : {'a' : ['1', '2', '3'], 'b' : ['4', '5'], 'c' : ['6']},
 'B' : {'a' : ['7'], 'b' : ['8', '9']}}

I want to build a Pandas DataFrame with A and B as the index and a, b and c as the columns.

What I did:

df = pd.DataFrame.from_dict(dictionary, orient='index')  
df.describe()

But I got this error:

TypeError                                 Traceback (most recent call last)
<ipython-input-6-88dc07bc979e> in <module>()
      6 df = pd.DataFrame.from_dict(dict_data, orient='index')  
----> 7 df.describe() # print df  

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in describe(self, percentiles, include, exclude)
   6825             data = self.select_dtypes(include=include, exclude=exclude)
   6826
-> 6827         ldesc = [describe_1d(s) for _, s in data.iteritems()]
   6828         # set a convenient order for rows
   6829         names = []

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in <listcomp>(.0)  
   6825             data = self.select_dtypes(include=include, exclude=exclude)
   6826 
-> 6827         ldesc = [describe_1d(s) for _, s in data.iteritems()]
   6828         # set a convenient order for rows
   6829         names = []

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in describe_1d(data)
   6808                 return describe_numeric_1d(data)
   6809             else:
-> 6810                 return describe_categorical_1d(data)
   6811 
   6812         if self.ndim == 1:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in describe_categorical_1d(data)
   6782         def describe_categorical_1d(data):
   6783             names = ['count', 'unique']
-> 6784             objcounts = data.value_counts()
   6785             count_unique = len(objcounts[objcounts != 0])
   6786             result = [data.count(), count_unique]

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\base.py in value_counts(self, normalize, sort, ascending, bins, dropna)
    869         from pandas.core.algorithms import value_counts
    870         result = value_counts(self, sort=sort, ascending=ascending,
--> 871                               normalize=normalize, bins=bins, dropna=dropna)
    872         return result
    873 

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\algorithms.py in value_counts(values, sort, ascending, normalize, bins, dropna)
    550 
    551         else:
--> 552             keys, counts = _value_counts_arraylike(values, dropna)
    553 
    554             if not isinstance(keys, Index):

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\algorithms.py in _value_counts_arraylike(values, dropna)
    595         # TODO: handle uint8
    596         f = getattr(htable, "value_count_{dtype}".format(dtype=ndtype))
--> 597         keys, counts = f(values, dropna)
    598 
    599         mask = isna(values)

pandas/_libs/hashtable_func_helper.pxi in pandas._libs.hashtable.value_count_object()

pandas/_libs/hashtable_func_helper.pxi in pandas._libs.hashtable.value_count_object()

TypeError: unhashable type: 'list'

How can I fix this?

I want a result like this:
    a           b       c
A   1   2   3   4   5   6
B   7           8   9

2 answers:

Answer 0 (score: 3)

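The values will be list objects, so you can simply use the DataFrame constructor and transpose. I mention the list objects because I usually avoid constructing and then transposing, since it can mess up the dtypes. In this case, though, the dtypes will be object regardless.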

import pandas as pd

d = {
    'A' : {'a' : ['1', '2', '3'], 'b' : ['4', '5'], 'c' : ['6']},
    'B' : {'a' : ['7'], 'b' : ['8', '9']}
}

pd.DataFrame(d).T

           a       b    c
A  [1, 2, 3]  [4, 5]  [6]
B        [7]  [8, 9]  NaN
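A quick sketch to check the point about dtypes: rebuilding the frame from the same `d` and inspecting `dtypes` shows every column is object, because each cell holds a list (or NaN).

```python
import pandas as pd

d = {
    'A': {'a': ['1', '2', '3'], 'b': ['4', '5'], 'c': ['6']},
    'B': {'a': ['7'], 'b': ['8', '9']},
}

df = pd.DataFrame(d).T
print(df.dtypes)                # all three columns are object
print(type(df.loc['A', 'a']))   # each filled cell is a plain Python list
```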

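The real problem, however, is that you are trying to describe lists. What would that even mean? I assume you want to describe the numbers inside the lists. If so, I would construct the DataFrame like this: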

df = pd.DataFrame.from_dict({
    i: {(j, k): v for j, x in d_.items() for k, v in enumerate(x)}
    for i, d_ in d.items()
}, orient='index')

df

   a            b       c
   0    1    2  0  1    0
A  1    2    3  4  5    6
B  7  NaN  NaN  8  9  NaN
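The comprehension flattens each inner dict into one keyed by `(column, position)` tuples, and `from_dict` turns those tuple keys into a two-level column MultiIndex. A sketch that pulls the inner comprehension into a helper (`row_to_flat` is a hypothetical name, not part of the answer) so the intermediate dict can be inspected:

```python
import pandas as pd

d = {
    'A': {'a': ['1', '2', '3'], 'b': ['4', '5'], 'c': ['6']},
    'B': {'a': ['7'], 'b': ['8', '9']},
}

def row_to_flat(inner):
    """Turn {'b': ['8', '9']} into {('b', 0): '8', ('b', 1): '9'}."""
    return {(j, k): v for j, x in inner.items() for k, v in enumerate(x)}

flat_b = row_to_flat(d['B'])
print(flat_b)  # {('a', 0): '7', ('b', 0): '8', ('b', 1): '9'}

df = pd.DataFrame.from_dict(
    {i: row_to_flat(inner) for i, inner in d.items()}, orient='index'
)
print(df.columns.nlevels)  # 2: outer key and list position
```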

Then you can describe it:

df.describe()

        a        b     c
        0  1  2  0  1  0
count   2  1  1  2  2  1
unique  2  1  1  2  2  1
top     1  2  3  8  5  6
freq    1  1  1  1  1  1
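The cells are still strings ('1', '2', ...), which is why describe() reports count/unique/top/freq rather than numeric statistics. If numeric statistics are wanted, one option (an addition, not part of the original answer) is to convert the columns with pd.to_numeric before describing:

```python
import pandas as pd

d = {
    'A': {'a': ['1', '2', '3'], 'b': ['4', '5'], 'c': ['6']},
    'B': {'a': ['7'], 'b': ['8', '9']},
}

df = pd.DataFrame.from_dict({
    i: {(j, k): v for j, x in d_.items() for k, v in enumerate(x)}
    for i, d_ in d.items()
}, orient='index')

# Parse each column's strings as numbers; missing cells stay NaN
numeric = df.apply(pd.to_numeric)
summary = numeric.describe()  # now count, mean, std, min, quartiles, max
print(summary.loc['mean', ('a', 0)])  # 4.0, the mean of 1 and 7
```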

Or you can stack the second column level into the index and describe the result:

df.stack().describe()


        a  b  c
count   4  4  1
unique  4  4  1
top     1  8  6
freq    1  1  1
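A different technique that reaches the same per-column counts without building the wide frame is Series.explode (available since pandas 0.25): put each list under an (outer key, inner key) index, explode it to one value per row, then group by the inner key. This is an alternative sketch, not the answer's approach:

```python
import pandas as pd

d = {
    'A': {'a': ['1', '2', '3'], 'b': ['4', '5'], 'c': ['6']},
    'B': {'a': ['7'], 'b': ['8', '9']},
}

# One cell per (outer key, inner key), each holding a list
nested = pd.Series({(i, j): v for i, inner in d.items() for j, v in inner.items()})

# One row per list element; the index entries are repeated
long = nested.explode()

# Describe the values under each inner key ('a', 'b', 'c')
summary = long.groupby(level=1).describe()
print(summary)
```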

Answer 1 (score: 2)

Since the error is about hashability, I would first convert the inner lists to tuples:

d = {'A' : {'a' : ['1', '2', '3'], 'b' : ['4', '5'], 'c' : ['6']},
     'B' : {'a' : ['7'], 'b' : ['8', '9']}}

flat = [(k, v.items()) for k,v in d.items()]
d2 = dict()
for k, kv2 in flat:
    dd_pairs = []
    for k2, v2 in kv2:
        dd_pairs.append( (k2,tuple(v2)) )
    d2[k] = dict(dd_pairs)

This should get you unblocked...
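The loop above can also be written as a nested dict comprehension. A compact sketch of the same list-to-tuple conversion, followed by the describe() call the question was after (variable names are illustrative):

```python
import pandas as pd

d = {
    'A': {'a': ['1', '2', '3'], 'b': ['4', '5'], 'c': ['6']},
    'B': {'a': ['7'], 'b': ['8', '9']},
}

# Same conversion as the loop: every inner list becomes a hashable tuple
d2 = {k: {k2: tuple(v2) for k2, v2 in inner.items()} for k, inner in d.items()}

df = pd.DataFrame.from_dict(d2, orient='index')
summary = df.describe()  # value_counts() can now hash the cells
print(summary)
```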

A very similar problem is described here: Pandas Multiindex from array => TypeError: unhashable type: 'dict'