Pandas groupby confusion - unhashable type

Date: 2016-08-27 21:52:27

Tags: python python-2.7 pandas dataframe group-by

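Using groupby on a pandas DataFrame, I want to group by column c_b and count the unique values in columns c_a and c_c within each group. My expected result is shown below.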

Expected result

c_b,c_a_unique_count,c_c_unique_count
python,2,2
c++,2,2

Instead I get a strange error about an unhashable type. Does anyone have any ideas? Thanks.

Input file

c_a,c_b,c_c,c_d
hello,python,numpy,0.0
hi,python,pandas,1.0
ho,c++,vector,0.0
ho,c++,std,1.0
go,c++,std,0.0

Source code

sample = pd.read_csv('123.csv', header=None, skiprows=1,
    dtype={0:str, 1:str, 2:str, 3:float})
sample.columns = pd.Index(data=['c_a', 'c_b', 'c_c', 'c_d'])
sample['c_d'] = sample['c_d'].astype('int64')
sampleGroup = sample.groupby('c_b')
results = sampleGroup.count()[:,[0,2]]
results.to_csv(derivedFeatureFile, index= False)

Error message

Traceback (most recent call last):
  File "/Users/foo/personal/featureExtraction/kaggleExercise.py", line 134, in <module>
    unitTest()
  File "/Users/foo/personal/featureExtraction/kaggleExercise.py", line 129, in unitTest
    results = sampleGroup.count()[:,[0,2]]
  File "/Users/foo/miniconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 1997, in __getitem__
    return self._getitem_column(key)
  File "/Users/foo/miniconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 2004, in _getitem_column
    return self._get_item_cache(key)
  File "/Users/foo/miniconda2/lib/python2.7/site-packages/pandas/core/generic.py", line 1348, in _get_item_cache
    res = cache.get(item)
TypeError: unhashable type

1 answer:

Answer 0 (score: 1)

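To get the number of unique elements in each group, you can use the following (here df is the DataFrame called sample in the question):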

df.groupby('c_b')[['c_a', 'c_d']].agg(pd.Series.nunique)
Out: 
        c_a  c_d
c_b             
c++       2    2
python    2    2

df.groupby('c_b', as_index=False)[['c_a', 'c_d']].agg(pd.Series.nunique)
Out: 
      c_b  c_a  c_d
0     c++    2    2
1  python    2    2
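
For completeness, here is a minimal end-to-end sketch that produces the exact columns the question asks for (c_a and c_c rather than c_d). It inlines the sample input through io.StringIO instead of reading 123.csv, and the names CSV_TEXT, counts, and unique_counts are illustrative assumptions, not part of the original code. It also shows what probably went wrong in the question. Writing frame[:, [0, 2]] hands the tuple (slice, list) to DataFrame.__getitem__ as if it were a single column label, and that tuple cannot be hashed. Positional selection has to go through .iloc. Separately, count() returns the number of non-null rows per group, not the number of distinct values.

```python
import io

import pandas as pd

# Sample input from the question, inlined so the sketch is self-contained.
CSV_TEXT = """c_a,c_b,c_c,c_d
hello,python,numpy,0.0
hi,python,pandas,1.0
ho,c++,vector,0.0
ho,c++,std,1.0
go,c++,std,0.0
"""

sample = pd.read_csv(io.StringIO(CSV_TEXT))

# count() is the number of non-null rows per group, not the number of
# distinct values, so it is not what the question needs.
counts = sample.groupby('c_b').count()

# Positional column selection must use .iloc. Plain counts[:, [0, 2]] is
# treated as a lookup of one column label and fails (the exact exception
# type depends on the pandas version).
# After grouping, c_b is the index, so positions 0 and 2 are c_a and c_d.
by_position = counts.iloc[:, [0, 2]]

# Distinct values per group for the two columns the question asks about,
# renamed to match the expected header and with c_b restored as a column.
unique_counts = (
    sample.groupby('c_b')[['c_a', 'c_c']]
    .nunique()
    .rename(columns={'c_a': 'c_a_unique_count', 'c_c': 'c_c_unique_count'})
    .reset_index()
)

print(unique_counts.to_csv(index=False))
```

Note that groupby sorts the group keys by default, so c++ comes before python in the result. If the original row order matters, passing sort=False to groupby keeps groups in order of first appearance.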