Counting unique rows in a pandas DataFrame

Date: 2017-12-26 15:57:44

Tags: python pandas dataframe

I need to count the number of unique rows in a pandas DataFrame. I tried this solution: pandas - number of unique rows occurrences in dataframe, but it raises an error.

Here is the code I tried:

import pandas as pd

df = {'x1': ['A','B','A','A','B','A','A','A'], 'x2': [1,3,2,2,3,1,2,3]}
df = pd.DataFrame(df)

print df.groupby(['x1','x2'], as_index=False).count()

Here is the error:

Traceback (most recent call last):
  File "/home/user/workspace/project/test.py", line 9, in <module>
    print df.groupby(['x1','x2'], as_index=False).count()
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 4372, in count
    return self._wrap_agged_blocks(data.items, list(blk))
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 4274, in _wrap_agged_blocks
    index = np.arange(blocks[0].values.shape[1])
IndexError: list index out of range

What am I doing wrong?

2 answers:

Answer 0: (score: 2)

Use size (PS: you can append .reset_index() at the end):

df.groupby(['x1','x2'], as_index=False).size()
Out[1262]: 
x1  x2
A   1     2
    2     3
    3     1
B   3     2
dtype: int64

Or fix your code:

df.groupby(['x1','x2'])['x2'].count()
Out[1264]: 
x1  x2
A   1     2
    2     3
    3     1
B   3     2
Name: x2, dtype: int64

If you only want to know the number of unique groups, you can use ngroups.
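
As a minimal sketch of the options in this answer, run on the question's data: the column name count passed to reset_index and the variable names are only illustrative choices, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'x1': ['A', 'B', 'A', 'A', 'B', 'A', 'A', 'A'],
                   'x2': [1, 3, 2, 2, 3, 1, 2, 3]})

grouped = df.groupby(['x1', 'x2'])

# Occurrences of each unique (x1, x2) row, as a Series with a MultiIndex
sizes = grouped.size()

# Same information as a flat DataFrame; "count" is an arbitrary column name
counts = sizes.reset_index(name='count')

# Number of distinct (x1, x2) combinations, i.e. the number of unique rows
n_unique = grouped.ngroups

print(counts)
print(n_unique)
```

The sketch leaves out as_index=False on purpose: as far as I know, from pandas 1.1 onward size() on such a groupby returns a DataFrame rather than the Series shown above, so the output depends on the installed version. The original count() call probably failed because both columns are group keys, which leaves no column to count, but that is an inference from the traceback.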

Answer 1: (score: 1)

You can drop the duplicates:

import pandas as pd

df = {'x1': ['A','B','A','A','B','A','A','A'], 'x2': [1,3,2,2,3,1,2,3]}
df = pd.DataFrame(df)

print(len(df.drop_duplicates()))

This returns

4
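
A small variation on the same idea, in case only some columns should define uniqueness. This is a sketch on the question's data; restricting to x1 is just an illustrative assumption, not something the question asks for.

```python
import pandas as pd

df = pd.DataFrame({'x1': ['A', 'B', 'A', 'A', 'B', 'A', 'A', 'A'],
                   'x2': [1, 3, 2, 2, 3, 1, 2, 3]})

# Unique rows across all columns
n_rows = len(df.drop_duplicates())

# Unique rows when only x1 is considered
n_x1 = len(df.drop_duplicates(subset=['x1']))

# duplicated() flags repeats of an earlier row, so the unflagged rows
# are the first occurrences; counting them gives the same number
n_rows_alt = int((~df.duplicated()).sum())

print(n_rows, n_x1, n_rows_alt)
```

Unlike groupby(...).size(), this gives only the number of unique rows, not how often each one occurs.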