The axis argument to unique is not supported for dtype object

Date: 2018-10-27 09:42:44

Tags: python python-3.x numpy unique frequency

I'm trying to get unique value counts column by column, but my array holds categorical variables (dtype object):

val, count = np.unique(x, axis=1, return_counts=True)

However, I get the following error:

TypeError: The axis argument to unique is not supported for dtype object

How can I solve this?

Sample x:

array([[' Private', ' HS-grad', ' Divorced'],
       [' Private', ' 11th', ' Married-civ-spouse'],
       [' Private', ' Bachelors', ' Married-civ-spouse'],
       [' Private', ' Masters', ' Married-civ-spouse'],
       [' Private', ' 9th', ' Married-spouse-absent'],
       [' Self-emp-not-inc', ' HS-grad', ' Married-civ-spouse'],
       [' Private', ' Masters', ' Never-married'],
       [' Private', ' Bachelors', ' Married-civ-spouse'],
       [' Private', ' Some-college', ' Married-civ-spouse']], dtype=object)

What I need is the equivalent of the following:

for x_T in x.T:
    val, count = np.unique(x_T, return_counts=True)
    print(val, count)


[' Private' ' Self-emp-not-inc'] [8 1]
[' 11th' ' 9th' ' Bachelors' ' HS-grad' ' Masters' ' Some-college'] [1 1 2 2 2 1]
[' Divorced' ' Married-civ-spouse' ' Married-spouse-absent'
 ' Never-married'] [1 6 1 1]

1 answer:

Answer 0 (score: 1)

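You can use itemfreq, although its output does not look like the expected counts you provided: it counts each value across the whole array rather than per column.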

import numpy as np
from scipy.stats import itemfreq

x = np.array([[' Private', ' HS-grad', ' Divorced'],
       [' Private', ' 11th', ' Married-civ-spouse'],
       [' Private', ' Bachelors', ' Married-civ-spouse'],
       [' Private', ' Masters', ' Married-civ-spouse'],
       [' Private', ' 9th', ' Married-spouse-absent'],
       [' Self-emp-not-inc', ' HS-grad', ' Married-civ-spouse'],
       [' Private', ' Masters', ' Never-married'],
       [' Private', ' Bachelors', ' Married-civ-spouse'],
       [' Private', ' Some-college', ' Married-civ-spouse']], dtype=object)

itemfreq(x)

Output:

array([[' 11th', 1],
       [' 9th', 1],
       [' Bachelors', 2],
       [' Divorced', 1],
       [' HS-grad', 2],
       [' Married-civ-spouse', 6],
       [' Married-spouse-absent', 1],
       [' Masters', 2],
       [' Never-married', 1],
       [' Private', 8],
       [' Self-emp-not-inc', 1],
       [' Some-college', 1]], dtype=object)
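
As far as I know, scipy.stats.itemfreq was deprecated in SciPy 1.0 and removed in later releases, so the import may fail on a current install. A minimal sketch of the same whole-array count with plain NumPy, assuming the sample x from the question (the name freq is only illustrative):

```python
import numpy as np

x = np.array([[' Private', ' HS-grad', ' Divorced'],
              [' Private', ' 11th', ' Married-civ-spouse'],
              [' Private', ' Bachelors', ' Married-civ-spouse'],
              [' Private', ' Masters', ' Married-civ-spouse'],
              [' Private', ' 9th', ' Married-spouse-absent'],
              [' Self-emp-not-inc', ' HS-grad', ' Married-civ-spouse'],
              [' Private', ' Masters', ' Never-married'],
              [' Private', ' Bachelors', ' Married-civ-spouse'],
              [' Private', ' Some-college', ' Married-civ-spouse']], dtype=object)

# Without an axis argument, np.unique flattens the array first,
# which also works for dtype object.
vals, counts = np.unique(x, return_counts=True)

# Plain dict mapping each value to its count over the whole array.
freq = dict(zip(vals.tolist(), counts.tolist()))
print(freq)
```

Like itemfreq, this mixes all columns together, so it only helps when the values of different columns do not overlap or when a global count is acceptable.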

Otherwise, you can try specifying another dtype, for example:

val, count = np.unique(x.astype("<U22"), axis=1, return_counts=True)

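For this to give what you want, though, your array would have to be different: with axis=1, np.unique treats each column as a single item and returns the unique columns, not the value counts within each column.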
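
To make the difference concrete, here is a small sketch, again assuming the sample x from the question. It first gets per-column counts by calling np.unique on each 1-D column (no axis argument is needed, so dtype object is fine), then shows what axis=1 returns on a string-typed copy. The names per_column, cols and col_counts are illustrative only.

```python
import numpy as np

x = np.array([[' Private', ' HS-grad', ' Divorced'],
              [' Private', ' 11th', ' Married-civ-spouse'],
              [' Private', ' Bachelors', ' Married-civ-spouse'],
              [' Private', ' Masters', ' Married-civ-spouse'],
              [' Private', ' 9th', ' Married-spouse-absent'],
              [' Self-emp-not-inc', ' HS-grad', ' Married-civ-spouse'],
              [' Private', ' Masters', ' Never-married'],
              [' Private', ' Bachelors', ' Married-civ-spouse'],
              [' Private', ' Some-college', ' Married-civ-spouse']], dtype=object)

# Per-column counts: each column of x is 1-D, so np.unique needs no axis.
per_column = [np.unique(col, return_counts=True) for col in x.T]
for vals, counts in per_column:
    print(vals, counts)

# axis=1 on a fixed-width string copy: each whole column is one item.
# The three columns are all distinct, so every column is counted once.
cols, col_counts = np.unique(x.astype("<U22"), axis=1, return_counts=True)
print(cols.shape, col_counts)
```

The list comprehension is just the loop from the question collected into a list, which is probably the simplest route to the desired output.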