Question

I am trying to get unique counts column-wise, but my array holds categorical variables (dtype object).

val, count = np.unique(x, axis=1, return_counts=True)

However, I get this error:

TypeError: The axis argument to unique is not supported for dtype object

How can I solve this?

Sample x:

array([[' Private', ' HS-grad', ' Divorced'],
       [' Private', ' 11th', ' Married-civ-spouse'],
       [' Private', ' Bachelors', ' Married-civ-spouse'],
       [' Private', ' Masters', ' Married-civ-spouse'],
       [' Private', ' 9th', ' Married-spouse-absent'],
       [' Self-emp-not-inc', ' HS-grad', ' Married-civ-spouse'],
       [' Private', ' Masters', ' Never-married'],
       [' Private', ' Bachelors', ' Married-civ-spouse'],
       [' Private', ' Some-college', ' Married-civ-spouse']], dtype=object)

What I need is the equivalent of the following:

for x_T in x.T:
    val, count = np.unique(x_T, return_counts=True)
    print(val, count)


[' Private' ' Self-emp-not-inc'] [8 1]
[' 11th' ' 9th' ' Bachelors' ' HS-grad' ' Masters' ' Some-college'] [1 1 2 2 2 1]
[' Divorced' ' Married-civ-spouse' ' Married-spouse-absent'
 ' Never-married'] [1 6 1 1]

Answer 1

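Even though the output does not look like the expected counts you gave, you can use itemfreq. Note that scipy.stats.itemfreq was deprecated in SciPy 1.0 in favour of np.unique(..., return_counts=True) and has since been removed, so this only runs on older SciPy versions. It also counts values across the whole array rather than per column: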

import numpy as np
from scipy.stats import itemfreq

x = np.array([[' Private', ' HS-grad', ' Divorced'],
       [' Private', ' 11th', ' Married-civ-spouse'],
       [' Private', ' Bachelors', ' Married-civ-spouse'],
       [' Private', ' Masters', ' Married-civ-spouse'],
       [' Private', ' 9th', ' Married-spouse-absent'],
       [' Self-emp-not-inc', ' HS-grad', ' Married-civ-spouse'],
       [' Private', ' Masters', ' Never-married'],
       [' Private', ' Bachelors', ' Married-civ-spouse'],
       [' Private', ' Some-college', ' Married-civ-spouse']], dtype=object)

itemfreq(x)

Output:

array([[' 11th', 1],
       [' 9th', 1],
       [' Bachelors', 2],
       [' Divorced', 1],
       [' HS-grad', 2],
       [' Married-civ-spouse', 6],
       [' Married-spouse-absent', 1],
       [' Masters', 2],
       [' Never-married', 1],
       [' Private', 8],
       [' Self-emp-not-inc', 1],
       [' Some-college', 1]], dtype=object)
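Since itemfreq pools every column together, a per-column variant may be closer to what the question asks for. The following is a minimal sketch that wraps the loop from the question in a helper; the name column_counts and the shortened three-row sample are assumptions made for illustration, not part of the original answer.

```python
import numpy as np

# Shortened sample in the same shape as the question's data (assumed for brevity).
x = np.array([[' Private', ' HS-grad', ' Divorced'],
              [' Private', ' 11th', ' Married-civ-spouse'],
              [' Self-emp-not-inc', ' HS-grad', ' Married-civ-spouse']], dtype=object)

def column_counts(arr):
    """Return one {value: count} dict per column of a 2-D array.

    Hypothetical helper: it calls np.unique on each 1-D column, which
    avoids the unsupported axis argument for dtype object.
    """
    result = []
    for col in arr.T:  # iterating over the transpose yields columns
        vals, counts = np.unique(col, return_counts=True)
        result.append(dict(zip(vals.tolist(), counts.tolist())))
    return result

per_column = column_counts(x)
for i, counts in enumerate(per_column):
    print(i, counts)
```

Each entry of per_column corresponds to one column of x, so the counts stay separated by column instead of being merged into a single table.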

Otherwise, you can try specifying another dtype, for example:

val, count = np.unique(x.astype("<U22"), axis=1, return_counts=True)

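For this to be useful, though, your array would have to be laid out differently: with axis=1, np.unique treats each whole column as a single item and counts duplicate columns, rather than counting the unique values inside each column.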
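To make the behaviour of axis=1 concrete, here is a small sketch on a made-up array (the values are invented purely for illustration). After casting to a string dtype the call succeeds, but it deduplicates entire columns:

```python
import numpy as np

# Toy data: the first and third columns are identical.
x = np.array([['a', 'p', 'a'],
              ['b', 'q', 'b']], dtype=object)

# Casting to a fixed-width string dtype makes the axis argument usable.
cols, counts = np.unique(x.astype(str), axis=1, return_counts=True)

print(cols)    # the distinct columns, kept as columns
print(counts)  # how many times each distinct column occurs
```

Here the column ('a', 'b') appears twice and ('p', 'q') once, so the result describes duplicated columns, not the per-column value frequencies the question asks for.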

It is odd that the axis argument is not supported for dtype object.
