我正在尝试按列获取唯一计数,但是我的数组具有分类变量(dtype对象)
val, count = np.unique(x, axis=1, return_counts=True)
尽管我遇到这样的错误:
TypeError: The axis argument to unique is not supported for dtype object
我该如何解决这个问题?
样品x:
array([[' Private', ' HS-grad', ' Divorced'],
[' Private', ' 11th', ' Married-civ-spouse'],
[' Private', ' Bachelors', ' Married-civ-spouse'],
[' Private', ' Masters', ' Married-civ-spouse'],
[' Private', ' 9th', ' Married-spouse-absent'],
[' Self-emp-not-inc', ' HS-grad', ' Married-civ-spouse'],
[' Private', ' Masters', ' Never-married'],
[' Private', ' Bachelors', ' Married-civ-spouse'],
[' Private', ' Some-college', ' Married-civ-spouse']], dtype=object)
需要以下各项:
for x_T in x.T:
val, count = np.unique(x_T, return_counts=True)
print (val,count)
[' Private' ' Self-emp-not-inc'] [8 1]
[' 11th' ' 9th' ' Bachelors' ' HS-grad' ' Masters' ' Some-college'] [1 1 2 2 2 1]
[' Divorced' ' Married-civ-spouse' ' Married-spouse-absent'
' Never-married'] [1 6 1 1]
答案 0 :(得分:1)
即使您的输出看起来不像您提供的期望计数,您仍然可以使用Itemfreq:
import numpy as np
from scipy.stats import itemfreq
x = np. array([[' Private', ' HS-grad', ' Divorced'],
[' Private', ' 11th', ' Married-civ-spouse'],
[' Private', ' Bachelors', ' Married-civ-spouse'],
[' Private', ' Masters', ' Married-civ-spouse'],
[' Private', ' 9th', ' Married-spouse-absent'],
[' Self-emp-not-inc', ' HS-grad', ' Married-civ-spouse'],
[' Private', ' Masters', ' Never-married'],
[' Private', ' Bachelors', ' Married-civ-spouse'],
[' Private', ' Some-college', ' Married-civ-spouse']], dtype=object)
itemfreq(x)
输出:
array([[' 11th', 1],
[' 9th', 1],
[' Bachelors', 2],
[' Divorced', 1],
[' HS-grad', 2],
[' Married-civ-spouse', 6],
[' Married-spouse-absent', 1],
[' Masters', 2],
[' Never-married', 1],
[' Private', 8],
[' Self-emp-not-inc', 1],
[' Some-college', 1]], dtype=object)
否则,您可以尝试指定另一个dtype,例如:
val, count = np.unique(x.astype("<U22"), axis=1, return_counts=True)
为此,但是您的数组必须不同