Efficiently count occurrences of unique subarrays in NumPy?

Time: 2015-06-16 22:37:16

Tags: python arrays numpy counting

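I have an array of shape (128, 36, 8), and I would like to find the number of occurrences of the unique subarrays of length 8 in the last dimension.

I am aware of np.unique, but it seems to operate on individual elements rather than on subarrays. I have seen this question, but it is about finding the first occurrence of a particular subarray, not the counts of all unique subarrays.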

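3 answers:

Answer 0: (score: 3)

The question states that the input array has shape (128, 36, 8) and that we are interested in the unique subarrays of length 8 in the last dimension. So I am assuming that uniqueness is evaluated across the first two dimensions merged together. Let A be the input 3D array.

Getting the number of unique subarrays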

# Reshape the 3D array to a 2D array merging the first two dimensions
Ar = A.reshape(-1,A.shape[2])

# Perform lex sort and get the sorted indices and xy pairs
sorted_idx = np.lexsort(Ar.T)
sorted_Ar =  Ar[sorted_idx,:]

# Get the count of rows that have at least one TRUE value 
# indicating presence of unique subarray there
unq_out = np.any(np.diff(sorted_Ar,axis=0),1).sum()+1

Sample run -

In [159]: A # A is (2,2,3)
Out[159]: 
array([[[0, 0, 0],
        [0, 0, 2]],

       [[0, 0, 2],
        [2, 0, 1]]])

In [160]: unq_out
Out[160]: 3
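
To make the boundary-detection step easier to follow, here is a small walk-through of the intermediate values for the sample array above. It reuses the same calls as the snippet; the name `changed` is only introduced here for illustration.

```python
import numpy as np

# Sample array from the run above, shape (2, 2, 3)
A = np.array([[[0, 0, 0],
               [0, 0, 2]],
              [[0, 0, 2],
               [2, 0, 1]]])

# Merge the first two dimensions: one row per subarray
Ar = A.reshape(-1, A.shape[2])

# np.lexsort uses the LAST key as the primary sort key, so the rows end up
# ordered by the last column first. What matters is that equal rows become adjacent.
sorted_Ar = Ar[np.lexsort(Ar.T), :]
print(sorted_Ar)

# A row starts a new group when it differs from the previous row in any column
changed = np.any(np.diff(sorted_Ar, axis=0), 1)
print(changed)

# Number of groups = number of boundaries + 1
unq_out = changed.sum() + 1
print(unq_out)
```

The duplicated row [0, 0, 2] lands in adjacent positions after sorting, which is why only two boundaries are detected.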

Getting the occurrence counts of unique subarrays

# Reshape the 3D array to a 2D array merging the first two dimensions
Ar = A.reshape(-1,A.shape[2])

# Perform lex sort and get the sorted indices and xy pairs
sorted_idx = np.lexsort(Ar.T)
sorted_Ar =  Ar[sorted_idx,:]

# Get IDs for each element based on their uniqueness
ids = np.append([0],np.any(np.diff(sorted_Ar,axis=0),1).cumsum())

# Get counts for each ID as the final output
unq_count = np.bincount(ids)

Sample run -

In [64]: A
Out[64]: 
array([[[0, 0, 2],
        [1, 1, 1]],

       [[1, 1, 1],
        [1, 2, 0]]])

In [65]: unq_count
Out[65]: array([1, 2, 1], dtype=int64)
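
As a cross-check, and assuming a newer NumPy than this answer was written for (the `axis` argument of `np.unique` was added in 1.13), the same counts can probably be obtained more directly. This is a sketch of an alternative, not part of the original approach.

```python
import numpy as np

# Sample array from the run above
A = np.array([[[0, 0, 2],
               [1, 1, 1]],
              [[1, 1, 1],
               [1, 2, 0]]])

# One row per subarray
Ar = A.reshape(-1, A.shape[-1])

# axis=0 treats each row as one item; return_counts gives the occurrences
unique_rows, counts = np.unique(Ar, axis=0, return_counts=True)

print(unique_rows)
print(counts)
```

Note that `np.unique` sorts rows with the first column as the primary key, whereas `np.lexsort(Ar.T)` uses the last column, so the two methods may list the unique rows in a different order even though the counts per row agree.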

Answer 1: (score: 1)

Here I have modified @Divakar's very useful answer to return the counts of the unique subarrays along with the subarrays themselves, so that the output has the same form as the output of collections.Counter.most_common().
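
A minimal sketch of what such a modification might look like is below. It is not the answerer's original code: the function name `unique_subarray_counts` is hypothetical, the input is assumed to be non-empty, and ties are ordered by the lexsort order rather than by first appearance as `Counter` would do.

```python
import numpy as np

def unique_subarray_counts(A):
    """Return [(subarray_as_tuple, count), ...], most frequent first (hypothetical helper)."""
    # One row per subarray, then sort so equal rows are adjacent
    Ar = A.reshape(-1, A.shape[-1])
    sorted_Ar = Ar[np.lexsort(Ar.T)]

    # True at the first row of each run of identical rows
    is_new = np.concatenate(([True], np.any(sorted_Ar[1:] != sorted_Ar[:-1], axis=1)))
    unique_rows = sorted_Ar[is_new]

    # Run lengths = gaps between consecutive run starts (plus the end of the array)
    starts = np.flatnonzero(is_new)
    counts = np.diff(np.concatenate((starts, [len(sorted_Ar)])))

    # Most frequent first; a stable sort keeps the existing order among ties
    order = np.argsort(-counts, kind="stable")
    return [(tuple(unique_rows[i].tolist()), int(counts[i])) for i in order]

A = np.array([[[0, 0, 2],
               [1, 1, 1]],
              [[1, 1, 1],
               [1, 2, 0]]])
result = unique_subarray_counts(A)
print(result)
```

Each entry is a `(subarray, count)` pair, which mirrors the list-of-tuples shape that `most_common()` returns.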

Answer 2: (score: 0)

I am not sure this is the most efficient approach, but it should work.

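# Flatten the first two dimensions so each row is one length-8 subarray
arr = arr.reshape(128*36, 8)
unique_ = []
occurence_ = []

for sub in arr:
    row = sub.tolist()
    if row not in unique_:
        unique_.append(row)
        occurence_.append(1)
    else:
        occurence_[unique_.index(row)] += 1

# enumerate() yields (index, subarray) pairs for the final report
for index_, u in enumerate(unique_):
    print(u, "occurrence: %s" % occurence_[index_])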
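
The list lookups above scan `unique_` for every row, so the loop can get slow when there are many distinct subarrays. A possible variation on the same idea, sketched here with made-up random data, hashes each row as a tuple with `collections.Counter` instead.

```python
from collections import Counter

import numpy as np

# Made-up example data with the shape from the question (an assumption)
rng = np.random.default_rng(0)
arr = rng.integers(0, 2, size=(128, 36, 8))

# One row per length-8 subarray
rows = arr.reshape(-1, arr.shape[-1])

# Tuples are hashable, so each row becomes a dictionary key
counts = Counter(map(tuple, rows.tolist()))

# Show the three most frequent subarrays
for sub, n in counts.most_common(3):
    print(sub, "occurrence: %s" % n)
```

This keeps the pure-Python flavour of the answer while replacing the repeated list searches with constant-time dictionary lookups.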