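I am looking for a function that takes a sorted 1-D array and returns a 2-D array with two columns: the first column contains the non-repeated items and the second column contains the number of times each item is repeated. Right now my code is as follows: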
def priorsGrouper(priors):
    if priors.size==0:
        ret=priors;
    elif priors.size==1:
        ret=priors[0],1;
    else:
        ret=numpy.zeros((1,2));
        pointer1,pointer2=0,0;
        while(pointer1<priors.size):
            counter=0;
            while(pointer2<priors.size and priors[pointer2]==priors[pointer1]):
                counter+=1;
                pointer2+=1;
            ret=numpy.row_stack((ret,[priors[pointer1],pointer2-pointer1]))
            pointer1=pointer2;
    return ret;
print priorsGrouper(numpy.array([1,2,2,3]))
My output is as follows:
[[ 0.  0.]
 [ 1.  1.]
 [ 2.  2.]
 [ 3.  1.]]
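First, I cannot get rid of my [0, 0] row. Second, I would like to know whether there is a numpy or scipy function for this, or whether my own approach is okay.
Thanks.
Answer 0 (score: 4)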
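You can use np.unique to get the unique values in x, as well as an array of indices (called inverse). inverse can be thought of as "labels" for the elements of x. Unlike x itself, the labels are always integers, starting at 0.
Then you can take a bincount of the labels. Since the labels start at 0, the bincount will not be padded with a lot of zeros that you do not care about.
Finally, column_stack joins y and the bincount into a 2D array: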
In [84]: x = np.array([1,2,2,3])
In [85]: y, inverse = np.unique(x, return_inverse=True)
In [86]: y
Out[86]: array([1, 2, 3])
In [87]: inverse
Out[87]: array([0, 1, 1, 2])
In [88]: np.bincount(inverse)
Out[88]: array([1, 2, 1])
In [89]: np.column_stack((y,np.bincount(inverse)))
Out[89]:
array([[1, 1],
       [2, 2],
       [3, 1]])
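As a side note, newer NumPy releases (1.9 and later, to my knowledge) let np.unique return the counts directly through return_counts=True, which makes the bincount step unnecessary. A minimal sketch, where group_counts is a hypothetical helper name and not part of the answer above:

```python
import numpy as np

def group_counts(x):
    """Return a 2-column array: unique values and how often each occurs."""
    # return_counts=True asks np.unique for the occurrence count of each value
    values, counts = np.unique(x, return_counts=True)
    # Stack the two 1-D arrays side by side as columns
    return np.column_stack((values, counts))

result = group_counts(np.array([1, 2, 2, 3]))
print(result)
```

Because np.unique sorts its input internally, this sketch does not rely on x being sorted beforehand.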
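Sometimes, when an array is small, plain Python methods turn out to be faster than NumPy functions. I wanted to check whether that was the case here and, if so, how large x would have to be before the NumPy methods became faster.
Here is a graph of the performance of the various methods as a function of the size of x: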
In [173]: x = np.random.random(1000)
In [174]: x.sort()
In [156]: %timeit using_unique(x)
10000 loops, best of 3: 99.7 us per loop
In [180]: %timeit using_groupby(x)
100 loops, best of 3: 3.64 ms per loop
In [157]: %timeit using_counter(x)
100 loops, best of 3: 4.31 ms per loop
In [158]: %timeit using_ordered_dict(x)
100 loops, best of 3: 4.7 ms per loop
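For len(x) of 1000, using_unique is more than 35 times faster than any of the plain Python methods tested.
So it looks like using_unique is the fastest, even for very small len(x).
Here is the program used to generate the graph: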
import numpy as np
import collections
import itertools as IT
import matplotlib.pyplot as plt
import timeit

def using_unique(x):
    y, inverse = np.unique(x, return_inverse=True)
    return np.column_stack((y, np.bincount(inverse)))

def using_counter(x):
    result = collections.Counter(x)
    return np.array(sorted(result.items()))

def using_ordered_dict(x):
    result = collections.OrderedDict()
    for item in x:
        result[item] = result.get(item,0)+1
    # list() keeps this working on Python 3, where items() returns a view
    return np.array(list(result.items()))

def using_groupby(x):
    # groupby only merges adjacent equal items, so x must already be sorted
    return np.array([(k, sum(1 for i in g)) for k, g in IT.groupby(x)])

fig, ax = plt.subplots()
timing = collections.defaultdict(list)
# Array sizes from 1 to 1000, evenly spaced on a log scale
Ns = [int(round(n)) for n in np.logspace(0, 3, 10)]
for n in Ns:
    x = np.random.random(n)
    x.sort()
    # With number=1000, the total time in seconds equals milliseconds per call
    timing['unique'].append(
        timeit.timeit('m.using_unique(m.x)', 'import __main__ as m', number=1000))
    timing['counter'].append(
        timeit.timeit('m.using_counter(m.x)', 'import __main__ as m', number=1000))
    timing['ordered_dict'].append(
        timeit.timeit('m.using_ordered_dict(m.x)', 'import __main__ as m', number=1000))
    timing['groupby'].append(
        timeit.timeit('m.using_groupby(m.x)', 'import __main__ as m', number=1000))
ax.plot(Ns, timing['unique'], label='using_unique')
ax.plot(Ns, timing['counter'], label='using_counter')
ax.plot(Ns, timing['ordered_dict'], label='using_ordered_dict')
ax.plot(Ns, timing['groupby'], label='using_groupby')
plt.legend(loc='best')
plt.ylabel('milliseconds')
plt.xlabel('size of x')
plt.show()
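Since the question states that the input is already sorted, the run boundaries can also be located directly, without the sort that np.unique performs. The following is only a sketch of that idea and was not part of the benchmark above, so I make no claim about its speed; using_run_lengths is a hypothetical name:

```python
import numpy as np

def using_run_lengths(x):
    """Count runs of equal values in an already sorted 1-D array."""
    x = np.asarray(x)
    if x.size == 0:
        return np.empty((0, 2), dtype=x.dtype)
    # A new run starts at index 0 and wherever a value differs from its predecessor
    starts = np.concatenate(([0], np.flatnonzero(x[1:] != x[:-1]) + 1))
    # Each run's length is the distance to the next start (or to the end of x)
    counts = np.diff(np.concatenate((starts, [x.size])))
    return np.column_stack((x[starts], counts))

result = using_run_lengths(np.array([1, 2, 2, 3]))
print(result)
```

If the input is not sorted, equal values that are not adjacent are reported as separate runs, so this sketch is only equivalent to using_unique for sorted x.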
Answer 1 (score: 3)
If order does not matter, use a Counter.
from collections import Counter
% Counter([1,2,2,3])
= Counter({2: 2, 1: 1, 3: 1})
% Counter([1,2,2,3]).items()
= [(1, 1), (2, 2), (3, 1)]
To preserve the order (of first appearance), you can implement your own version of Counter:
from collections import OrderedDict
def OrderedCounter(seq):
    res = OrderedDict()
    for x in seq:
        res.setdefault(x, 0)
        res[x] += 1
    return res
% OrderedCounter([1,2,2,3])
= OrderedDict([(1, 1), (2, 2), (3, 1)])
% OrderedCounter([1,2,2,3]).items()
= [(1, 1), (2, 2), (3, 1)]
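Either result can be turned into the two-column layout the question asks for. A minimal sketch, assuming the items are mutually comparable so that they can be sorted; counts_as_rows is a hypothetical name:

```python
from collections import Counter

def counts_as_rows(seq):
    """Return [[item, count], ...] sorted by item."""
    counts = Counter(seq)
    # sorted() orders the (item, count) pairs by item, so the rows come out
    # in ascending order regardless of the order in which items were seen
    return [[item, n] for item, n in sorted(counts.items())]

rows = counts_as_rows([1, 2, 2, 3])
print(rows)
```

The list of rows can be passed to numpy.array if an array is needed rather than a list.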
Answer 2 (score: 1)
If you want to count how many times each item is repeated, you can use a dictionary:
l = [1, 2, 2, 3]
d = {}
for i in l:
    if i not in d:
        d[i] = 1
    else:
        d[i] += 1
result = [[k, v] for k, v in d.items()]
For your example this returns:
[[1, 1],
 [2, 2],
 [3, 1]]
Good luck.
Answer 3 (score: 0)
First, you do not need to end statements with a semicolon (;); this is not C. :-)
Second, line 5 (and others) sets ret to value,value, which is a tuple, not a list:
>type foo.py
def foo():
    return [1],2
a,b = foo()
print "a = {0}".format(a)
print "b = {0}".format(b)
which gives:
>python foo.py
a = [1]
b = 2
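Putting these two points together, here is a minimal sketch of how the original loop could return a consistent 2-D array in every case. Rows are collected in a plain list instead of being stacked onto a numpy.zeros((1,2)) seed, which is what produces the stray [0, 0] row in the question. The name priors_grouper and the final reshape are my own choices, not part of the original code:

```python
import numpy as np

def priors_grouper(priors):
    """Group a sorted 1-D array into rows of [value, repeat count]."""
    rows = []
    start = 0
    while start < priors.size:
        end = start
        # Advance end past every element equal to the one at start
        while end < priors.size and priors[end] == priors[start]:
            end += 1
        rows.append([priors[start], end - start])
        start = end
    # reshape(-1, 2) keeps the result two-dimensional even when rows is empty
    return np.array(rows).reshape(-1, 2)

result = priors_grouper(np.array([1, 2, 2, 3]))
print(result)
```

With this shape, the empty and single-element inputs no longer need special cases: they yield arrays of shape (0, 2) and (1, 2) respectively.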
Third, there are simpler ways to do this. Here is one that comes to mind:
def priorsGrouper(priors):
    """Find out how many times each element occurs in a list.
    @param[in] priors List of elements
    @return List of (element, count) pairs, one pair per unique element.
    """
    # Generate a set containing only the unique elements from the input
    mySet = set(priors)
    # Create the list that will store the number of occurrences
    occurrenceCounts = []
    # Count how many times each element occurs in the input:
    for element in mySet:
        occurrenceCounts.append(priors.count(element))
    # Combine the two (list() so the pairs also print on Python 3):
    combinedArray = list(zip(mySet, occurrenceCounts))
    return combinedArray
# End of priorsGrouper() ----------------------------------------------
# Check zero-element case
print(priorsGrouper([]))
# Check multi-element case
sampleInput = ['a','a', 'b', 'c', 'c', 'c']
print(priorsGrouper(sampleInput))