In a study, I have two variables:
x = number of objects remembered
y = % of tasks completed correctly
as follows:
x = np.array([2,2,2,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,7,7])
y = np.array([1.0, 1.0, 1.0, 0.33, 0.33, 0.66, 0.66, 1.0, 1.0, 1.0, 1.0, 0.75, 0.75, 0.5, 1.0, 1.0, 0.6, 0.4, 0.5,0.75, 1.0,1.0,0.6,0.5,0.75])
I want to return the count of each (x, y) combination, like this:
WMC Percent Count
2 100 3
3 33 2
3 66 2 etc.
I noticed that scipy.stats.itemfreq and np.bincount only work on a single variable.
Answer 0 (score: 1)
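If you have access to a recent version of numpy (1.9.0 or later), you can use unique with the return_counts flag enabled. This gives you 2 arrays, one containing the values and the other containing the counts.
Here is a slightly modified version of the numpy.unique method that works for your case: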
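import numpy as np

def unique(ar):
    # Sort rows by column 0, then by column 1
    ar = ar[np.lexsort((ar[:, 1], ar[:, 0]))]
    # Mark the first row of each run of identical rows
    flag = np.concatenate(([True], (ar[1:] != ar[:-1]).any(axis=1)))
    # Start index of each run, plus the row count as the final boundary
    idx = np.concatenate(np.nonzero(flag) + ([len(ar)],))
    return np.column_stack((ar[flag], np.diff(idx)))

print(unique(np.column_stack((x, y))))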
Result:
[[ 2. 1. 3. ]
[ 3. 0.33 2. ]
[ 3. 0.66 2. ]
[ 3. 1. 1. ]
[ 4. 0.5 1. ]
[ 4. 0.75 2. ]
[ 4. 1. 3. ]
[ 5. 0.4 1. ]
[ 5. 0.5 1. ]
[ 5. 0.6 1. ]
[ 5. 1. 2. ]
[ 6. 0.6 1. ]
[ 6. 0.75 1. ]
[ 6. 1. 2. ]
[ 7. 0.5 1. ]
[ 7. 0.75 1. ]]
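On newer NumPy releases the hand-written function may not be needed. The sketch below assumes NumPy 1.13 or later, where np.unique accepts an axis argument, and counts unique rows of the stacked (x, y) pairs directly. The names pairs and table are only illustrative.

```python
import numpy as np

x = np.array([2,2,2,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,7,7])
y = np.array([1.0, 1.0, 1.0, 0.33, 0.33, 0.66, 0.66, 1.0, 1.0, 1.0, 1.0,
              0.75, 0.75, 0.5, 1.0, 1.0, 0.6, 0.4, 0.5, 0.75, 1.0, 1.0,
              0.6, 0.5, 0.75])

# One row per observation: column 0 is x, column 1 is y
pairs = np.column_stack((x, y))

# axis=0 treats each row as one item (assumes NumPy >= 1.13);
# return_counts=True also returns how often each unique row occurs
values, counts = np.unique(pairs, axis=0, return_counts=True)

# Append the counts as a third column
table = np.column_stack((values, counts))
print(table)
```

The rows come back sorted by x and then by y, so the layout matches the result shown above.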
Answer 1 (score: 0)
Early in your code, why not build a dictionary linking 'number of objects remembered' to '% of tasks completed correctly'?
i.e.
completed_tasks = {2 : 1.0, 3 : 33, 4 : 66}
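Then you can easily add the completed-task values to the array returned by scipy.stats.itemfreq:
import scipy.stats  # itemfreq is deprecated since SciPy 1.0
a = scipy.stats.itemfreq(x)
# ndarray rows have no append(), so build a new list for each row
a = [list(i) + [completed_tasks[i[0]]] for i in a]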
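Since one x value occurs with several different y values in the question's data, a plain one-to-one dictionary cannot hold every pair. As a rough sketch of the same dictionary idea, the variant below maps each x to a Counter of its y values. The name by_wmc is hypothetical, and plain lists are used instead of numpy arrays to keep the example self-contained.

```python
from collections import Counter, defaultdict

x = [2,2,2,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,7,7]
y = [1.0, 1.0, 1.0, 0.33, 0.33, 0.66, 0.66, 1.0, 1.0, 1.0, 1.0,
     0.75, 0.75, 0.5, 1.0, 1.0, 0.6, 0.4, 0.5, 0.75, 1.0, 1.0,
     0.6, 0.5, 0.75]

# by_wmc is a hypothetical name: x value -> Counter of the y values seen with it
by_wmc = defaultdict(Counter)
for wmc, pct in zip(x, y):
    by_wmc[wmc][pct] += 1

# Print one line per (x, y) combination, sorted by x and then by y
for wmc in sorted(by_wmc):
    for pct, count in sorted(by_wmc[wmc].items()):
        print(wmc, pct, count)
```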
Answer 2 (score: 0)
I would use collections.Counter for this:
>>> import numpy as np
>>> x = np.array([2,2,2,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,7,7])
>>> y = np.array([1.0, 1.0, 1.0, 0.33, 0.33, 0.66, 0.66, 1.0, 1.0, 1.0, 1.0, 0.75, 0.75, 0.5, 1.0, 1.0, 0.6, 0.4, 0.5,0.75, 1.0,1.0,0.6,0.5,0.75])
>>> from collections import Counter
>>> c = Counter(zip(x,y))
>>> c
Counter({(2, 1.0): 3, (4, 1.0): 3, (3, 0.66000000000000003): 2, (5, 1.0): 2, (3, 0.33000000000000002): 2, (6, 1.0): 2, (4, 0.75): 2, (7, 0.5): 1, (6, 0.59999999999999998): 1, (5, 0.40000000000000002): 1, (5, 0.59999999999999998): 1, (3, 1.0): 1, (7, 0.75): 1, (6, 0.75): 1, (5, 0.5): 1, (4, 0.5): 1})
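To get from the Counter to the table asked for in the question, one option is to sort its items and convert each fraction to a whole percent. This is only a sketch: the names counts and rows are illustrative, and plain lists stand in for the numpy arrays so the snippet runs on its own.

```python
from collections import Counter

x = [2,2,2,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,7,7]
y = [1.0, 1.0, 1.0, 0.33, 0.33, 0.66, 0.66, 1.0, 1.0, 1.0, 1.0,
     0.75, 0.75, 0.5, 1.0, 1.0, 0.6, 0.4, 0.5, 0.75, 1.0, 1.0,
     0.6, 0.5, 0.75]

counts = Counter(zip(x, y))

# Sort by (x, y) and turn each fraction into a rounded integer percent
rows = [(wmc, round(pct * 100), n) for (wmc, pct), n in sorted(counts.items())]

print("WMC\tPercent\tCount")
for wmc, percent, n in rows:
    print("{}\t{}\t{}".format(wmc, percent, n))
```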
Answer 3 (score: 0)
I'm not sure whether it fits your case, but you can do this with itertools.groupby() on the zipped and sorted list:
import numpy as np
from itertools import groupby
x = np.array([2,2,2,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,7,7])
y = np.array([1.0, 1.0, 1.0, 0.33, 0.33, 0.66, 0.66, 1.0, 1.0, 1.0, 1.0, 0.75, 0.75, 0.5, 1.0, 1.0, 0.6, 0.4, 0.5,0.75, 1.0,1.0,0.6,0.5,0.75])
print("WMC\tPercent\tCount")
for key, group in groupby(sorted(zip(x, y))):
    print("{}\t{}\t{}".format(key[0], int(key[1]*100), len(list(group))))
Output
WMC Percent Count
2 100 3
3 33 2
3 66 2
3 100 1
4 50 1
4 75 2
4 100 3
5 40 1
5 50 1
5 60 1
5 100 2
6 60 1
6 75 1
6 100 2
7 50 1
7 75 1
Update to generate a numpy array
import numpy as np
from itertools import groupby
x = np.array([2,2,2,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,7,7])
y = np.array([1.0, 1.0, 1.0, 0.33, 0.33, 0.66, 0.66, 1.0, 1.0, 1.0, 1.0, 0.75, 0.75, 0.5, 1.0, 1.0, 0.6, 0.4, 0.5,0.75, 1.0,1.0,0.6,0.5,0.75])
results = np.array([(key[0], int(key[1]*100), len(list(group)))
                    for key, group in groupby(sorted(zip(x, y)))])
Output
>>> results
array([[ 2, 100, 3],
[ 3, 33, 2],
[ 3, 66, 2],
[ 3, 100, 1],
[ 4, 50, 1],
[ 4, 75, 2],
[ 4, 100, 3],
[ 5, 40, 1],
[ 5, 50, 1],
[ 5, 60, 1],
[ 5, 100, 2],
[ 6, 60, 1],
[ 6, 75, 1],
[ 6, 100, 2],
[ 7, 50, 1],
[ 7, 75, 1]])
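One caveat about the int(key[1]*100) conversion: int() truncates, so a fraction whose product lands just below a whole number loses a percent. None of the values in the question are affected, but the sketch below shows the effect with 0.29 and uses round() instead. The helper name pair_counts is hypothetical, not part of any library.

```python
from itertools import groupby

def pair_counts(xs, ys):
    """Return (x, percent, count) for each distinct (x, y) pair, sorted."""
    return [(key[0], round(key[1] * 100), len(list(group)))
            for key, group in groupby(sorted(zip(xs, ys)))]

# 0.29 * 100 is 28.999999999999996 in binary floating point
print(int(0.29 * 100))    # 28, truncation drops the fractional part
print(round(0.29 * 100))  # 29

print(pair_counts([1, 1, 2], [0.29, 0.29, 0.5]))  # [(1, 29, 2), (2, 50, 1)]
```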