Counting with 2 variables

Time: 2014-12-01 11:08:55

Tags: python bins

In a study, I have two variables:

x = number objects remembered
y = % tasks completed correctly

They look like this:

x = np.array([2,2,2,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,7,7])
y = np.array([1.0, 1.0, 1.0, 0.33, 0.33, 0.66, 0.66, 1.0, 1.0, 1.0, 1.0, 0.75, 0.75, 0.5, 1.0, 1.0, 0.6, 0.4, 0.5,0.75, 1.0,1.0,0.6,0.5,0.75])

I would like to return a count for each combination of values:

WMC Percent Count
2   100      3
3    33      2
3    66      2  etc.

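I noticed that scipy.stats.itemfreq and np.bincount only work with one variable.

4 answers:

Answer 0 (score: 1)

If you have access to a recent version of numpy (1.9.0 or later), you can use unique with the return_counts flag enabled. This gives you 2 arrays, one containing the values and the other containing the counts.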
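
For a single array, the return_counts flag mentioned above behaves roughly as in this minimal sketch, which reuses x from the question. The names values and counts are chosen here for illustration only.

```python
import numpy as np

x = np.array([2,2,2,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,7,7])

# values holds the sorted distinct entries of x,
# counts holds how many times each of them occurs.
values, counts = np.unique(x, return_counts=True)

print(values)
print(counts)
```

This only counts one variable at a time, which is why the pairs (x, y) need a different treatment.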

Here is a slightly modified version of the numpy.unique method that works for your case:

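import numpy as np

def unique(ar):
    # Sort rows by the first column, then by the second.
    ar = ar[np.lexsort((ar[:, 1], ar[:, 0]))]
    # Mark the first row of each run of identical rows.
    flag = np.concatenate(([True], (ar[1:] != ar[:-1]).any(axis=1)))
    # Start index of each run, plus the row count as the final boundary.
    idx = np.concatenate(np.nonzero(flag) + ([len(ar)],))
    # Columns: x value, y value, count.
    return np.column_stack((ar[flag], np.diff(idx)))

print(unique(np.column_stack((x, y))))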

Result:

[[ 2.    1.    3.  ]
 [ 3.    0.33  2.  ]
 [ 3.    0.66  2.  ]
 [ 3.    1.    1.  ]
 [ 4.    0.5   1.  ]
 [ 4.    0.75  2.  ]
 [ 4.    1.    3.  ]
 [ 5.    0.4   1.  ]
 [ 5.    0.5   1.  ]
 [ 5.    0.6   1.  ]
 [ 5.    1.    2.  ]
 [ 6.    0.6   1.  ]
 [ 6.    0.75  1.  ]
 [ 6.    1.    2.  ]
 [ 7.    0.5   1.  ]
 [ 7.    0.75  1.  ]]
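
As a possible alternative that is not part of the original answer: assuming NumPy 1.13 or later, where np.unique accepts an axis argument, the same row-wise counts can be sketched without a custom function. The names pairs and counts are illustrative.

```python
import numpy as np

x = np.array([2,2,2,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,7,7])
y = np.array([1.0, 1.0, 1.0, 0.33, 0.33, 0.66, 0.66, 1.0, 1.0, 1.0, 1.0,
              0.75, 0.75, 0.5, 1.0, 1.0, 0.6, 0.4, 0.5, 0.75, 1.0, 1.0,
              0.6, 0.5, 0.75])

# Stack x and y as two columns so that each row is one (x, y) pair,
# then ask for the unique rows (axis=0) together with their counts.
pairs, counts = np.unique(np.column_stack((x, y)), axis=0, return_counts=True)

# Columns: x value, y value, count.
print(np.column_stack((pairs, counts)))
```

Note that this relies on exact floating-point equality of the y values, which holds here because repeated values come from identical literals.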

Answer 1 (score: 0)

Earlier in your code, why not build a dictionary linking 'number objects remembered' to '% tasks completed correctly'?

completed_tasks = {2 : 1.0, 3 : 33, 4 : 66}

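Then you can easily add the completed-task values to the array returned by scipy.stats.itemfreq:

import scipy.stats
a = scipy.stats.itemfreq(x)
# ndarray rows have no append(), and append() returns None anyway, so build each row explicitly
a = [list(i) + [completed_tasks[i[0]]] for i in a]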

Answer 2 (score: 0)

I would use collections.Counter for this:

>>> import numpy as np
>>> x = np.array([2,2,2,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,7,7])
>>> y = np.array([1.0, 1.0, 1.0, 0.33, 0.33, 0.66, 0.66, 1.0, 1.0, 1.0, 1.0, 0.75, 0.75, 0.5, 1.0, 1.0, 0.6, 0.4, 0.5,0.75, 1.0,1.0,0.6,0.5,0.75])
>>> from collections import Counter
>>> c = Counter(zip(x,y))
>>> c
Counter({(2, 1.0): 3, (4, 1.0): 3, (3, 0.66000000000000003): 2, (5, 1.0): 2, (3, 0.33000000000000002): 2, (6, 1.0): 2, (4, 0.75): 2, (7, 0.5): 1, (6, 0.59999999999999998): 1, (5, 0.40000000000000002): 1, (5, 0.59999999999999998): 1, (3, 1.0): 1, (7, 0.75): 1, (6, 0.75): 1, (5, 0.5): 1, (4, 0.5): 1})
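
To get from the Counter to the table layout asked for in the question, one option is to sort its items and scale the fraction to a percentage. This is only a sketch: the name rows and the use of plain lists instead of arrays are choices made here, not part of the original answer.

```python
from collections import Counter

x = [2,2,2,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,7,7]
y = [1.0, 1.0, 1.0, 0.33, 0.33, 0.66, 0.66, 1.0, 1.0, 1.0, 1.0, 0.75, 0.75,
     0.5, 1.0, 1.0, 0.6, 0.4, 0.5, 0.75, 1.0, 1.0, 0.6, 0.5, 0.75]

c = Counter(zip(x, y))

# Sort by (x, y) and turn each entry into (WMC, Percent, Count).
# round() guards against products such as 28.999999999999996.
rows = [(wmc, int(round(pct * 100)), n) for (wmc, pct), n in sorted(c.items())]

print("WMC\tPercent\tCount")
for wmc, pct, n in rows:
    print("{}\t{}\t{}".format(wmc, pct, n))
```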

Answer 3 (score: 0)

Not sure whether it applies in your case, but you can do this with itertools.groupby() on the zipped lists:

import numpy as np
from itertools import groupby

x = np.array([2,2,2,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,7,7])
y = np.array([1.0, 1.0, 1.0, 0.33, 0.33, 0.66, 0.66, 1.0, 1.0, 1.0, 1.0, 0.75, 0.75, 0.5, 1.0, 1.0, 0.6, 0.4, 0.5,0.75, 1.0,1.0,0.6,0.5,0.75])

print("WMC\tPercent\tCount")
# groupby only merges adjacent equal keys, so sort the pairs first.
for key, group in groupby(sorted(zip(x, y))):
    print("{}\t{}\t{}".format(key[0], int(key[1]*100), len(list(group))))

Output

WMC Percent Count
2   100 3
3   33  2
3   66  2
3   100 1
4   50  1
4   75  2
4   100 3
5   40  1
5   50  1
5   60  1
5   100 2
6   60  1
6   75  1
6   100 2
7   50  1
7   75  1
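
The sorted() call matters because itertools.groupby() only merges equal keys that sit next to each other. A small sketch with made-up data (the list pairs below is hypothetical, not taken from the question) shows the difference:

```python
from itertools import groupby

# Made-up, deliberately unsorted pairs: (3, 0.33) appears twice, but not adjacently.
pairs = [(3, 0.33), (3, 1.0), (3, 0.33)]

# Without sorting, the two (3, 0.33) entries end up in separate groups.
unsorted_counts = [(key, len(list(group))) for key, group in groupby(pairs)]

# After sorting, equal pairs are adjacent and are counted together.
sorted_counts = [(key, len(list(group))) for key, group in groupby(sorted(pairs))]

print(unsorted_counts)
print(sorted_counts)
```

The data in the question happens to keep equal pairs adjacent, but sorting first makes the count independent of the input order.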

Update to generate a numpy array

import numpy as np
from itertools import groupby

x = np.array([2,2,2,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,7,7])
y = np.array([1.0, 1.0, 1.0, 0.33, 0.33, 0.66, 0.66, 1.0, 1.0, 1.0, 1.0, 0.75, 0.75, 0.5, 1.0, 1.0, 0.6, 0.4, 0.5,0.75, 1.0,1.0,0.6,0.5,0.75])

results = np.array([(key[0], int(key[1]*100), len(list(group)))
                        for key, group in groupby(sorted(zip(x, y)))])

Output

>>> results
array([[  2, 100,   3],
       [  3,  33,   2],
       [  3,  66,   2],
       [  3, 100,   1],
       [  4,  50,   1],
       [  4,  75,   2],
       [  4, 100,   3],
       [  5,  40,   1],
       [  5,  50,   1],
       [  5,  60,   1],
       [  5, 100,   2],
       [  6,  60,   1],
       [  6,  75,   1],
       [  6, 100,   2],
       [  7,  50,   1],
       [  7,  75,   1]])