Question

Say I have a list:

a = [3, 5, 1, 1, 3, 2, 4, 1, 6, 4, 8]

and a sublist of a:

b = [5, 2, 6, 8]

I would like to get the bins with pd.qcut(a,2) and count how many values of list b fall in each bin. That is:

In[84]: pd.qcut(a,2)
Out[84]: 
Categorical: 
[[1, 3], (3, 8], [1, 3], [1, 3], [1, 3], [1, 3], (3, 8], [1, 3], (3, 8], (3, 8], (3, 8]]
Levels (2): Index(['[1, 3]', '(3, 8]'], dtype=object)

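Now I know the bins are [1, 3] and (3, 8], and I would like to know how many values of list "b" fall in each bin. I can do this by hand when the number of bins is small, but what is the best approach when the number of bins is large?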

Answer 1

You can use the retbins parameter to get the bin edges back from qcut:

>>> q, bins = pd.qcut(a, 2, retbins=True)

Then use pd.cut to get the bin index of each value of b with respect to those bins:

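>>> b = np.array(b)
>>> # .codes holds the integer bin index of each value (older pandas called it .labels)
>>> hist = pd.cut(b, bins, right=True).codes.copy()
>>> hist[b == bins[0]] = 0  # a value equal to the lowest edge gets code -1; assign it to bin 0
>>> hist
array([1, 0, 1, 1], dtype=int8)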

Note that you have to handle the corner case bins[0] separately, because cut does not include it in the leftmost bin.
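
As a minimal sketch of the full count (not part of the original answer), the corner case can also be covered by passing include_lowest=True to pd.cut, and the per-bin totals can then be taken with np.bincount. This assumes every value of b lies inside the range of a; a value outside the edges would come back as NaN and break bincount.

```python
import numpy as np
import pandas as pd

a = [3, 5, 1, 1, 3, 2, 4, 1, 6, 4, 8]
b = [5, 2, 6, 8]

# Bin edges come from the quantiles of a (here: 1, 3, 8).
_, bins = pd.qcut(a, 2, retbins=True)

# include_lowest=True closes the first interval on the left,
# so a value equal to bins[0] lands in bin 0 instead of being dropped.
codes = pd.cut(np.asarray(b), bins, labels=False, include_lowest=True)

# Count how many values of b fall in each bin; minlength keeps empty bins.
counts = np.bincount(codes, minlength=len(bins) - 1)
print(counts)  # [1 3]
```

Here one value of b (2) falls in [1, 3] and three values (5, 6, 8) fall in (3, 8].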

Answer 2

As the previous answer shows, you can get the bin boundaries from qcut using the retbins parameter, as follows:

q, bins = pd.qcut(a, 2, retbins=True)

Then you can use cut to put the values from another list into those bins. For example:

myList = np.random.random(100)
# Define bin bounds that cover the range returned by random()
bins = [0, .1, .9, 1] 
# Now we can get the "bin number" of each value in myList:
binNum = pd.cut(myList, bins, labels=False, include_lowest=True)
# And then we can count the number of values in each bin number:
np.bincount(binNum)

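Make sure your bin boundaries cover the entire range of values that appear in the second list. To ensure this, you can pad the bin boundaries with the smallest and largest possible values (negative and positive infinity). For example, with bins being the array returned by qcut: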

cutBins = [float('-inf')] + bins.tolist() + [float('inf')]
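
A short sketch of how the padded edges behave, under the assumption that the second list (called `other` here, a made-up name with made-up values) contains numbers outside the range of a. Padding adds one underflow bin and one overflow bin around the qcut bins rather than widening the outermost ones.

```python
import numpy as np
import pandas as pd

a = [3, 5, 1, 1, 3, 2, 4, 1, 6, 4, 8]
other = np.array([0, 2, 5, 9, 12])  # hypothetical values, some outside a's range

_, bins = pd.qcut(a, 2, retbins=True)

# Edges become [-inf, 1, 3, 8, inf], i.e. four bins:
# (-inf, 1], (1, 3], (3, 8], (8, inf]
cut_bins = [float('-inf')] + bins.tolist() + [float('inf')]

codes = pd.cut(other, cut_bins, labels=False)
counts = np.bincount(codes, minlength=len(cut_bins) - 1)
print(counts)  # [1 1 1 2]
```

One thing to watch: with this padding, a value equal to the minimum of a falls into the underflow bin (-inf, 1] and not into the first qcut bin. If that matters, replacing the outermost edges with the infinities instead of adding new ones may be the better fit.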

Pandas: binning a list based on the qcut of another list
