Using pd.cut and pd.value_counts, then getting the result as a 2D array

Time: 2018-11-03 22:28:02

Tags: python pandas list

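Use case

  1. I take random observations from a population.
  2. I then use pd.cut to group them into bins.
  3. I then use pd.value_counts to extract the counts.
  4. I want to get the computed interval labels and the frequency counts.
  5. I want to "glue" the label column to the frequency count column to get a 2D array (2 columns and n interval rows).
  6. I want to convert the 2D array to a list for COM interop.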

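I am close to the desired output, but I am new to Python, so perhaps someone more experienced can improve my label-building code.

The constraint here is on the final output: it must be a list so that it can be marshalled through the COM interop layer to Excel VBA.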

import inspect
import numpy as np
import pandas as pd
from scipy.stats import skewnorm

pop = skewnorm.rvs(0, size=20)
bins=[-5,-4,-3,-2,-1,0,1,2,3,4,5]
bins2 = np.array(bins)
bins3 = pd.cut(pop,bins2)
bins4 = [0]*(bins2.size-1)

#print my own labels, doh!
idx=0
for binLoop in bins3.categories:
    intervalAsString="(" + str(binLoop.left)+ "," + str(binLoop.right)+"]" 
    print (intervalAsString)
    bins4[idx]=intervalAsString
    idx=idx+1


table = pd.value_counts(bins3, sort=False)

joined = np.vstack((bins4,table.tolist()))

print (joined)

Target output: a 2D array that can be converted to a list

|  (-5, -4]  |  0  |
|  (-4, -3]  |  0  |
|  (-3, -2]  |  0  |
|  (-2, -1]  |  1  |
|  (-1, 0]   |  3  |
|  (0, 1]    |  9  |
|  (1, 2]    |  4  |
|  (2, 3]    |  2  |
|  (3, 4]    |  1  |
|  (4, 5]    |  0  |
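
A note on shape: `np.vstack` stacks its inputs as rows, so two sequences of length n produce a 2 x n array, whereas the table above is n x 2. The sketch below uses made-up labels and counts (not output from the code above) to show the transpose. Using `dtype=object` is an assumption made here so that the counts stay integers instead of being coerced to strings.

```python
import numpy as np

# Illustrative labels and counts, invented for this sketch
labels = ["(-1, 0]", "(0, 1]", "(1, 2]"]
counts = [3, 9, 4]

# Stacking the two sequences as rows gives shape (2, n)
by_row = np.array([labels, counts], dtype=object)

# Transposing gives one [label, count] row per interval, shape (n, 2)
by_interval = by_row.T

# tolist() turns the array into a plain nested Python list
rows = by_interval.tolist()
print(by_row.shape, by_interval.shape)
print(rows)
```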

1 answer:

Answer 0 (score: 1)

If I understand you correctly, the following should do what you want:

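import pandas as pd
from scipy.stats import skewnorm

pop = skewnorm.rvs(0, size=20)
# range excludes its stop value, so use 6 to include the (4, 5] bin
bins = range(-5, 6)
binned = pd.cut(pop, bins)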

# create the histogram data
hist = binned.value_counts()

# hist is a pandas series with a categorical index describing the bins
# `index.astype(str)` will convert the categories to strings.
hist.index = hist.index.astype(str)

# `.reset_index()` will turn the index into an ordinary column
# `.values` gives you the underlying numpy array
# `tolist()` converts the numpy array to a native python list o' lists.
print(hist.reset_index().values.tolist())
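
For checking the result by hand, here is a reproducible variant of the same idea. It is only a sketch: the observations and bin edges are made up, and building the rows with explicit `str` and `int` conversions is one possible way to end up with plain Python types, not necessarily what the COM layer requires.

```python
import pandas as pd

# Fixed, made-up observations so the counts can be verified by hand
observations = [-1.5, -0.2, 0.3, 0.7, 1.1, 2.4]
edges = list(range(-3, 4))  # edges -3..3 give six unit-wide bins

# pd.cut on a list returns a Categorical of intervals
binned = pd.cut(observations, edges)

# Counts per bin, including empty bins
hist = binned.value_counts()

# One [label, count] pair per interval, using plain str and int
rows = [[str(interval), int(count)] for interval, count in hist.items()]
print(rows)
```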