How to get integers from qcut when creating categories for a DataFrame Series

Date: 2016-10-25 16:43:14

Tags: python pandas dataframe

I have two ndarrays:

import pandas as pd
import numpy as np

a = np.arange(0, 100, 10)
# randint excludes `high`, so 10001 keeps 10000 as a possible value
b = np.random.randint(low=9000, high=10001, size=(1000,))

I then create a DataFrame:

numbers =  np.concatenate((a, b), axis=0)
df = pd.DataFrame({'a':numbers})

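Most of the values (1,000 numbers) lie between 9,000 and 10,000, and only 10 are below 100, so I use qcut() to split them into quantile-based intervals, where each category holds roughly the same share of the values: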

df['cats'] = pd.qcut(df.a, 10)
print(df['cats'].value_counts())

This prints:

[0, 9103]           102
(9630.4, 9717]      102
(9407, 9519]        102
(9307.4, 9407]      102
(9895.3, 10000]     101
(9717, 9810]        101
(9203.6, 9307.4]    101
(9810, 9895.3]      100
(9103, 9203.6]      100
(9519, 9630.4]       99
Name: cats, dtype: int64

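Instead of the labels that qcut generates, such as "(9103, 9203.6]" and "(9519, 9630.4]", I would like to get integers such as 1, 2, 3, 4, 5, 6, 7, 8, 9 and so on. How can I do that?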

2 Answers:

Answer 0: (score: 1)

Here is the solution posted by root:

import pandas as pd
import numpy as np

a = np.arange(0,100, 10)
b = np.arange(9000, 10000)

numbers =  np.concatenate((a, b), axis=0)

df = pd.DataFrame({'a':numbers})


df['cats'] = pd.qcut(df.a, 10, labels=False)

print(df['cats'].value_counts())
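With labels=False, qcut returns the integer position of each bin, starting at 0 rather than 1. If the categories should run from 1 to 10 as in the question, one option is to add 1 to the result. The following is a minimal sketch under that assumption; the name `codes` is only illustrative.

```python
import numpy as np
import pandas as pd

a = np.arange(0, 100, 10)
b = np.arange(9000, 10000)
df = pd.DataFrame({'a': np.concatenate((a, b), axis=0)})

# labels=False yields zero-based bin indices (0..9 for 10 quantiles)
codes = pd.qcut(df['a'], 10, labels=False)

# shift by one so the categories are numbered 1..10
df['cats'] = codes + 1

print(df['cats'].min(), df['cats'].max())
```

The column produced this way has a plain integer dtype, so it can be used directly in arithmetic or as a groupby key.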

Answer 1: (score: 1)

Use labels=np.arange(10) + 1:

df['cats'] = pd.qcut(df.a, 10, labels=np.arange(10) + 1)
print(df['cats'].value_counts())

1     103
3     102
10    101
9     101
8     101
7     101
6     101
4     101
5     100
2      99
Name: cats, dtype: int64
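Note that when explicit labels are passed, the resulting column is categorical (the "dtype: int64" above refers to the counts, not to the column itself). If plain integers are needed, a conversion with astype(int) should work as long as there are no missing values. A minimal sketch, using the deterministic data from the other answer and the illustrative names `cats` and `as_int`:

```python
import numpy as np
import pandas as pd

a = np.arange(0, 100, 10)
b = np.arange(9000, 10000)
df = pd.DataFrame({'a': np.concatenate((a, b), axis=0)})

# explicit labels 1..10 give a categorical Series
cats = pd.qcut(df['a'], 10, labels=np.arange(10) + 1)
print(cats.dtype)

# convert the categorical labels to ordinary integers
# (assumes no NaN values in the column)
as_int = cats.astype(int)
print(as_int.dtype)
```

For this data the converted values match what labels=False gives after adding 1, so the two answers are interchangeable apart from the dtype.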