I have two ndarrays:
import pandas as pd
import numpy as np
a = np.arange(0,100, 10)
# np.random.random_integers is deprecated; randint excludes the upper bound, hence 10001
b = np.random.randint(low=9000, high=10001, size=1000)
I then create a DataFrame:
numbers = np.concatenate((a, b), axis=0)
df = pd.DataFrame({'a':numbers})
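Because most of the values (1,000 of them) lie between 9,000 and 10,000 and only 10 lie between 0 and 100, I use qcut() to get quantile-based category intervals, so that each bin holds roughly the same share of the values: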
df['cats'] = pd.qcut(df.a, 10)
print(df['cats'].value_counts())
This prints:
[0, 9103] 102
(9630.4, 9717] 102
(9407, 9519] 102
(9307.4, 9407] 102
(9895.3, 10000] 101
(9717, 9810] 101
(9203.6, 9307.4] 101
(9810, 9895.3] 100
(9103, 9203.6] 100
(9519, 9630.4] 99
Name: cats, dtype: int64
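Instead of the interval labels that qcut generates, such as "(9103, 9203.6]" and "(9519, 9630.4]", I would like to get integers, e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9 and so on. How can I do that?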
Answer 0: (score: 1)
Here is the solution posted by root:
import pandas as pd
import numpy as np
a = np.arange(0,100, 10)
b = np.arange(9000, 10000)
numbers = np.concatenate((a, b), axis=0)
df = pd.DataFrame({'a':numbers})
df['cats'] = pd.qcut(df.a, 10, labels=False)
print(df['cats'].value_counts())
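Note that labels=False yields zero-based bin codes (0 through 9), whereas the question asks for 1, 2, 3 and so on. A minimal sketch of shifting the codes by one, assuming the same deterministic data as above; the variable name codes is only illustrative:

```python
import numpy as np
import pandas as pd

a = np.arange(0, 100, 10)
b = np.arange(9000, 10000)
df = pd.DataFrame({'a': np.concatenate((a, b), axis=0)})

# labels=False makes qcut return the integer index of each bin, starting at 0
codes = pd.qcut(df['a'], 10, labels=False)

# Shift by one so the categories run from 1 to 10
df['cats'] = codes + 1

print(df['cats'].value_counts().sort_index())
```

Since the result is a plain integer column, it can be used directly in arithmetic or comparisons.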
Answer 1: (score: 1)
Use labels=np.arange(10) + 1:
df['cats'] = pd.qcut(df.a, 10, labels=np.arange(10) + 1)
print(df['cats'].value_counts())
1 103
3 102
10 101
9 101
8 101
7 101
6 101
4 101
5 100
2 99
Name: cats, dtype: int64
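One thing to be aware of: when explicit labels are passed, qcut returns a categorical column rather than plain integers. If real integers are needed, the column can be converted afterwards. A small sketch, assuming deterministic data similar to the question's; the names cats and as_int are only illustrative:

```python
import numpy as np
import pandas as pd

a = np.arange(0, 100, 10)
b = np.arange(9000, 10000)
df = pd.DataFrame({'a': np.concatenate((a, b), axis=0)})

# With explicit labels, the result has categorical dtype
cats = pd.qcut(df['a'], 10, labels=np.arange(10) + 1)
print(cats.dtype)

# Convert the integer-valued categories to an ordinary integer column
as_int = cats.astype(int)
print(as_int.dtype)
```

Keeping the categorical dtype preserves the ordering of the bins; converting to int is mainly useful when the values feed into numeric code.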