Value counts with group by in pandas

Time: 2016-08-27 06:50:30

Tags: python pandas matplotlib dataframe

I am new to pandas and am trying to understand this functionality.

from sklearn import datasets
import pandas as pd

iris = datasets.load_iris()
grps = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10"]
iris = pd.DataFrame(iris.data[:,0],columns = ["SL"])
iris['bins'] = pd.cut(iris["SL"], 10 ,labels = grps )
iris['target'] = datasets.load_iris().target

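I want to know the number of occurrences of each class in each bin. How can I do this? I can't seem to figure out a way. I want to plot this output as a stacked histogram.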

2 answers:

Answer 0 (score: 3)

Try this:

In [83]: import seaborn as sns

In [84]: x = iris.groupby(['target','bins']).size().to_frame('occurences').reset_index()

In [85]: x
Out[85]:
    target bins  occurences
0        0    1           9
1        0    2          19
2        0    3          12
3        0    4           9
4        0    5           1
5        1    2           3
6        1    3           2
7        1    4          16
8        1    5          13
9        1    6           7
10       1    7           7
11       1    8           2
12       2    2           1
13       2    4           2
14       2    5           8
15       2    6          13
16       2    7          11
17       2    8           4
18       2    9           5
19       2   10           6

In [86]: sns.barplot(x='bins', y='occurences', hue='target', data=x)
Out[86]: <matplotlib.axes._subplots.AxesSubplot at 0x9b3bb38>

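The `sns.barplot` call above draws the classes side by side, while the question asks for stacked bars. A minimal sketch of one way to get there, assuming the same groupby counts are pivoted so that each class becomes a column. The toy data and the three-bin split are hypothetical stand-ins for the iris frame, and `observed=False` is passed only to keep empty bin/class combinations as zeros:

```python
import pandas as pd

# Hypothetical toy data standing in for the iris frame from the question.
df = pd.DataFrame({
    "SL":     [4.9, 5.1, 5.8, 6.0, 6.3, 6.9, 7.2, 5.0],
    "target": [0,   0,   1,   1,   2,   2,   2,   1],
})
df["bins"] = pd.cut(df["SL"], 3, labels=["1", "2", "3"])

# Count rows per (bin, target), then pivot target into columns.
counts = (
    df.groupby(["bins", "target"], observed=False)
      .size()
      .unstack("target", fill_value=0)
)
print(counts)

# With matplotlib installed, this draws one stacked bar per bin:
# counts.plot(kind="bar", stacked=True, rot=0)
```

Each row of `counts` is a bin and each column a class, which is the layout that `DataFrame.plot(kind="bar", stacked=True)` expects.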

Answer 1 (score: 1)

You can use value_counts after sorting the values into bins that correspond to the groups specified in cut.

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn import datasets

sns.set_style('darkgrid')

df = pd.DataFrame(datasets.load_iris().data[:,0], columns=['SL'])
df['target'] = datasets.load_iris().target

# Bin edges: values fall into (1, 2], (2, 3], ..., (9, 10]
grps = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Per-class bin counts, keyed by class label
grouped = {}

# Iterate over the rows of each target class
for label, group in df.groupby('target'):
    # sort=False keeps the bins in interval order instead of ordering by count
    grouped[label] = pd.cut(group['SL'], bins=grps).value_counts(sort=False)

# Concatenate column-wise (dict keys become column names) and create a stacked-bar plot
pd.concat(grouped, axis=1).add_prefix('class_').plot(kind='bar', stacked=True, rot=0,
                                                    figsize=(6,6), cmap=plt.cm.rainbow)

# Aesthetics
plt.title("Sepal Length binned counts")
plt.xlabel("Buckets")
plt.ylabel("Occurrences")
plt.show()
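The loop over groups can probably be replaced by a single `pd.crosstab` call, which builds the bin-by-class count table directly. This is a sketch of an alternative, not what the answer itself does. The toy data and the shortened `edges` list are hypothetical and only serve to keep the example small:

```python
import pandas as pd

# Hypothetical toy data; in the answer above this would be the iris frame.
df = pd.DataFrame({
    "SL":     [4.4, 4.9, 5.1, 5.5, 6.1, 6.4, 7.0, 7.7],
    "target": [0,   0,   0,   1,   1,   2,   2,   2],
})

# Integer bin edges, as in the answer: (4, 5], (5, 6], (6, 7], (7, 8].
edges = [4, 5, 6, 7, 8]
binned = pd.cut(df["SL"], bins=edges)

# Rows are bins, columns are classes, cells are counts.
table = pd.crosstab(binned, df["target"]).add_prefix("class_")
print(table)
```

The resulting `table` can be passed to `.plot(kind='bar', stacked=True)` in the same way as the concatenated frame. Note that `pd.crosstab` drops bins with no observations by default.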
