I have a dataframe and am using the column and by arguments of the pandas hist() method to view histograms of subsets of the data, like this:
ax = df.hist(column='activity_count', by='activity_month')
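(and then I go on to plot this information). I am trying to work out how to programmatically extract two pieces of data as I loop over the axes: the particular value of 'activity_month', and the number of records with that 'activity_month':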
for i, x in enumerate(ax):
    # a and b are placeholders for the two values I want to extract
    print("the value of a is", a)
    print("the number of rows with value of a", b)
so that I would get:
January 1002
February 4305
etc
Now, I can easily get a list of the unique values of 'activity_month', along with a count of how many rows have activity_month equal to a given value,
a="January"
len(df[df["activity_month"] == a])
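but I want to do this inside the loop, for a particular iteration of i, x. How can I get a handle on the subset of data inside 'x' at each iteration, so that I can see the value of 'activity_month' for that iteration and the number of rows with that value?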
Answer 0 (score: 0)
Here is a short example dataframe:
import pandas as pd
df = pd.DataFrame([['January',19],['March',6],['January',24],['November',83],['February',23],
['November',4],['February',98],['January',44],['October',47],['January',4],
['April',8],['March',21],['April',41],['June',34],['March',63]],
columns=['activity_month','activity_count'])
Yields:
activity_month activity_count
0 January 19
1 March 6
2 January 24
3 November 83
4 February 23
5 November 4
6 February 98
7 January 44
8 October 47
9 January 4
10 April 8
11 March 21
12 April 41
13 June 34
14 March 63
If you want the sum of the values for each group in df.groupby('activity_month'), you can do:
df.groupby('activity_month')['activity_count'].sum()
Gives:
activity_month
April 49
February 121
January 91
June 34
March 90
November 87
October 47
Name: activity_count, dtype: int64
To get the number of rows corresponding to a given group:
df.groupby('activity_month')['activity_count'].agg('count')
Gives:
activity_month
April 2
February 2
January 4
June 1
March 3
November 2
October 1
Name: activity_count, dtype: int64
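One caveat worth noting: count only tallies non-null values in the selected column, whereas size counts every row in the group. With the example data there are no missing values, so the two agree. A minimal sketch of the difference, rebuilding the same example dataframe (the variable names rows_per_month and non_null_per_month are just illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    [['January', 19], ['March', 6], ['January', 24], ['November', 83], ['February', 23],
     ['November', 4], ['February', 98], ['January', 44], ['October', 47], ['January', 4],
     ['April', 8], ['March', 21], ['April', 41], ['June', 34], ['March', 63]],
    columns=['activity_month', 'activity_count'])

# size() counts every row in each group, even if activity_count were NaN
rows_per_month = df.groupby('activity_month').size()

# count() counts only the non-null activity_count values in each group
non_null_per_month = df.groupby('activity_month')['activity_count'].count()

print(rows_per_month)
```

If the column could contain missing values and you want the number of records rather than the number of valid measurements, size is probably the safer choice.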
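After re-reading your question, I am convinced that you are not approaching this in the most efficient way. I strongly recommend against explicitly looping over the axes created by df.hist(), especially when this information can be accessed quickly (and directly) from df itself.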
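If you do want a loop that hands you each month together with its subset of rows, iterating over the groupby object itself is one way to get it. This is only a sketch using the example dataframe above; the names month, group and summary are illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    [['January', 19], ['March', 6], ['January', 24], ['November', 83], ['February', 23],
     ['November', 4], ['February', 98], ['January', 44], ['October', 47], ['January', 4],
     ['April', 8], ['March', 21], ['April', 41], ['June', 34], ['March', 63]],
    columns=['activity_month', 'activity_count'])

summary = []
# Each iteration yields the group key and the sub-dataframe for that key
for month, group in df.groupby('activity_month'):
    summary.append((month, len(group)))
    print(month, len(group))
```

By default groupby sorts the keys, so the months come out in alphabetical order rather than calendar order; passing sort=False to groupby keeps them in order of first appearance instead.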