I have score data, and I want to group it by custom score ranges.
Id  Date        Score
1   2018-01-01  56
2   2018-01-01  72
3   2018-01-01  91
4   2018-01-01  67
5   2018-01-01  65
6   2018-01-02  75
7   2018-01-02  72
8   2018-01-02  91
9   2018-01-02  82
10  2018-01-02  81
This is my expected output:
Date        <60  60-79  80-100
2018-01-01    1      3       1
2018-01-02    0      2       3
What I did was:
result = pd.crosstab(df.Date, df.Score)
result['<60'] = result['1']+ ... + result['59']
result['60-79'] = result['60']+ ... + result['79']
result['80-100'] = result['80']+ ... + result['100']
Then I would have to delete a lot of columns.
Is there a better way?
Answer 0 (score: 6)
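import numpy as np
import pandas as pd

# Right-closed edges: (-inf, 59], (59, 79], (79, 100], so an integer score of 60 lands in '60-79'
bins = [-np.inf, 59, 79, 100]
labels = ['<60', '60-79', '80-100']
df = (df.groupby([df['Date'], pd.cut(df['Score'], bins=bins, labels=labels)], observed=False)
        .size()
        .unstack(fill_value=0))
print(df)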
Score       <60  60-79  80-100
Date
2018-01-01    1      3       1
2018-01-02    0      2       3
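Where the edges sit matters for boundary scores, because pd.cut closes each interval on the right by default. The following is a minimal sketch, using hypothetical boundary scores rather than the question's data, that compares an edge placed at 60 under the default with left-closed bins built via right=False. The names closed_right, closed_left and comparison are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical boundary scores, chosen to show where each edge lands.
scores = pd.Series([59, 60, 79, 80, 100])
labels = ['<60', '60-79', '80-100']

# Default right=True: intervals are (-inf, 60], (60, 79], (79, 100],
# so a score of exactly 60 is labelled '<60'.
closed_right = pd.cut(scores, bins=[-np.inf, 60, 79, 100], labels=labels)

# right=False: intervals are [-inf, 60), [60, 80), [80, 101),
# so 60 falls in '60-79' and 100 still falls in '80-100'.
closed_left = pd.cut(scores, bins=[-np.inf, 60, 80, 101], labels=labels, right=False)

comparison = pd.DataFrame({
    'score': scores,
    'right=True': closed_right.astype(str),
    'right=False': closed_left.astype(str),
})
print(comparison)
```

With right=False the last edge has to sit above 100 (101 here) so that a perfect score stays inside the final bin.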
Answer 1 (score: 4)
In [59]: pd.crosstab(
    ...:     df.Date,
    ...:     pd.cut(df.Score, bins=[0, 59, 79, 100], include_lowest=True,
    ...:            labels='<60 60-79 80-100'.split()))
Out[59]:
Score       <60  60-79  80-100
Date
2018-01-01    1      3       1
2018-01-02    0      2       3
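For a reproducible run, here is a minimal self-contained sketch of the crosstab approach. It rebuilds the question's table as a DataFrame. The variable names binned and table, and the choice of right=False with edges at 60, 80 and 101, are assumptions made for illustration rather than part of the answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Id': range(1, 11),
    'Date': ['2018-01-01'] * 5 + ['2018-01-02'] * 5,
    'Score': [56, 72, 91, 67, 65, 75, 72, 91, 82, 81],
})

labels = ['<60', '60-79', '80-100']

# Left-closed bins [0, 60), [60, 80), [80, 101): an assumption made here so
# that the labels hold for any score from 0 to 100.
binned = pd.cut(df['Score'], bins=[0, 60, 80, 101], labels=labels, right=False)

# Rows are dates, columns are score ranges, cells are counts.
table = pd.crosstab(df['Date'], binned)
print(table)
```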
Answer 2 (score: 3)
One approach is to use pandas.cut followed by pandas.pivot_table:
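# Intervals are right-closed, so a score of exactly 60 falls in (0, 60]
df['Bin'] = pd.cut(df['Score'], [0, 60, 79, 100])
# Keyword form of drop; newer pandas versions no longer accept axis positionally
res = df.drop(columns='Id')\
        .pivot_table(index='Date', columns='Bin', aggfunc='count')\
        .fillna(0).astype(int)
print(res)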
#              Score
# Bin        (0, 60] (60, 79] (79, 100]
# Date
# 2018-01-01       1        3         1
# 2018-01-02       0        2         3
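The extra Score level in the columns appears because pivot_table aggregates every remaining column when no value column is named. A minimal sketch, assuming the question's data and left-closed bins chosen for illustration, that names the value column explicitly and passes labels to pd.cut so the result has flat, readable column names:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Id': range(1, 11),
    'Date': ['2018-01-01'] * 5 + ['2018-01-02'] * 5,
    'Score': [56, 72, 91, 67, 65, 75, 72, 91, 82, 81],
})

labels = ['<60', '60-79', '80-100']

# Left-closed bins [0, 60), [60, 80), [80, 101); the edges are an assumption.
df['Bin'] = pd.cut(df['Score'], bins=[0, 60, 80, 101], labels=labels, right=False)

# values='Score' keeps the columns single-level; fill_value=0 covers any
# date/bin combination that would otherwise be missing.
res = df.pivot_table(index='Date', columns='Bin', values='Score',
                     aggfunc='count', fill_value=0, observed=False)
print(res)
```

Because Id is never selected as a value column, it does not need to be dropped first.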
Answer 3 (score: 3)
You can use value_counts and pass the bins you need:
# Bin edges are right-closed, so a score of exactly 60 is counted in (-inf, 60.0]
df.groupby('Date').Score.apply(pd.Series.value_counts, bins=[-np.inf, 60, 79, 100]).unstack()
Out[442]:
            (-inf, 60.0]  (60.0, 79.0]  (79.0, 100.0]
Date
2018-01-01             1             3              1
2018-01-02             0             2              3
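The interval column names can be swapped for the labels the question asks for. A minimal sketch, assuming integer scores (hence the edge at 59 rather than 60) and the question's data. The names counts and labels are illustrative, and sort=False plus sort_index are added here only to pin the column order before renaming.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Id': range(1, 11),
    'Date': ['2018-01-01'] * 5 + ['2018-01-02'] * 5,
    'Score': [56, 72, 91, 67, 65, 75, 72, 91, 82, 81],
})

# Right-closed bins (-inf, 59], (59, 79], (79, 100]; assumes integer scores.
bins = [-np.inf, 59, 79, 100]
labels = ['<60', '60-79', '80-100']

# sort=False keeps each group's counts in bin order rather than by frequency.
counts = (df.groupby('Date')['Score']
            .apply(pd.Series.value_counts, bins=bins, sort=False)
            .unstack()
            .sort_index(axis=1))  # make sure the interval columns run low to high

# Replace the interval column names with the requested labels.
counts.columns = labels
print(counts)
```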