I have a dataset that shows load counts for each category. Here is my data.
Name,Count1,Count2,PercentDiff,Category
Store A,10,4,0.4,Less than 1%
Store B,20,26,1.3,Less than 5%
Store C,12,48,4,Less than 5%
Store D,30,180,6,Less than 10%
I want to get the count for each of the following categories:
1. Less than 0
2. Less than 1%
3. Less than 5%
4. Less than 10%
5. More than 10%
I use the following rules to classify each entry:
new.loc[new['PercentDiff'] < 0, 'Category'] = 'Less than 0%'
new.loc[new['PercentDiff'] == 0, 'Category'] = 'Exact match'
new.loc[new['PercentDiff'] < 0.01, 'Category'] = 'Less than 1%'
new.loc[new['PercentDiff'] < 0.05, 'Category'] = 'Less than 5%'
new.loc[new['PercentDiff'] < 0.1, 'Category'] = 'Less than 10%'
new.loc[new['PercentDiff'] == 0, 'Category'] = 'Exact match'
new.loc[new['PercentDiff'] > 0.1, 'Category'] = 'Greater than 10%'
new['PercentDiff1'] = new['PercentDiff'].astype(int)
Output1 = new.groupby(['Category']).agg(lambda x: x.mad())
Output1 = Output1.replace(np.nan, '', regex=True)
SumMail = pd.value_counts(Output1['Category'].values)
However, if the dataset has no rows for one of the categories, I get an error saying that no values were found for that category.
TypeError: 'str' object cannot be interpreted as an integer
KeyError: 'More than 10%'
Can anyone help me modify this code so that it returns 0 for categories that have no records?
Thanks in advance.
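Answer 0 (score: 0)
You need to convert the 'Category' column to a categorical dtype and list every category explicitly, including the ones that do not occur in the data: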
df['Category'] = df['Category'].astype('category')
df['Category'] = df['Category'].cat.set_categories(
    ['Less than 0', 'Less than 1%', 'Less than 5%', 'Less than 10%', 'More than 10%'],
    ordered=True,
)
df['Category'].value_counts(sort=False)
Output:
Less than 0      0
Less than 1%     1
Less than 5%     2
Less than 10%    1
More than 10%    0
Name: Category, dtype: int64
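For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds only the Name and Category columns from the sample data in the question, and it uses pd.Categorical to set the dtype and the category list in one step instead of the two calls above. The variable names labels and counts are just placeholders for this illustration.

```python
import pandas as pd

# Sample rows copied from the question (only the columns needed here).
df = pd.DataFrame({
    "Name": ["Store A", "Store B", "Store C", "Store D"],
    "Category": ["Less than 1%", "Less than 5%", "Less than 5%", "Less than 10%"],
})

# Every label we want reported, including labels that never occur in the data.
labels = ["Less than 0", "Less than 1%", "Less than 5%", "Less than 10%", "More than 10%"]

# Declaring the full category list up front is what makes value_counts
# report the unobserved labels with a count of 0.
df["Category"] = pd.Categorical(df["Category"], categories=labels, ordered=True)

# sort=False keeps the category order instead of sorting by frequency.
counts = df["Category"].value_counts(sort=False)
print(counts)
```

Note that any value in the column that is not in labels becomes NaN after the conversion, so the label strings have to match exactly.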
Answer 1 (score: 0)
Before labeling, check whether your dataframe is empty.
import numpy as np
import pandas as pd

def count_categories(new):  # the function name is only a placeholder
    # An empty frame has nothing to label, so report 0 right away.
    if new['PercentDiff'].empty:
        return 0
    else:
        new.loc[new['PercentDiff'] < 0, 'Category'] = 'Less than 0%'
        new.loc[new['PercentDiff'] == 0, 'Category'] = 'Exact match'
        new.loc[new['PercentDiff'] < 0.01, 'Category'] = 'Less than 1%'
        new.loc[new['PercentDiff'] < 0.05, 'Category'] = 'Less than 5%'
        new.loc[new['PercentDiff'] < 0.1, 'Category'] = 'Less than 10%'
        new.loc[new['PercentDiff'] == 0, 'Category'] = 'Exact match'
        new.loc[new['PercentDiff'] > 0.1, 'Category'] = 'Greater than 10%'
        new['PercentDiff1'] = new['PercentDiff'].astype(int)
        Output1 = new.groupby(['Category']).agg(lambda x: x.mad())
        Output1 = Output1.replace(np.nan, '', regex=True)
        SumMail = pd.value_counts(Output1['Category'].values)
        return SumMail
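A possible alternative, if the goal is simply a 0 for every label that has no rows, is to count first and then reindex against the full label list. This is only a sketch: it assumes the Category column has already been filled in, and LABELS and count_labels are made-up names for this example. The label strings are the ones used in the labeling code above.

```python
import pandas as pd

# Full set of labels used by the labeling rules; order here is the report order.
LABELS = [
    "Less than 0%",
    "Exact match",
    "Less than 1%",
    "Less than 5%",
    "Less than 10%",
    "Greater than 10%",
]

def count_labels(frame: pd.DataFrame) -> pd.Series:
    """Count rows per label, filling in 0 for labels that have no rows."""
    return frame["Category"].value_counts().reindex(LABELS, fill_value=0)

# An empty frame and a frame that is missing several labels.
empty = pd.DataFrame({"Category": pd.Series(dtype="object")})
sample = pd.DataFrame({"Category": ["Less than 5%", "Less than 5%", "Less than 10%"]})

empty_counts = count_labels(empty)
sample_counts = count_labels(sample)
print(sample_counts)
```

Because reindex supplies the missing labels, looking up a label with no rows returns 0 instead of raising a KeyError, and the empty frame needs no special case.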