I have a dataset:
Item Type market_share
Office Supplies 10
Baby Food 20
Vegetables 10
Meat 30
Personal Care 10
Household 20
I want to aggregate all rows except the "Baby Food" row, so that my dataset looks like this:
Item Type market_share
Others 80
Baby Food 20
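How can I do this? Basically, I want to merge all the other rows together, sum their market_share, and put the total in a single "Others" row.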
Answer 0 (score: 5)
You can use:
df.groupby(df['Item Type'].eq('Baby Food').map({True:'Baby Food',False:'Others'})).sum()
market_share
Item Type
Baby Food 20
Others 80
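The one-liner above can be unpacked into steps. The following is a minimal sketch that rebuilds the question's data by hand (the variable names labels and result are illustrative, not from the answer). It selects market_share explicitly before summing, which is an addition here, so the result does not depend on how a given pandas version treats the text column in sum().

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Item Type": ["Office Supplies", "Baby Food", "Vegetables",
                  "Meat", "Personal Care", "Household"],
    "market_share": [10, 20, 10, 30, 10, 20],
})

# Boolean mask (True only for 'Baby Food') mapped to the two group labels.
labels = df["Item Type"].eq("Baby Food").map({True: "Baby Food", False: "Others"})

# Group by the labels and sum only the numeric column.
result = df.groupby(labels)["market_share"].sum()
print(result)
```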
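Answer 1 (score: 2)
Create an array of labels from a condition with np.where, or a Series of labels with Series.map (unmatched values become NaN, which fillna then replaces with 'Others'), then group by those labels and aggregate with sum: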
s = np.where(df['Item Type'] == 'Baby Food', 'Baby Food', 'Others')
print (s)
['Others' 'Baby Food' 'Others' 'Others' 'Others' 'Others']
s = df['Item Type'].map({'Baby Food':'Baby Food'}).fillna('Others')
print (s)
0 Others
1 Baby Food
2 Others
3 Others
4 Others
5 Others
Name: Item Type, dtype: object
df = df.groupby(s)['market_share'].sum().rename_axis('Item Type').reset_index()
print (df)
Item Type market_share
0 Baby Food 20
1 Others 80
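By default groupby sorts the group keys, which is why "Baby Food" comes first above, while the question's desired output lists "Others" first. A possible sketch, assuming the order of first appearance is acceptable, passes sort=False to groupby; the names labels and out are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Item Type": ["Office Supplies", "Baby Food", "Vegetables",
                  "Meat", "Personal Care", "Household"],
    "market_share": [10, 20, 10, 30, 10, 20],
})

# Label array: 'Baby Food' stays, everything else becomes 'Others'.
labels = np.where(df["Item Type"] == "Baby Food", "Baby Food", "Others")

# sort=False keeps groups in order of first appearance ('Others' first here).
out = (
    df.groupby(labels, sort=False)["market_share"]
      .sum()
      .rename_axis("Item Type")
      .reset_index()
)
print(out)
```

Because the first row of the data is not "Baby Food", the "Others" group appears first; with differently ordered data, an explicit reindex would be needed instead.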
Answer 2 (score: 0)
Use np.where:
df['market_share_2'] = np.where(df['Item Type'].values=='Baby Food', 'Baby Food', 'Others')
Output:
Item Type market_share market_share_2
0 Office Supplies 10 Others
1 Baby Food 20 Baby Food
2 Vegetables 10 Others
3 Meat 30 Others
4 Personal Care 10 Others
5 Household 20 Others
Then use value_counts() (note that this counts the rows in each group; it does not sum market_share):
df['market_share_2'].value_counts()
Others 5
Baby Food 1
Name: market_share_2, dtype: int64
TL;DR:
pd.Series(np.where(df['Item Type'].values=='Baby Food', 'Baby Food', 'Others')).value_counts()
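To get the totals the question asks for, the same np.where labels can be used as a grouping key instead of being counted. This is a rough sketch rather than part of the answer; the names labels, counts, and totals are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Item Type": ["Office Supplies", "Baby Food", "Vegetables",
                  "Meat", "Personal Care", "Household"],
    "market_share": [10, 20, 10, 30, 10, 20],
})

labels = np.where(df["Item Type"].values == "Baby Food", "Baby Food", "Others")

# value_counts() reports how many rows carry each label.
counts = pd.Series(labels).value_counts()

# Grouping market_share by the same labels gives the summed shares.
totals = df["market_share"].groupby(labels).sum()

print(counts)
print(totals)
```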
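Answer 3 (score: 0)
You can filter the rows with the not-equal operator != and the equality operator ==. The comparison must be made on the Item Type column (not market_share), and only market_share should be summed:
df.loc[df['Item Type'] != 'Baby Food', 'market_share'].sum()  # total for all other rows: 80
df.loc[df['Item Type'] == 'Baby Food', 'market_share'].sum()  # total for Baby Food: 20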
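The two filtered sums are scalars, so they still have to be assembled into the two-row table from the question. One way this might look, assuming the data from the question (mask and result are illustrative names):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Item Type": ["Office Supplies", "Baby Food", "Vegetables",
                  "Meat", "Personal Care", "Household"],
    "market_share": [10, 20, 10, 30, 10, 20],
})

# True for the row to keep as-is, False for rows folded into 'Others'.
mask = df["Item Type"] == "Baby Food"

# Build the summary table from the two filtered sums.
result = pd.DataFrame({
    "Item Type": ["Others", "Baby Food"],
    "market_share": [
        df.loc[~mask, "market_share"].sum(),
        df.loc[mask, "market_share"].sum(),
    ],
})
print(result)
```

This works for a single kept category; with several categories to keep, the groupby approaches in the earlier answers scale better.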