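I have a DataFrame with the sales of meat, vegetables, and bread for each store. I want to convert the values to percentages of each store's total; for example, the values for Store N would become 74%, 7%, and 19%. In other words, 74% is the share of meat in Store N's total sales. What is the simplest way to do this?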
import pandas as pd
df=pd.DataFrame({'Store':['N','S','E','W']
,'Meat':[200,250,100,400]
,'Veg':[20,100,30,80]
,'Bread':[50,230,150,100]})
df=df[['Store','Meat','Veg','Bread']]
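Answer 0 (score: 4)
A pure pandas solution that avoids loops is shown below. Note that `.ix` was removed in pandas 1.0; on current versions `.iloc` takes the same positional slice.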
df.ix[:, 1:] = (df.ix[:, 1:].T / df.ix[:, 1:].sum(1)).T
print(df)
Result:
Store Meat Veg Bread
0 N 0.740741 0.074074 0.185185
1 S 0.431034 0.172414 0.396552
2 E 0.357143 0.107143 0.535714
3 W 0.689655 0.137931 0.172414
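The double transpose in the one-liner above is easy to misread, so here is a minimal sketch of the same idea split into steps. It assumes a pandas version without `.ix` and therefore uses `.iloc`; the names `values`, `row_totals`, `shares`, and `result` are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Store': ['N', 'S', 'E', 'W'],
                   'Meat': [200, 250, 100, 400],
                   'Veg': [20, 100, 30, 80],
                   'Bread': [50, 230, 150, 100]})

values = df.iloc[:, 1:]           # numeric columns only (everything after Store)
row_totals = values.sum(axis=1)   # one total per store, indexed 0..3

# DataFrame / Series aligns the Series index with the DataFrame's columns,
# so transpose first to put the stores on the columns, divide, transpose back.
shares = (values.T / row_totals).T

# Reattach the Store column without assigning back into the original frame.
result = pd.concat([df[['Store']], shares], axis=1)
print(result)
```

Building a new frame with `pd.concat` rather than assigning back into the slice leaves the original integer columns untouched.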
Answer 1 (score: 3)
You can first call set_index on the Store column, then divide by the row sums with div, and finally call reset_index:
df.set_index('Store', inplace=True)
df = df.div(df.sum(1), axis=0)
print (df.reset_index())
Store Meat Veg Bread
0 N 0.740741 0.074074 0.185185
1 S 0.431034 0.172414 0.396552
2 E 0.357143 0.107143 0.535714
3 W 0.689655 0.137931 0.172414
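Alternatively, starting again from the original df, slice the value columns by label (`.ix` was removed in pandas 1.0; `.loc` accepts the same label slice):
df.ix[:,'Meat':] = df.ix[:,'Meat':].div(df.ix[:,'Meat':].sum(1), axis=0)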
print (df)
Store Meat Veg Bread
0 N 0.740741 0.074074 0.185185
1 S 0.431034 0.172414 0.396552
2 E 0.357143 0.107143 0.535714
3 W 0.689655 0.137931 0.172414
Or, again starting from the original df, slice the value columns by position with iloc:
df.iloc[:,1:] = df.iloc[:,1:].div(df.iloc[:,1:].sum(1), axis=0)
print (df)
Store Meat Veg Bread
0 N 0.740741 0.074074 0.185185
1 S 0.431034 0.172414 0.396552
2 E 0.357143 0.107143 0.535714
3 W 0.689655 0.137931 0.172414
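The question asks for values such as 74%, 7%, and 19%, while the outputs above are fractions. As a hedged sketch, the fractions from the set_index/div approach could be turned into display strings with Python's `'{:.0%}'` format. The names `shares` and `formatted` are assumptions made for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'Store': ['N', 'S', 'E', 'W'],
                   'Meat': [200, 250, 100, 400],
                   'Veg': [20, 100, 30, 80],
                   'Bread': [50, 230, 150, 100]})

# Fractions of each store's total, as in the set_index/div solution.
shares = df.set_index('Store')
shares = shares.div(shares.sum(axis=1), axis=0)

# '{:.0%}'.format multiplies by 100, rounds to a whole number and appends '%'.
# The result holds strings, so it is meant for display, not further arithmetic.
formatted = shares.apply(lambda col: col.map('{:.0%}'.format))
print(formatted.reset_index())
```

Because each value is rounded independently, a row of formatted percentages need not add up to exactly 100 (Store E comes out as 36%, 11%, and 54%).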
Timings:
In [187]: %timeit (jez1(df))
100 loops, best of 3: 4.07 ms per loop
In [188]: %timeit (jez2(df1))
100 loops, best of 3: 5.61 ms per loop
In [189]: %timeit (jez3(df2))
100 loops, best of 3: 5.44 ms per loop
In [190]: %timeit (ric(df3))
100 loops, best of 3: 6.18 ms per loop
In [191]: %timeit (ogi(df4))
1 loop, best of 3: 2.2 s per loop
Code used for the timings:
import numpy as np
import pandas as pd

# Random DataFrame: 10k rows, first Store column + 10 data columns
np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(10000,10)), columns=list('ABCDEFGHIJ'))
df.index = 'a' + df.index.astype(str)
df = df.reset_index().rename(columns={'index':'Store'})
print (df)

# One copy per function, because most of them modify their argument in place
df1, df2, df3, df4 = df.copy(), df.copy(), df.copy(), df.copy()

# Note: .ix was removed in pandas 1.0; the functions are kept as they were timed.

def jez1(df):
    df = df.set_index('Store')
    df = 100 * df.div(df.sum(1), axis=0)
    return (df.reset_index())

def jez2(df):
    df.ix[:,'A':] = df.ix[:,'A':].div(df.ix[:,'A':].sum(1), axis=0)
    return df

def jez3(df):
    df.iloc[:,1:] = df.iloc[:,1:].div(df.iloc[:,1:].sum(1), axis=0)
    return df

def ric(df):
    df.ix[:, 1:] = (df.ix[:, 1:].T / df.ix[:, 1:].sum(1)).T
    return df

def ogi(df):
    df.ix[:, 1:]=df.ix[:,1:].apply(lambda x: x/x.sum(), axis=1)
    return df

print (jez1(df))
print (jez2(df1))
print (jez3(df2))
print (ric(df3))
print (ogi(df4))
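The timings above come from IPython's `%timeit` and rely on `.ix`. For readers on a current pandas outside IPython, a rough sketch of a comparable measurement with the standard `timeit` module might look like the following. The frame size (2,000 rows), the value range 1..9 (chosen to avoid all-zero rows), and the function names `with_div` and `with_apply` are assumptions for this sketch, not the setup that produced the numbers above.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(100)
# Hypothetical test frame: 2,000 rows, 10 value columns with entries in 1..9.
data = pd.DataFrame(rng.integers(1, 10, size=(2000, 10)), columns=list('ABCDEFGHIJ'))
data.insert(0, 'Store', ['a' + str(i) for i in range(len(data))])

def with_div(df):
    # Vectorised: one division of the whole frame by the row totals.
    values = df.set_index('Store')
    return values.div(values.sum(axis=1), axis=0).reset_index()

def with_apply(df):
    # Row by row: the lambda is called once per row.
    values = df.set_index('Store')
    return values.apply(lambda row: row / row.sum(), axis=1).reset_index()

# Both approaches should agree before their speed is compared.
assert np.allclose(with_div(data).iloc[:, 1:].to_numpy(),
                   with_apply(data).iloc[:, 1:].to_numpy())

t_div = timeit.timeit(lambda: with_div(data), number=5) / 5
t_apply = timeit.timeit(lambda: with_apply(data), number=1)
print(f'div:   {t_div * 1e3:.2f} ms per call')
print(f'apply: {t_apply * 1e3:.2f} ms per call')
```

Absolute numbers depend on the machine and the pandas version, so only the relative difference between the two functions is meaningful.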
Answer 2 (score: 2)
You can also use DataFrame.apply with a lambda function, applied row by row (axis=1):
df.ix[:, 1:]=df.ix[:,1:].apply(lambda x: x*100/x.sum(), axis=1)
This gives you:
Store Meat Veg Bread
0 N 74.074074 7.407407 18.518519
1 S 43.103448 17.241379 39.655172
2 E 35.714286 10.714286 53.571429
3 W 68.965517 13.793103 17.241379
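On a pandas version without `.ix`, the same 0 to 100 scale can be produced without a per-row lambda. The sketch below swaps in `div` and `mul` for `apply`, so it is an alternative to the answer's method rather than a copy of it; the list `value_cols` is a name introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Store': ['N', 'S', 'E', 'W'],
                   'Meat': [200, 250, 100, 400],
                   'Veg': [20, 100, 30, 80],
                   'Bread': [50, 230, 150, 100]})

value_cols = ['Meat', 'Veg', 'Bread']   # columns to convert (assumed known up front)

# Divide every row by its own total, then scale to percent.
# Assigning through a list of column names replaces those columns wholesale.
df[value_cols] = df[value_cols].div(df[value_cols].sum(axis=1), axis=0).mul(100)
print(df)
```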
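Answer 3 (score: 1)
You can compute the percentage manually. Note that the line below divides by the column total, so it gives each store's share of all meat sales rather than meat's share of that store's own sales: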
df['MeatPerc'] = df['Meat']/df['Meat'].sum()
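To get the per-store share the question asks for while keeping this manual, one-column-at-a-time style, a sketch could divide by the row total instead. The `VegPerc` and `BreadPerc` column names simply extend the `MeatPerc` naming and are assumptions.

```python
import pandas as pd

df = pd.DataFrame({'Store': ['N', 'S', 'E', 'W'],
                   'Meat': [200, 250, 100, 400],
                   'Veg': [20, 100, 30, 80],
                   'Bread': [50, 230, 150, 100]})

value_cols = ['Meat', 'Veg', 'Bread']
total = df[value_cols].sum(axis=1)   # each store's total sales

# Add one new column per product, leaving the raw sales columns untouched.
for col in value_cols:
    df[col + 'Perc'] = df[col] / total

print(df)
```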