I have a pandas DataFrame that looks like this:
df
date name product items
0 2020-01-01 google one 224.0
2 2020-01-01 amazon two 4.0
3 2020-01-01 amazon three 8.0
1 2020-01-01 amazon four 4.0
0 2020-01-01 amazon one 17.0
.. ... ... ... ...
351 2020-03-27 google five 9.0
352 2020-03-27 google six 8.0
353 2020-03-27 google one 117.0
426 2020-03-27 amazon three 18.0
427 2020-03-27 amazon four 1.0
I'm trying to fill a new column, price, with a simple multiplication that depends on two conditions, for example:
items * 100
items * 150
items * 50
and so on.
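How can I apply the multiplication under two conditions? I assume that simply filtering df when applying the calculation would be enough, right?
What I have tried: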
df['price'[df['name']=='google' & df['product']] == 'one']= df['items'].apply(lambda x:x*100)
But I get an error:
TypeError: Cannot perform 'rand_' with a dtyped [object] array and scalar of type [bool]
How can I apply the calculation only where both conditions are met?
Answer 0 (score: 2)
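For a single condition, you can fix your solution by wrapping each comparison in parentheses, combining them into one boolean mask, and selecting with that mask on both sides of the assignment: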
m = (df['name']=='google') & (df['product'] == 'one')
df.loc[m, 'price'] = df.loc[m, 'items'] * 100
print (df)
date name product items price
0 2020-01-01 google one 224.0 22400.0
2 2020-01-01 amazon two 4.0 NaN
3 2020-01-01 amazon three 8.0 NaN
1 2020-01-01 amazon four 4.0 NaN
0 2020-01-01 amazon one 17.0 NaN
351 2020-03-27 google five 9.0 NaN
352 2020-03-27 google six 8.0 NaN
353 2020-03-27 google one 117.0 11700.0
426 2020-03-27 amazon three 18.0 NaN
427 2020-03-27 amazon four 1.0 NaN
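The parentheses matter because Python's & operator binds more tightly than ==, so without them 'google' & df['product'] is evaluated first, which is what triggers the TypeError in the question. Below is a minimal, self-contained sketch of the masked assignment; the three-row frame is an illustrative stand-in, not the asker's real data.

```python
import pandas as pd

# Tiny stand-in for the frame in the question (values are illustrative).
df = pd.DataFrame({
    "name": ["google", "amazon", "google"],
    "product": ["one", "one", "two"],
    "items": [224.0, 17.0, 4.0],
})

# Each comparison is parenthesized so that & combines two boolean Series.
mask = (df["name"] == "google") & (df["product"] == "one")

# The same mask is used on both sides, so only matching rows are read and
# written; every other row of the new column is left as NaN.
df.loc[mask, "price"] = df.loc[mask, "items"] * 100

print(df)
```

Only the first row satisfies both conditions here, so it is the only one that receives a price.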
If there are only a few conditions, use numpy.select:
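import numpy as np

m1 = (df['name']=='google') & (df['product'] == 'one')
m2 = (df['name']=='google') & (df['product'] == 'two')
# rows matching neither mask get NaN, which propagates through the multiplication
df['price'] = np.select([m1, m2], [100, 150], default=np.nan) * df['items']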
print (df)
date name product items price
0 2020-01-01 google one 224.0 22400.0
2 2020-01-01 amazon two 4.0 NaN
3 2020-01-01 amazon three 8.0 NaN
1 2020-01-01 amazon four 4.0 NaN
0 2020-01-01 amazon one 17.0 NaN
351 2020-03-27 google five 9.0 NaN
352 2020-03-27 google six 8.0 NaN
353 2020-03-27 google one 117.0 11700.0
426 2020-03-27 amazon three 18.0 NaN
427 2020-03-27 amazon four 1.0 NaN
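The same pattern extends to more conditions by keeping the masks and multipliers in two parallel lists. The sketch below assumes a third rule, amazon/one with a multiplier of 50, borrowed from the merge example in this answer; the four-row frame is illustrative only.

```python
import numpy as np
import pandas as pd

# Illustrative data, not the asker's full frame.
df = pd.DataFrame({
    "name": ["google", "google", "amazon", "amazon"],
    "product": ["one", "two", "one", "four"],
    "items": [224.0, 4.0, 17.0, 1.0],
})

# Masks and multipliers are kept in parallel lists: conditions[i] pairs
# with multipliers[i].
conditions = [
    (df["name"] == "google") & (df["product"] == "one"),
    (df["name"] == "google") & (df["product"] == "two"),
    (df["name"] == "amazon") & (df["product"] == "one"),
]
multipliers = [100, 150, 50]

# np.select picks the multiplier of the first matching condition per row;
# rows matching none get NaN, so their price is NaN as well.
df["price"] = np.select(conditions, multipliers, default=np.nan) * df["items"]

print(df)
```

If several conditions could be true for the same row, numpy.select uses the first one in the list, so the order of the list decides which multiplier wins.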
If there can be many combinations, put the multipliers in a new DataFrame and use DataFrame.merge with a left join:
import pandas as pd

# lookup table: one multiplier per (name, product) combination
df1 = pd.DataFrame({'name': ['google', 'google', 'amazon'],
                    'product': ['one', 'two', 'one'],
                    'mult': [100, 150, 50]})
df1 = df.merge(df1, on=['name','product'], how='left')
df1['price'] = df1['mult'] * df1['items']
print (df1)
date name product items mult price
0 2020-01-01 google one 224.0 100.0 22400.0
1 2020-01-01 amazon two 4.0 NaN NaN
2 2020-01-01 amazon three 8.0 NaN NaN
3 2020-01-01 amazon four 4.0 NaN NaN
4 2020-01-01 amazon one 17.0 50.0 850.0
5 2020-03-27 google five 9.0 NaN NaN
6 2020-03-27 google six 8.0 NaN NaN
7 2020-03-27 google one 117.0 100.0 11700.0
8 2020-03-27 amazon three 18.0 NaN NaN
9 2020-03-27 amazon four 1.0 NaN NaN
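With a larger lookup table it may help to guard against two easy mistakes: duplicate (name, product) rows in the table, which would silently duplicate rows in the result, and combinations that have no multiplier at all. The sketch below is one possible way to do that with the validate and indicator parameters of DataFrame.merge; the names lookup, out and unmatched are illustrative.

```python
import pandas as pd

# Illustrative data, not the asker's full frame.
df = pd.DataFrame({
    "name": ["google", "amazon", "amazon", "google"],
    "product": ["one", "two", "one", "five"],
    "items": [224.0, 4.0, 17.0, 9.0],
})

# One multiplier per (name, product) combination.
lookup = pd.DataFrame({
    "name": ["google", "google", "amazon"],
    "product": ["one", "two", "one"],
    "mult": [100, 150, 50],
})

# validate="many_to_one" raises if the lookup table has duplicate keys;
# indicator=True adds a "_merge" column recording where each row came from.
out = df.merge(lookup, on=["name", "product"], how="left",
               validate="many_to_one", indicator=True)
out["price"] = out["mult"] * out["items"]

# Combinations that had no entry in the lookup table (their price is NaN).
unmatched = out.loc[out["_merge"] == "left_only", ["name", "product"]]

print(out)
print(unmatched)
```

The "_merge" column can be dropped with out.drop(columns="_merge") once the unmatched combinations have been inspected.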