Apply a calculation to a dataframe where conditions are met

Time: 2020-03-27 13:17:47

Tags: python pandas dataframe

I have a pandas dataframe that looks like this:

df

     date           name        product                 items
0    2020-01-01     google          one                 224.0
2    2020-01-01     amazon          two                   4.0
3    2020-01-01     amazon        three                   8.0
1    2020-01-01     amazon         four                   4.0
0    2020-01-01     amazon          one                  17.0
..          ...     ...             ...                   ...
351  2020-03-27     google         five                   9.0
352  2020-03-27     google          six                   8.0
353  2020-03-27     google          one                 117.0
426  2020-03-27     amazon        three                  18.0
427  2020-03-27     amazon         four                   1.0

I'm trying to fill a new column price with a simple multiplication that depends on two conditions, for example:

  • If name is google and product is one, then items * 100
  • If name is google and product is two, then items * 150
  • If name is amazon and product is one, then items * 50

And so on.

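How can I apply the multiplication based on two conditions? I assume that just filtering df while applying the calculation should work, right?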

What I've tried:

df['price'[df['name']=='google' & df['product']] == 'one']= df['items'].apply(lambda x:x*100)

But I get an error:

TypeError: cannot perform 'rand_' with a dtyped [object] array and scalar of type [bool]

How can I apply the calculation only to rows where both conditions are met?
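The TypeError is consistent with an operator-precedence problem in the attempt above: in Python, & binds more tightly than ==, so df['name']=='google' & df['product'] is parsed as df['name'] == ('google' & df['product']). The sketch below uses the standard ast module to show how Python parses a == b & c, then builds the mask with each comparison in parentheses. The two-row frame is a hypothetical example, not the question's data.

```python
import ast

import pandas as pd

# Parse the expression the way Python does, without evaluating it.
tree = ast.parse("a == b & c", mode="eval")
# The top-level node is a comparison ...
is_compare = isinstance(tree.body, ast.Compare)
# ... whose right-hand operand is the bitwise AND `b & c`.
rhs = tree.body.comparators[0]
rhs_is_and = isinstance(rhs, ast.BinOp) and isinstance(rhs.op, ast.BitAnd)
print(is_compare, rhs_is_and)  # True True

# Hypothetical two-row frame: with parentheses each comparison runs first.
df = pd.DataFrame({"name": ["google", "amazon"], "product": ["one", "one"]})
mask = (df["name"] == "google") & (df["product"] == "one")
print(mask.tolist())  # [True, False]
```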

1 answer:

Answer 0 (score: 2)

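For a single rule, you can fix your solution by wrapping each comparison in parentheses (& binds more tightly than ==) and using the same mask on both sides of the assignment with .loc: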

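# parentheses are required because & binds more tightly than ==
m = (df['name']=='google') & (df['product'] == 'one')
# only rows where the mask is True get a price; all other rows become NaN
df.loc[m, 'price'] = df.loc[m, 'items'] * 100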
print (df)
           date    name product  items    price
0    2020-01-01  google     one  224.0  22400.0
2    2020-01-01  amazon     two    4.0      NaN
3    2020-01-01  amazon   three    8.0      NaN
1    2020-01-01  amazon    four    4.0      NaN
0    2020-01-01  amazon     one   17.0      NaN
351  2020-03-27  google    five    9.0      NaN
352  2020-03-27  google     six    8.0      NaN
353  2020-03-27  google     one  117.0  11700.0
426  2020-03-27  amazon   three   18.0      NaN
427  2020-03-27  amazon    four    1.0      NaN
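The same .loc pattern extends to every rule in the question by looping over a lookup of (name, product) pairs. This is a minimal sketch, assuming the three rules listed in the question are the full set; the rules dict and the five-row sample (rows copied from the question's frame, with a fresh default index) are illustrative, not part of the answer.

```python
import numpy as np
import pandas as pd

# Five rows copied from the question's frame (default RangeIndex for simplicity).
df = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-01", "2020-01-01", "2020-03-27", "2020-03-27"],
    "name": ["google", "amazon", "amazon", "google", "google"],
    "product": ["one", "two", "one", "five", "one"],
    "items": [224.0, 4.0, 17.0, 9.0, 117.0],
})

# Hypothetical lookup: (name, product) -> multiplier, from the question's rule list.
rules = {
    ("google", "one"): 100,
    ("google", "two"): 150,
    ("amazon", "one"): 50,
}

df["price"] = np.nan  # rows that match no rule stay NaN
for (name, product), mult in rules.items():
    # Parenthesise each comparison before combining them with &.
    m = (df["name"] == name) & (df["product"] == product)
    df.loc[m, "price"] = df.loc[m, "items"] * mult

print(df)
```

Rows that match no rule keep NaN, as in the output above. Each rule costs one pass over the frame, which is fine for a handful of rules.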

If there are only a few conditions, use numpy.select:

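import numpy as np

m1 = (df['name']=='google') & (df['product'] == 'one')
m2 = (df['name']=='google') & (df['product'] == 'two')

# np.select picks the multiplier of the first True mask per row; NaN where none match
df['price'] = np.select([m1, m2], [100, 150], default=np.nan) * df['items']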
print (df)
           date    name product  items    price
0    2020-01-01  google     one  224.0  22400.0
2    2020-01-01  amazon     two    4.0      NaN
3    2020-01-01  amazon   three    8.0      NaN
1    2020-01-01  amazon    four    4.0      NaN
0    2020-01-01  amazon     one   17.0      NaN
351  2020-03-27  google    five    9.0      NaN
352  2020-03-27  google     six    8.0      NaN
353  2020-03-27  google     one  117.0  11700.0
426  2020-03-27  amazon   three   18.0      NaN
427  2020-03-27  amazon    four    1.0      NaN
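A self-contained version of the numpy.select approach covering all three rules from the question might look like this. The sample frame is a subset of the question's rows, and the third condition (amazon, one) is taken from the question's rule list rather than the answer; treat it as a sketch.

```python
import numpy as np
import pandas as pd

# Five rows copied from the question's frame (default RangeIndex for simplicity).
df = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-01", "2020-01-01", "2020-03-27", "2020-03-27"],
    "name": ["google", "amazon", "amazon", "google", "google"],
    "product": ["one", "two", "one", "five", "one"],
    "items": [224.0, 4.0, 17.0, 9.0, 117.0],
})

# One boolean mask per rule, in priority order.
conditions = [
    (df["name"] == "google") & (df["product"] == "one"),
    (df["name"] == "google") & (df["product"] == "two"),
    (df["name"] == "amazon") & (df["product"] == "one"),
]
multipliers = [100, 150, 50]

# For each row, np.select takes the multiplier of the first True condition,
# or the default (NaN) when no condition holds.
df["price"] = np.select(conditions, multipliers, default=np.nan) * df["items"]
print(df)
```

Because np.select uses the first matching condition, overlapping rules are resolved by their order in the list.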

If there are many possible combinations, put them in a new DataFrame and left-join it with DataFrame.merge:

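import pandas as pd

# lookup table: one row per (name, product) pair with its multiplier
df1 = pd.DataFrame({'name':['google','google','amazon'],
                    'product':['one','two','one'],
                    'mult':[100, 150, 50]})

# left join keeps every row of df; pairs missing from the lookup get NaN in mult
df1 = df.merge(df1, on=['name','product'], how='left')
df1['price'] = df1['mult'] * df1['items']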
print (df1)
         date    name product  items   mult    price
0  2020-01-01  google     one  224.0  100.0  22400.0
1  2020-01-01  amazon     two    4.0    NaN      NaN
2  2020-01-01  amazon   three    8.0    NaN      NaN
3  2020-01-01  amazon    four    4.0    NaN      NaN
4  2020-01-01  amazon     one   17.0   50.0    850.0
5  2020-03-27  google    five    9.0    NaN      NaN
6  2020-03-27  google     six    8.0    NaN      NaN
7  2020-03-27  google     one  117.0  100.0  11700.0
8  2020-03-27  amazon   three   18.0    NaN      NaN
9  2020-03-27  amazon    four    1.0    NaN      NaN
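One detail visible in the merge output above is that the result gets a fresh 0..n-1 index instead of the original labels. The sketch below keeps the original index, assuming each (name, product) pair appears at most once in the lookup table: validate='many_to_one' makes merge raise if that assumption is broken, and a left join with unique right-hand keys returns one row per left row in the original order, so the old index can be reattached. The sample rows and their index labels are copied from the question.

```python
import pandas as pd

# Rows and (non-unique) index labels copied from the question's frame.
df = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-01", "2020-01-01", "2020-03-27", "2020-03-27"],
    "name": ["google", "amazon", "amazon", "google", "google"],
    "product": ["one", "two", "one", "five", "one"],
    "items": [224.0, 4.0, 17.0, 9.0, 117.0],
}, index=[0, 2, 0, 351, 353])

# Lookup table with one row per (name, product) pair.
rules = pd.DataFrame({
    "name": ["google", "google", "amazon"],
    "product": ["one", "two", "one"],
    "mult": [100, 150, 50],
})

# validate raises MergeError if a (name, product) pair is duplicated in rules.
out = df.merge(rules, on=["name", "product"], how="left", validate="many_to_one")
out.index = df.index  # same rows, same order, so the labels line up
out["price"] = out["mult"] * out["items"]
print(out)
```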