Subtract attribute values in one Pandas DataFrame from another DataFrame

Time: 2018-02-19 03:28:38

Tags: python python-3.x pandas

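This question involves 3 separate dataframes:
df1 represents 'Total' for products 1, 2, 3 and contains 'Value1' and 'Value2'.
df2 represents 'Customer1' for products 1, 2, 3 and contains 'Value1' and 'Value2'.
df3 represents 'Customer2' for products 1, 2, 3 and contains 'Value1' and 'Value2'.

df2 & df3 are basically subsets of df1.

I want to create another dataframe, df4, that subtracts df2 & df3 from df1. I want df4 to show 'RemainingCustomers' in the 'Market' column.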

This is what I have done so far:

import pandas as pd

d1 = {'Market': ['Total', 'Total', 'Total'], 'Product Code': [1, 2, 3],
      'Value1': [10, 20, 30], 'Value2': [5, 15, 25]}
df1 = pd.DataFrame(data=d1)
df1

d2 = {'Market': ['Customer1', 'Customer1', 'Customer1'], 'Product Code': [1, 2, 3],
      'Value1': [3, 14, 10], 'Value2': [2, 4, 6]}
df2 = pd.DataFrame(data=d2)
df2

d3 = {'Market': ['Customer2', 'Customer2', 'Customer2'], 'Product Code': [1, 2, 3],
      'Value1': [3, 3, 4], 'Value2': [2, 6, 10]}
df3 = pd.DataFrame(data=d3)
df3

This produces the following results:

  Market  Product Code  Value1  Value2
0  Total             1      10       5
1  Total             2      20      15
2  Total             3      30      25

      Market  Product Code  Value1  Value2
0  Customer1             1       3       2
1  Customer1             2      14       4
2  Customer1             3      10       6

      Market  Product Code  Value1  Value2
0  Customer2             1       3       2
1  Customer2             2       3       6
2  Customer2             3       4      10

To create df4, I tried the following code and got the error "TypeError: unsupported operand type(s) for -: 'str' and 'str'". Can anyone help?
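The code that raised the error does not appear in the question as shown. As a minimal sketch, assuming the attempt was a direct subtraction of the full frames (an assumption, not the asker's actual code), the same kind of error can be reproduced because the 'Market' column holds strings, which do not support subtraction. Restricting the operation to the numeric columns avoids it.

```python
import pandas as pd

df1 = pd.DataFrame({'Market': ['Total'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [10, 20, 30], 'Value2': [5, 15, 25]})
df2 = pd.DataFrame({'Market': ['Customer1'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 14, 10], 'Value2': [2, 4, 6]})

# Hypothetical attempt: subtracting whole frames also tries to subtract
# the string values in 'Market', which raises a TypeError.
try:
    df1 - df2
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Subtracting only the numeric value columns works.
value_cols = ['Value1', 'Value2']
numeric_diff = df1[value_cols] - df2[value_cols]
```

The answers below all work around the same issue in different ways: each keeps the string column out of the arithmetic.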

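4 Answers:

Answer 0 (score: 3)

Drop Market, set Product Code as the index, and perform index-aligned arithmetic on the product codes. Afterwards, just reset the index and insert Market into the result.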

df1, df2, df3 = [
    df.drop(columns='Market').set_index('Product Code') for df in [df1, df2, df3]
]

df4 = (df1 - (df2 + df3)).reset_index()
df4.insert(0, 'Market', 'RemainingCustomers')

               Market  Product Code  Value1  Value2
0  RemainingCustomers             1       4       1
1  RemainingCustomers             2       3       5
2  RemainingCustomers             3      16       9
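To illustrate why indexing by Product Code helps, here is a small sketch in which the Customer1 rows are listed in reverse order. The reversed order and the variable names are assumptions made for illustration; the values are the ones from the question. Because pandas aligns on index labels rather than row position, each product is still matched with its own totals.

```python
import pandas as pd

total = pd.DataFrame({'Product Code': [1, 2, 3],
                      'Value1': [10, 20, 30], 'Value2': [5, 15, 25]})
# Hypothetical: same Customer1 values as in the question, rows reversed.
customer1 = pd.DataFrame({'Product Code': [3, 2, 1],
                          'Value1': [10, 14, 3], 'Value2': [6, 4, 2]})
customer2 = pd.DataFrame({'Product Code': [1, 2, 3],
                          'Value1': [3, 3, 4], 'Value2': [2, 6, 10]})

# Index every frame by product code so arithmetic matches rows by label.
total, customer1, customer2 = [
    df.set_index('Product Code') for df in (total, customer1, customer2)
]

remaining = (total - (customer1 + customer2)).reset_index()
remaining.insert(0, 'Market', 'RemainingCustomers')
```

Positional arithmetic on the same inputs would pair product 1 of the total with product 3 of Customer1, so label alignment is the safer choice whenever row order is not guaranteed.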

Answer 1 (score: 3)

Not exactly what the OP asked for, but in my opinion this may be a better way to manage the data.

df = pd.concat([df1, df2, df3]).set_index(['Product Code', 'Market'])

formula = 'RemainingCustomers = Total - Customer1 - Customer2'
df = df.unstack().stack(0).eval(formula).unstack()
df

Market       Customer1        Customer2         Total        RemainingCustomers       
                Value1 Value2    Value1 Value2 Value1 Value2             Value1 Value2
Product Code                                                                          
1                    3      2         3      2     10      5                  4      1
2                   14      4         3      6     20     15                  3      5
3                   10      6         4     10     30     25                 16      9

df['RemainingCustomers']

              Value1  Value2
Product Code                
1                  4       1
2                  3       5
3                 16       9

If we insist on the requested output:

df.stack(0).reset_index().query(
    'Market == "RemainingCustomers"').reindex(columns=df1.columns)

                Market  Product Code  Value1  Value2
2   RemainingCustomers             1       4       1
6   RemainingCustomers             2       3       5
10  RemainingCustomers             3      16       9

Or:

df.stack(0).xs(
    'RemainingCustomers', level=1, drop_level=False
).reset_index().reindex(columns=df1.columns)

               Market  Product Code  Value1  Value2
0  RemainingCustomers             1       4       1
1  RemainingCustomers             2       3       5
2  RemainingCustomers             3      16       9
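For readers who find the unstack/stack chain hard to follow, a similar wide layout can be sketched with pivot. This is an alternative illustration, not the answerer's code; the helper name `market` and the variable names are made up for the example.

```python
import pandas as pd

df1 = pd.DataFrame({'Market': ['Total'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [10, 20, 30], 'Value2': [5, 15, 25]})
df2 = pd.DataFrame({'Market': ['Customer1'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 14, 10], 'Value2': [2, 4, 6]})
df3 = pd.DataFrame({'Market': ['Customer2'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 3, 4], 'Value2': [2, 6, 10]})

# One row per product, one column per (value, market) pair.
combined = pd.concat([df1, df2, df3], ignore_index=True)
wide = combined.pivot(index='Product Code', columns='Market',
                      values=['Value1', 'Value2'])

def market(name):
    """Return the Value1/Value2 columns for a single market (hypothetical helper)."""
    return wide.xs(name, axis=1, level='Market')

remaining = market('Total') - market('Customer1') - market('Customer2')
df4 = remaining.reset_index()
df4.insert(0, 'Market', 'RemainingCustomers')
```

Keeping all markets in one frame means a new customer only adds rows to the input, and the subtraction stays aligned on Product Code.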

Answer 2 (score: 2)

Here is one way:

cols = ['Value1', 'Value2']
df4 = df1[cols].subtract(df2[cols].add(df3[cols]))\
               .assign(**{'Market': 'RemainingCustomers', 'Product Code': [1, 2, 3]})\
               .sort_index(axis=1)

#                Market  Product Code  Value1  Value2
# 0  RemainingCustomers             1       4       1
# 1  RemainingCustomers             2       3       5
# 2  RemainingCustomers             3      16       9

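Explanation

  • df1[cols].subtract(df2[cols].add(df3[cols])) performs the calculation on the specified columns only.
  • assign(**{'Market': 'RemainingCustomers', 'Product Code': [1, 2, 3]}) adds the extra columns needed in the resulting dataframe.
  • sort_index(axis=1) reorders the columns to match the desired output.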
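The steps above hard-code the product codes as [1, 2, 3]. As a small variation (a sketch, not part of the answer), the codes can be taken from df1 instead, assuming all three frames list the products in the same row order. The column order is then copied from df1 with reindex rather than relying on alphabetical sorting.

```python
import pandas as pd

df1 = pd.DataFrame({'Market': ['Total'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [10, 20, 30], 'Value2': [5, 15, 25]})
df2 = pd.DataFrame({'Market': ['Customer1'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 14, 10], 'Value2': [2, 4, 6]})
df3 = pd.DataFrame({'Market': ['Customer2'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 3, 4], 'Value2': [2, 6, 10]})

cols = ['Value1', 'Value2']
df4 = (
    df1[cols]
    .subtract(df2[cols].add(df3[cols]))          # arithmetic on value columns only
    .assign(**{'Market': 'RemainingCustomers',
               'Product Code': df1['Product Code']})  # reuse codes from df1
    .reindex(columns=df1.columns)                # same column order as df1
)
```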

Answer 3 (score: 2)

Perhaps we can use select_dtypes:

((df1.select_dtypes(exclude='object')
  - df2.select_dtypes(exclude='object')
  - df3.select_dtypes(exclude='object'))
 .drop(columns='Product Code')
 .combine_first(df1)
 .assign(Market='remaining customers'))
Out[133]: 
                Market  Product Code  Value1  Value2
0  remaining customers           1.0       4       1
1  remaining customers           2.0       3       5
2  remaining customers           3.0      16       9
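In the output above, Product Code is shown as floats (1.0, 2.0, 3.0). A sketch of a variation that leaves Product Code untouched is to use select_dtypes only to pick the value columns and then write the result into a copy of df1. This is an illustration under the assumption that all three frames share the same row order, not the answerer's code.

```python
import pandas as pd

df1 = pd.DataFrame({'Market': ['Total'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [10, 20, 30], 'Value2': [5, 15, 25]})
df2 = pd.DataFrame({'Market': ['Customer1'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 14, 10], 'Value2': [2, 4, 6]})
df3 = pd.DataFrame({'Market': ['Customer2'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 3, 4], 'Value2': [2, 6, 10]})

# Numeric columns except the product key are the ones to subtract.
value_cols = df1.select_dtypes(include='number').columns.drop('Product Code')

df4 = df1.copy()
df4[value_cols] = df1[value_cols] - df2[value_cols] - df3[value_cols]
df4['Market'] = 'RemainingCustomers'
```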