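This question involves three separate DataFrames. df1 holds the 'Total' market for products 1, 2, 3, with columns 'Value1' and 'Value2'. df2 holds 'Customer1' for products 1, 2, 3, with 'Value1' and 'Value2'. df3 holds 'Customer2' for products 1, 2, 3, with 'Value1' and 'Value2'.
df2 and df3 are essentially subsets of df1.
I want to create another DataFrame, df4, by subtracting df2 and df3 from df1. I want df4 to show 'RemainingCustomers' in the 'Market' column.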
Here is what I have done so far:
import pandas as pd
d1 = {'Market': ['Total', 'Total','Total'], 'Product Code': [1, 2, 3],
'Value1':[10, 20, 30], 'Value2':[5, 15, 25]}
df1 = pd.DataFrame(data=d1)
df1
d2 = {'Market': ['Customer1', 'Customer1','Customer1'], 'Product Code': [1,
2, 3], 'Value1':[3, 14, 10], 'Value2':[2, 4, 6]}
df2 = pd.DataFrame(data=d2)
df2
d3 = {'Market': ['Customer2', 'Customer2','Customer2'], 'Product Code': [1,
2, 3], 'Value1':[3, 3, 4], 'Value2':[2, 6, 10]}
df3 = pd.DataFrame(data=d3)
df3
When I try to create df4, I get the error "TypeError: unsupported operand type(s) for -: 'str' and 'str'". Can anyone help? The three DataFrames print as follows:
  Market  Product Code  Value1  Value2
0  Total             1      10       5
1  Total             2      20      15
2  Total             3      30      25

      Market  Product Code  Value1  Value2
0  Customer1             1       3       2
1  Customer1             2      14       4
2  Customer1             3      10       6

      Market  Product Code  Value1  Value2
0  Customer2             1       3       2
1  Customer2             2       3       6
2  Customer2             3       4      10
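The error can be reproduced with a minimal sketch. It assumes the attempt was a direct subtraction such as df1 - df2 (the original attempt is not shown in the question, so this is a guess): pandas applies the operator to every column, including the text column Market, and strings cannot be subtracted.

```python
import pandas as pd

df1 = pd.DataFrame({'Market': ['Total'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [10, 20, 30], 'Value2': [5, 15, 25]})
df2 = pd.DataFrame({'Market': ['Customer1'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 14, 10], 'Value2': [2, 4, 6]})

# Assumed attempt: subtract the whole frames, text column included.
try:
    df1 - df2
    raised = False
except TypeError as exc:
    raised = True
    message = str(exc)  # text of the error raised by pandas

# Restricting the operation to the numeric value columns avoids the problem.
value_cols = ['Value1', 'Value2']
diff = df1[value_cols] - df2[value_cols]
```

The answers below are all variations on that last step: keep the text column out of the arithmetic, then add it back afterwards.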
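Answer 0 (score: 3)
Drop Market, set Product Code as the index, and perform index-aligned arithmetic on the product codes. Afterwards, just reset the index and insert Market into the result.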
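# Keep only the numeric columns, indexed by product code
df1, df2, df3 = [
    df.drop(columns='Market').set_index('Product Code') for df in [df1, df2, df3]
]
# The subtraction aligns rows on the Product Code index
df4 = (df1 - (df2 + df3)).reset_index()
df4.insert(0, 'Market', 'RemainingCustomers')
df4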
               Market  Product Code  Value1  Value2
0  RemainingCustomers             1       4       1
1  RemainingCustomers             2       3       5
2  RemainingCustomers             3      16       9
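To illustrate why aligning on the index matters, here is a small sketch. The `customer` frame with shuffled rows is a hypothetical input, not data from the question; it shows that positional subtraction pairs the wrong products, while subtraction on a Product Code index pairs rows by code.

```python
import pandas as pd

total = pd.DataFrame({'Product Code': [1, 2, 3],
                      'Value1': [10, 20, 30], 'Value2': [5, 15, 25]})
# Hypothetical customer frame whose rows arrive in a different order.
customer = pd.DataFrame({'Product Code': [3, 1, 2],
                         'Value1': [10, 3, 14], 'Value2': [6, 2, 4]})

# Positional subtraction pairs row 0 with row 0, mixing up products.
by_position = total[['Value1', 'Value2']] - customer[['Value1', 'Value2']]

# Index-aligned subtraction pairs rows by Product Code instead.
by_code = total.set_index('Product Code') - customer.set_index('Product Code')
```

With the question's data the rows happen to be in the same order in all three frames, so both approaches agree; setting the index simply removes that assumption.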
Answer 1 (score: 3)
Not exactly what the OP asked for, but in my opinion this may be a better way to manage the data.
df = pd.concat([df1, df2, df3]).set_index(['Product Code', 'Market'])
formula = 'RemainingCustomers = Total - Customer1 - Customer2'
df = df.unstack().stack(0).eval(formula).unstack()
df
Market       Customer1        Customer2         Total        RemainingCustomers
                Value1 Value2    Value1 Value2 Value1 Value2             Value1 Value2
Product Code
1                    3      2         3      2     10      5                  4      1
2                   14      4         3      6     20     15                  3      5
3                   10      6         4     10     30     25                 16      9
and
df['RemainingCustomers']
              Value1  Value2
Product Code
1                  4       1
2                  3       5
3                 16       9
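The one-line chain above packs several reshaping steps together. The sketch below splits it into named intermediates to show what each call does; the variable names (`long`, `wide`, `per_market`, `result`) are introduced here for illustration only, and the frames are rebuilt from the question's data so the block runs on its own.

```python
import pandas as pd

df1 = pd.DataFrame({'Market': ['Total'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [10, 20, 30], 'Value2': [5, 15, 25]})
df2 = pd.DataFrame({'Market': ['Customer1'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 14, 10], 'Value2': [2, 4, 6]})
df3 = pd.DataFrame({'Market': ['Customer2'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 3, 4], 'Value2': [2, 6, 10]})

# One row per (Product Code, Market) pair.
long = pd.concat([df1, df2, df3]).set_index(['Product Code', 'Market'])

# unstack() moves Market from the row index into the columns,
# giving (value name, Market) column pairs.
wide = long.unstack()

# stack(0) moves the value names back into the row index,
# leaving one plain column per Market.
per_market = wide.stack(0)

# eval() can now refer to the Market names as ordinary columns.
per_market = per_market.eval(
    'RemainingCustomers = Total - Customer1 - Customer2')

# A final unstack() restores one row per Product Code.
result = per_market.unstack()
```

Depending on the pandas version, `stack` may emit a warning about its newer implementation; the result for this data should be the same either way.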
If we insist on the requested output:
df.stack(0).reset_index().query(
'Market == "RemainingCustomers"').reindex(columns=df1.columns)
                Market  Product Code  Value1  Value2
2   RemainingCustomers             1       4       1
6   RemainingCustomers             2       3       5
10  RemainingCustomers             3      16       9
Or
df.stack(0).xs(
'RemainingCustomers', level=1, drop_level=False
).reset_index().reindex(columns=df1.columns)
               Market  Product Code  Value1  Value2
0  RemainingCustomers             1       4       1
1  RemainingCustomers             2       3       5
2  RemainingCustomers             3      16       9
Answer 2 (score: 2)
Here is one way:
cols = ['Value1', 'Value2']
df4 = df1[cols].subtract(df2[cols].add(df3[cols]))\
.assign(**{'Market': 'RemainingCustomers', 'Product Code': [1, 2, 3]})\
.sort_index(axis=1)
#                Market  Product Code  Value1  Value2
# 0  RemainingCustomers             1       4       1
# 1  RemainingCustomers             2       3       5
# 2  RemainingCustomers             3      16       9
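Explanation
df1[cols].subtract(df2[cols].add(df3[cols])) performs the calculation on the specified columns only.
assign(**{'Market': 'RemainingCustomers', 'Product Code': [1, 2, 3]}) adds the extra columns needed in the resulting DataFrame.
sort_index(axis=1) reorders the columns to match the desired output.
Answer 3 (score: 2)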
Perhaps we can use select_dtypes:
(df1.select_dtypes(exclude='object')
 - df2.select_dtypes(exclude='object')
 - df3.select_dtypes(exclude='object')).\
    drop(columns='Product Code').\
    combine_first(df1).\
    assign(Market='remaining customers')
Out[133]:
                Market  Product Code  Value1  Value2
0  remaining customers           1.0       4       1
1  remaining customers           2.0       3       5
2  remaining customers           3.0      16       9
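A possible variant of the same idea, sketched under one assumption: recent pandas versions may store text columns with a dedicated string dtype rather than object, in which case exclude='object' might not filter out Market. Selecting include='number' does not depend on how text is stored. The helper name `values_only` is made up for this example.

```python
import pandas as pd

df1 = pd.DataFrame({'Market': ['Total'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [10, 20, 30], 'Value2': [5, 15, 25]})
df2 = pd.DataFrame({'Market': ['Customer1'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 14, 10], 'Value2': [2, 4, 6]})
df3 = pd.DataFrame({'Market': ['Customer2'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 3, 4], 'Value2': [2, 6, 10]})

def values_only(df):
    # Keep numeric columns, then drop the key so it is not subtracted.
    return df.select_dtypes(include='number').drop(columns='Product Code')

# Subtract the value columns, take Product Code (and Market) back from df1,
# then overwrite Market with the new label.
df4 = (values_only(df1) - values_only(df2) - values_only(df3)) \
    .combine_first(df1) \
    .assign(Market='RemainingCustomers')
```

Like the original answer, this relies on the three frames having their rows in the same order, because the subtraction aligns on the default row index rather than on Product Code.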