Say I have a sales transaction table, where some rows are individual SKUs and some are bundled SKUs,
plus a separate table containing the Bundle-component combinations. The sales transactions look like this:
Date, Product, Qty
1 Jan 2017, A, 10
2 Jan 2017, Bundle X, 5
3 Jan 2017, B, 10
4 Jan 2017, Bundle Y, 5
How do I define a function (or use a for-loop) to apply to the sales transaction table, so that each row with a bundled SKU is split into multiple rows, one per component SKU? For reference, the Bundle-component table looks like this:
ParentSKU, ComponentSKU, Quantity
Bundle X, A, 3
Bundle X, B, 5
Bundle X, C, 10
Bundle Y, P, 5
Bundle Y, Q, 7
Bundle Y, R, 12
Bundle Y, S, 3
Thanks!
Answer 0 (score: 2)
Here is one way using numpy and itertools.
Setup
import pandas as pd, numpy as np
from itertools import chain
# SETUP
df1 = pd.DataFrame({'Date': ['Jan 2017', 'Jan 2017', 'Jan 2017', 'Jan 2017'],
                    'Product': ['A', 'Bundle X', 'B', 'Bundle Y'],
                    'Qty': [10, 5, 10, 5]})
df2 = pd.DataFrame({'ParentSKU': ['Bundle X', 'Bundle X', 'Bundle X', 'Bundle Y',
                                  'Bundle Y', 'Bundle Y', 'Bundle Y'],
                    'ComponentSKU': ['A', 'B', 'C', 'P', 'Q', 'R', 'S'],
                    'Quantity': [3, 5, 10, 5, 7, 12, 3]})
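# df1 mirrors the question's sales transactions (dates simplified to 'Jan 2017')
# and df2 its Bundle-component table.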
Solution
# Perform groupby on bundles
bundles = df2.groupby('ParentSKU')['ComponentSKU'].apply(list)
bundles_q = df2.groupby('ParentSKU')['Quantity'].apply(list)
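# At this point `bundles` maps each ParentSKU to its component list
# (e.g. 'Bundle X' -> ['A', 'B', 'C']) and `bundles_q` to the per-unit
# component quantities (e.g. 'Bundle X' -> [3, 5, 10]).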
# Map bundles to df1
df1['Product_Decomposed'] = df1['Product'].map(bundles).fillna(df1['Product'].apply(list))
df1['Quantity_Decomposed'] = df1.apply(lambda x: [x['Qty']*i for i in bundles_q.get(x['Product'], [1])], axis=1)
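# Plain (non-bundle) products fall back to a one-element list via apply(list)
# (e.g. 'A' -> ['A']) and keep their original Qty, since the default
# multiplier list is [1].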
# Get lengths of each bundle
lens = list(map(len, df1['Product_Decomposed']))
# Create dataframe by repeating and chaining data
res = pd.DataFrame({'Date': np.repeat(df1['Date'], lens),
                    'Product': list(chain.from_iterable(df1['Product_Decomposed'])),
                    'Qty': list(chain.from_iterable(df1['Quantity_Decomposed']))})
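# np.repeat duplicates each Date according to its row's bundle length (lens),
# keeping the dates aligned with the flattened product and quantity lists.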
Result
print(res)
       Date Product  Qty
0  Jan 2017       A   10
1  Jan 2017       A   15
1  Jan 2017       B   25
1  Jan 2017       C   50
2  Jan 2017       B   10
3  Jan 2017       P   25
3  Jan 2017       Q   35
3  Jan 2017       R   60
3  Jan 2017       S   15
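As a side note (not part of the original answer): on pandas 1.3 or newer, the two list columns built above can also be flattened with DataFrame.explode instead of numpy/itertools. A minimal sketch under that assumption, reusing df1 from the solution:

# Alternative sketch; assumes pandas >= 1.3 (multi-column explode) and the
# Product_Decomposed / Quantity_Decomposed list columns created above.
res_alt = (
    df1[['Date', 'Product_Decomposed', 'Quantity_Decomposed']]
    .explode(['Product_Decomposed', 'Quantity_Decomposed'])
    .rename(columns={'Product_Decomposed': 'Product',
                     'Quantity_Decomposed': 'Qty'})
)
print(res_alt)  # same rows as res above, with the index repeated per component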
Answer 1 (score: 0)
One approach is to use merge (using jpp's convenient df1 and df2 setup):
# Split df1 into the ones we need to unbundle
by_bundling = dict(list(df1.groupby(df1.Product.str.startswith("Bundle"))))
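# by_bundling[True] now holds the bundle rows, by_bundling[False] the plain SKU rows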
# Select the ones we want to unbundle, and make the index a column
unbundled = by_bundling[True].reset_index()
# Merge this with our second table
unbundled = unbundled.merge(df2, left_on="Product", right_on="ParentSKU")
# Multiply the quantities
unbundled["Qty"] *= unbundled["Quantity"]
# Reduce to the columns of interest and rename
unbundled = unbundled.set_index("index")[["Date", "ComponentSKU", "Qty"]]
unbundled = unbundled.rename(columns={"ComponentSKU": "Product"})
# Recombine and sort
final = pd.concat([by_bundling[False], unbundled]).sort_index()
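# sort_index puts the unbundled rows back at their original transaction
# positions, since they kept df1's index via the "index" column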
which gives me:
In [57]: final
Out[57]:
       Date Product  Qty
0  Jan 2017       A   10
1  Jan 2017       A   15
1  Jan 2017       B   25
1  Jan 2017       C   50
2  Jan 2017       B   10
3  Jan 2017       P   25
3  Jan 2017       Q   35
3  Jan 2017       R   60
3  Jan 2017       S   15
The only interesting thing here is the merge:
In [59]: unbundled.merge(df2, left_on="Product", right_on="ParentSKU")
Out[59]:
   index      Date   Product  Qty ComponentSKU ParentSKU  Quantity
0      1  Jan 2017  Bundle X    5            A  Bundle X         3
1      1  Jan 2017  Bundle X    5            B  Bundle X         5
2      1  Jan 2017  Bundle X    5            C  Bundle X        10
3      3  Jan 2017  Bundle Y    5            P  Bundle Y         5
4      3  Jan 2017  Bundle Y    5            Q  Bundle Y         7
5      3  Jan 2017  Bundle Y    5            R  Bundle Y        12
6      3  Jan 2017  Bundle Y    5            S  Bundle Y         3
The rest is just rearranging and arithmetic (for example, the first merged row's Qty of 5 times its component Quantity of 3 gives the 15 in row 1 of the final output).
Don't underestimate the manual way of doing things: it is sometimes the simplest, and clean code you can follow is far better than "clever" code you can't.