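I have a complex problem, but I will try to explain it in as much detail as I can. I have the following 2 dataframes, and I need to compare them and put the differences into another dataframe. The comparison criteria are listed below.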
import pandas as pd
initial = pd.DataFrame({'ProductID': ['123', '456', '789', '000','231'],
'ProductName': ['Apple','Pear','Mango','Banana','Jackfruit'],
'DiscountedPrice': ['0.99', '1.00', '1.50', '2.10','2.35'],
'DiscountStartDate': ['30/01/2020', '21/06/2020', '01/01/2020', '10/11/2020','05/05/2020'],
'DiscountEndDate': ['25/03/2020', '30/07/2020', '30/01/2020', '12/12/2020','06/06/2020']})
updated = pd.DataFrame({'ProductID': ['123', '456', '789', '000','231'],
'ProductName': ['Apple','Pear','Mango','Banana','Jackfruit'],
'DiscountedPrice': ['0.53', '1.00', '0.99', '2.00','2.35'],
'DiscountStartDate': ['30/01/2020', '21/06/2020', '15/01/2020', '30/11/2020','09/10/2020'],
'DiscountEndDate': ['25/03/2020', '30/07/2020', '30/01/2020', '12/12/2020','31/10/2020']})
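The comparison criteria are:
(1) If the discounted price and the start/end dates are the same in both dataframes, ignore the product.
(2) If the discounted price is the same but the start/end dates differ, I need both entries in my 'changes' dataframe.
(3) If the discounted price differs but the start and end dates are the same, I need the DiscountedPrice and the start/end dates from the 'updated' dataframe in my 'changes' dataframe.
(4) If the discounted price differs and the start/end dates overlap in some way, I need to adjust the initial end date to one day before the updated start date and put both entries in my 'changes' dataframe.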
Basically, the 'changes' dataframe output must look like the table below.
ProductID | ProductName | DiscountedPrice | DiscountStartDate | DiscountEndDate |
---|---|---|---|---|
123 | Apple | 0.53 | 30/01/2020 | 25/03/2020 |
789 | Mango | 1.50 | 01/01/2020 | 14/01/2020 |
789 | Mango | 0.99 | 15/01/2020 | 30/01/2020 |
000 | Banana | 2.10 | 10/11/2020 | 29/11/2020 |
000 | Banana | 2.00 | 30/11/2020 | 12/12/2020 |
231 | Jackfruit | 2.35 | 05/05/2020 | 06/06/2020 |
231 | Jackfruit | 2.35 | 09/10/2020 | 31/10/2020 |
Can anyone help me?
Answer 0 (score: 1)
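Merge the two dataframes so that logic can be applied to identify all four cases. Once the cases are identified, the dates can be modified and the results concatenated. For transparency, the case is added as a column to the changes dataframe.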
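To make the merge step concrete, here is a minimal sketch using two hypothetical one-row frames (`left` and `right` are illustrative names, not part of the question's data). It only shows how `suffixes` places the initial and updated values side by side so that they can be compared with plain column operations.

```python
import pandas as pd

# Hypothetical one-row frames, used only to show the shape of the merge result.
left = pd.DataFrame({"ProductID": ["123"], "DiscountedPrice": ["0.99"]})
right = pd.DataFrame({"ProductID": ["123"], "DiscountedPrice": ["0.53"]})

# Columns present in both frames (other than the key) receive the suffixes.
side_by_side = left.merge(right, on="ProductID", suffixes=("_i", "_u"))
print(list(side_by_side.columns))
# ['ProductID', 'DiscountedPrice_i', 'DiscountedPrice_u']

# A row-wise comparison is now a plain column comparison.
price_changed = side_by_side["DiscountedPrice_i"].ne(side_by_side["DiscountedPrice_u"])
print(price_changed.tolist())
# [True]
```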
import numpy as np
import pandas as pd
initial = pd.DataFrame({'ProductID': ['123', '456', '789', '000','231'],
'ProductName': ['Apple','Pear','Mango','Banana','Jackfruit'],
'DiscountedPrice': ['0.99', '1.00', '1.50', '2.10','2.35'],
'DiscountStartDate': ['30/01/2020', '21/06/2020', '01/01/2020', '10/11/2020','05/05/2020'],
'DiscountEndDate': ['25/03/2020', '30/07/2020', '30/01/2020', '12/12/2020','06/06/2020']})
updated = pd.DataFrame({'ProductID': ['123', '456', '789', '000','231'],
'ProductName': ['Apple','Pear','Mango','Banana','Jackfruit'],
'DiscountedPrice': ['0.53', '1.00', '0.99', '2.00','2.35'],
'DiscountStartDate': ['30/01/2020', '21/06/2020', '15/01/2020', '30/11/2020','09/10/2020'],
'DiscountEndDate': ['25/03/2020', '30/07/2020', '30/01/2020', '12/12/2020','31/10/2020']})
# the dates are day-first (dd/mm/yyyy), so parse them with an explicit format
initial["DiscountStartDate"] = pd.to_datetime(initial["DiscountStartDate"], format="%d/%m/%Y")
initial["DiscountEndDate"] = pd.to_datetime(initial["DiscountEndDate"], format="%d/%m/%Y")
updated["DiscountStartDate"] = pd.to_datetime(updated["DiscountStartDate"], format="%d/%m/%Y")
updated["DiscountEndDate"] = pd.to_datetime(updated["DiscountEndDate"], format="%d/%m/%Y")
# merge the two dataframes so initial (_i) and updated (_u) values sit side by side
dfcat = (initial
    .merge(updated, on=["ProductID"], suffixes=("_i", "_u"))
    # cascading logic to mark each row as one of the 4 cases
    .assign(
        cat=lambda dfa: np.where(
            dfa["DiscountStartDate_i"].eq(dfa["DiscountStartDate_u"])
            & dfa["DiscountEndDate_i"].eq(dfa["DiscountEndDate_u"])
            & dfa["DiscountedPrice_i"].eq(dfa["DiscountedPrice_u"]),
            "case1",
            # no need to check that the dates differ - already handled by case1
            np.where(
                dfa["DiscountedPrice_i"].eq(dfa["DiscountedPrice_u"]),
                "case2",
                np.where(
                    dfa["DiscountEndDate_i"].eq(dfa["DiscountEndDate_u"])
                    & dfa["DiscountStartDate_i"].eq(dfa["DiscountStartDate_u"]),
                    "case3",
                    # every remaining row is treated as case4; overlap itself is not tested
                    "case4"))),
        # case 4: end the initial discount the day before the updated one starts
        DiscountEndDate_i=lambda dfa: np.where(
            dfa["cat"].eq("case4"),
            dfa["DiscountStartDate_u"] - pd.to_timedelta(1, unit="d"),
            dfa["DiscountEndDate_i"]),
    )
)
# utility to filter the rows of one case and keep one side (_i or _u) of the columns
def chngrows(df, case, ind):
    cols = [c for c in df.columns if c.endswith(ind)]
    return (df
        .loc[df["cat"].eq(case), ["ProductID"] + cols]
        # strip the suffix so both sides share the original column names
        .rename(columns={c: c[:-len(ind)] for c in cols})
        .assign(cat=f"{case}{ind}")
    )
changes = pd.concat([
    chngrows(dfcat, "case2", "_i"),
    chngrows(dfcat, "case2", "_u"),
    chngrows(dfcat, "case3", "_u"),
    chngrows(dfcat, "case4", "_i"),
    chngrows(dfcat, "case4", "_u"),
]).sort_values(["ProductID", "cat"])
Output:
ProductID ProductName DiscountedPrice DiscountStartDate DiscountEndDate cat
000 Banana 2.10 2020-11-10 2020-11-29 case4_i
000 Banana 2.00 2020-11-30 2020-12-12 case4_u
123 Apple 0.53 2020-01-30 2020-03-25 case3_u
231 Jackfruit 2.35 2020-05-05 2020-06-06 case2_i
231 Jackfruit 2.35 2020-10-09 2020-10-31 case2_u
789 Mango 1.50 2020-01-01 2020-01-14 case4_i
789 Mango 0.99 2020-01-15 2020-01-30 case4_u
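Criterion (4) in the question only applies when the date ranges overlap, whereas the cascade above labels every remaining row as case4. If that distinction matters, the overlap could be tested explicitly. The following is only a sketch under stated assumptions: the `pairs` frame and its second product ('999') are hypothetical, the date ranges are treated as closed intervals, and the "no_overlap" label is made up here because the question does not say how such rows should be handled.

```python
import numpy as np
import pandas as pd

# Hypothetical merged rows in the same "_i" / "_u" layout as dfcat.
pairs = pd.DataFrame({
    "ProductID": ["789", "999"],
    "DiscountedPrice_i": ["1.50", "3.00"],
    "DiscountedPrice_u": ["0.99", "2.50"],
    "DiscountStartDate_i": pd.to_datetime(["2020-01-01", "2020-01-01"]),
    "DiscountEndDate_i": pd.to_datetime(["2020-01-30", "2020-01-10"]),
    "DiscountStartDate_u": pd.to_datetime(["2020-01-15", "2020-02-01"]),
    "DiscountEndDate_u": pd.to_datetime(["2020-01-30", "2020-02-20"]),
})

price_same = pairs["DiscountedPrice_i"].eq(pairs["DiscountedPrice_u"])
dates_same = (pairs["DiscountStartDate_i"].eq(pairs["DiscountStartDate_u"])
              & pairs["DiscountEndDate_i"].eq(pairs["DiscountEndDate_u"]))
# Closed intervals [start, end] overlap when each one starts
# on or before the day the other one ends.
overlap = (pairs["DiscountStartDate_i"].le(pairs["DiscountEndDate_u"])
           & pairs["DiscountStartDate_u"].le(pairs["DiscountEndDate_i"]))

# np.select picks the first matching condition, so the order mirrors the cascade.
pairs["cat"] = np.select(
    [price_same & dates_same, price_same, dates_same, overlap],
    ["case1", "case2", "case3", "case4"],
    default="no_overlap",  # assumed label for rows that match none of the criteria
)
print(pairs[["ProductID", "cat"]].to_string(index=False))
```

With this sample data, product 789 is labelled case4 and the hypothetical product 999 is labelled no_overlap, so only the first would have its initial end date adjusted.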