I have the following dataframe:
import pandas as pd
from io import StringIO
data = StringIO("""TitleCode,ReleaseDate,WeekEnding,TotalUnits
A,12/16/2017,12/2/2017 0:00,5
A,12/16/2017,12/9/2017 0:00,10
A,12/16/2017,12/16/2017 0:00,2
A,12/16/2017,12/23/2017 0:00,5
A,12/16/2017,12/30/2017 0:00,4
B,1/6/2018,1/13/2018 0:00,4
B,1/6/2018,1/20/2018 0:00,2
""")
result = StringIO("""TitleCode,ReleaseDate,WeekEnding,TotalUnits
A,12/16/2017,12/16/2017 0:00,17
A,12/16/2017,12/23/2017 0:00,5
A,12/16/2017,12/30/2017 0:00,4
B,1/6/2018,1/13/2018 0:00,4
B,1/6/2018,1/20/2018 0:00,2
""")
# note: parse_dates=True only tries to parse the index, so the date columns stay as strings
datadf = pd.read_csv(data, parse_dates=True)
resultdf = pd.read_csv(result, parse_dates=True)
datadf
   TitleCode  ReleaseDate       WeekEnding  TotalUnits
0          A   12/16/2017   12/2/2017 0:00           5
1          A   12/16/2017   12/9/2017 0:00          10
2          A   12/16/2017  12/16/2017 0:00           2
3          A   12/16/2017  12/23/2017 0:00           5
4          A   12/16/2017  12/30/2017 0:00           4
5          B     1/6/2018   1/13/2018 0:00           4
6          B     1/6/2018   1/20/2018 0:00           2
resultdf
   TitleCode  ReleaseDate       WeekEnding  TotalUnits
0          A   12/16/2017  12/16/2017 0:00          17
1          A   12/16/2017  12/23/2017 0:00           5
2          A   12/16/2017  12/30/2017 0:00           4
3          B     1/6/2018   1/13/2018 0:00           4
4          B     1/6/2018   1/20/2018 0:00           2
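The datadf dataframe shows each title's unit sales per week, along with the title's release date. I want to combine all presales, i.e. sales in weeks ending on or before the release date, into a single row dated on the release date (resultdf).
The only way I can think of is to loop over the dataframe, but there must be a more efficient way.
Thanks!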
Answer (score: 1)
# parse both date columns as datetimes so they compare chronologically, not as strings
datadf['WeekEnding'] = pd.to_datetime(datadf.WeekEnding, format='%m/%d/%Y %H:%M')
datadf['ReleaseDate'] = pd.to_datetime(datadf.ReleaseDate, format='%m/%d/%Y')
# replace WeekEnding with ReleaseDate whenever the week ends on or before the release date
datadf['WeekEnding'] = datadf['WeekEnding'].where(
    datadf['WeekEnding'] > datadf['ReleaseDate'], datadf['ReleaseDate']
)
# sum the units of the rows that now share the same week
datadf.groupby(
    ['TitleCode', 'ReleaseDate', 'WeekEnding']
).TotalUnits.sum().reset_index()
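For reference, here is a self-contained sketch of the same approach that can be run as-is. It assumes the title B weeks fall in 2018 (one and two weeks after the 1/6/2018 release), since that is what the expected resultdf implies; `df` and `result` are just illustrative variable names.

```python
import pandas as pd
from io import StringIO

# Weekly sales from the question; title B's WeekEnding year is assumed to be 2018
data = StringIO("""TitleCode,ReleaseDate,WeekEnding,TotalUnits
A,12/16/2017,12/2/2017 0:00,5
A,12/16/2017,12/9/2017 0:00,10
A,12/16/2017,12/16/2017 0:00,2
A,12/16/2017,12/23/2017 0:00,5
A,12/16/2017,12/30/2017 0:00,4
B,1/6/2018,1/13/2018 0:00,4
B,1/6/2018,1/20/2018 0:00,2
""")
df = pd.read_csv(data)

# Parse both date columns so they compare chronologically
df['ReleaseDate'] = pd.to_datetime(df['ReleaseDate'], format='%m/%d/%Y')
df['WeekEnding'] = pd.to_datetime(df['WeekEnding'], format='%m/%d/%Y %H:%M')

# Keep post-release weeks; move every week ending on or before release onto the release date
df['WeekEnding'] = df['WeekEnding'].where(df['WeekEnding'] > df['ReleaseDate'], df['ReleaseDate'])

# Rows that now share the same (title, release, week) key are summed into one row
result = (
    df.groupby(['TitleCode', 'ReleaseDate', 'WeekEnding'], as_index=False)['TotalUnits']
      .sum()
)
print(result)
```

Because `where` (or, equivalently, `mask` with the inverted condition) operates on whole columns at once, no explicit Python loop over the rows is needed.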