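I have raw data with a complex, wide structure, and I want to reshape it from many columns into one observation per row by using the month as an index. Any help with this would be appreciated.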
Source Data
SKU   D01   D02   D03   -------   D11   J01   J02   J03   -----    J11   P01   P02   P03   ------   P011
ABC   5     6     8     -------   10    0     1     2     -----    10    0     0     0     ------   0
BXY   10    11    20    -------   8     0     0     0     ------   0     5     8     10    ------   12
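With your help I was able to convert the demand columns. Now I want to add more columns so that Jobs and PO line up with Demand for the same month.
Below is the code I tried.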
import pandas as pd

# con is an existing database connection.
# Assumes the ForecastData columns are named D00 ... D11; adjust to the real names.
rs = con.execute("""SELECT SKU, D00, D01, D02, D03, D04, D05,
                           D06, D07, D08, D09, D10, D11
                    FROM ForecastData""")
df = pd.DataFrame(rs.fetchall())
df.columns = ["SKU", "D01", "D02", "D03", "D04",
              "D05", "D06", "D07", "D08", "D09", "D10",
              "D11", "D12"]
# Map each demand column to the first day of a month, starting with the current month
demand_columns = [c for c in df.columns if c.startswith("D")]
start = pd.Timestamp.now().normalize().replace(day=1)
month_list = [start + pd.DateOffset(months=i) for i in range(len(demand_columns))]
dic_month = dict(zip(demand_columns, month_list))
# rename() and set_index() return new frames, so chain them instead of discarding the result
df2 = df.rename(columns=dic_month).set_index("SKU").stack().reset_index()
df2.columns = ["StockCode", "Month", "Demand"]
df2["Month"] = pd.to_datetime(df2["Month"]).dt.date
Output
StockCode Month Demand
ABC 2019-04-01 5
ABC 2019-05-01 6
ABC 2019-06-01 8
-
-
ABC 2020-03-01 10
BXY 2019-04-01 10
BXY 2019-05-01 11
BXY 2019-06-01 20
-
-
BXY 2020-03-01 8
Desired Output with Multiple Columns
StockCode Month Demand Job Po
ABC 2019-04-01 5 0 0
ABC 2019-05-01 6 1 0
ABC 2019-06-01 8 2 0
-
-
ABC 2020-03-01 10 10 0
BXY 2019-04-01 10 0 5
BXY 2019-05-01 11 0 8
BXY 2019-06-01 20 0 10
-
-
BXY 2020-03-01 8 0 12
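One possible way to get this shape, as a minimal sketch rather than a definitive solution: `pd.wide_to_long` can unpivot several column groups at once when they share a numeric suffix. The sample frame below is a three-month stand-in built from the source data above, and it assumes the Demand, Jobs and PO columns are prefixed `D`, `J` and `P` and that period 01 corresponds to 2019-04-01. The helper name `reshape_forecast` and the `start` argument are made up for illustration.

```python
import pandas as pd

# Stand-in for ForecastData: only three periods per measure (assumption for brevity).
wide = pd.DataFrame({
    "SKU": ["ABC", "BXY"],
    "D01": [5, 10], "D02": [6, 11], "D03": [8, 20],
    "J01": [0, 0], "J02": [1, 0], "J03": [2, 0],
    "P01": [0, 5], "P02": [0, 8], "P03": [0, 10],
})


def reshape_forecast(wide, start):
    """Return one row per SKU and month with Demand, Job and Po columns."""
    # Each stub (D, J, P) becomes one column; the numeric suffix becomes "period".
    long = pd.wide_to_long(
        wide, stubnames=["D", "J", "P"], i="SKU", j="period"
    ).reset_index()
    long["period"] = long["period"].astype(int)
    long = long.sort_values(["SKU", "period"]).reset_index(drop=True)
    # Period 1 maps to the start month, period 2 to the month after, and so on.
    long["Month"] = [start + pd.DateOffset(months=int(p) - 1) for p in long["period"]]
    long = long.rename(
        columns={"SKU": "StockCode", "D": "Demand", "J": "Job", "P": "Po"}
    )
    return long[["StockCode", "Month", "Demand", "Job", "Po"]]


result = reshape_forecast(wide, pd.Timestamp("2019-04-01"))
print(result)
```

With the real table, the same call should work as long as every measure has the same set of suffixes; if the column is really named `P011`, it would need renaming to `P11` first so that it lines up with `D11` and `J11`.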