示例数据框:
df = pd.DataFrame({'sales': ['2020-01','2020-02','2020-03','2020-04','2020-05','2020-06'],
'2020-01': [24,42,18,68,24,30],
'2020-02': [24,42,18,68,24,30],
'2020-03': [64,24,70,70,88,57],
'2020-04': [22,11,44,3,5,78],
'2020-05': [11,35,74,12,69,51]}
我想在下面找到df['L2']
我学过pandas roll、groupby等,解决不了。
请阅读L2公式并给我意见
L2 公式
L2(Jan-20) = 24
-------------------
sales 2020-01
0 2020-01 24
-------------------
L2(Feb-20) = 132 (sum of below matrix 2x2)
sales 2020-01 2020-02
0 2020-01 24 24
1 2020-02 42 42
-------------------
L2(Mar-20) = 154 (sum of matrix 2x2)
sales 2020-02 2020-03
0 2020-02 42 24
1 2020-03 18 70
-------------------
L2(Apr-20) = 187 (sum of below maxtrix 2x2)
sales 2020-03 2020-04
0 2020-03 70 44
1 2020-04 70 3
输出
Unnamed: 0 sales Jan-20 Feb-20 Mar-20 Apr-20 May-20 L2 L3
0 0 Jan-20 24 24 64 22 11 24 24
1 1 Feb-20 42 42 24 11 35 132 132
2 2 Mar-20 18 18 70 44 74 154 326
3 3 Apr-20 68 68 70 3 12 187 350
4 4 May-20 24 24 88 5 69 89 545
5 5 Jun-20 30 30 57 78 51 203 433
答案 0 :(得分:0)
import pandas as pd
import numpy as np
# make a dataset
df = pd.DataFrame({'sales': ['2020-01','2020-02','2020-03','2020-04','2020-05','2020-06'],
'2020-01': [24,42,18,68,24,30],
'2020-02': [24,42,18,68,24,30],
'2020-03': [64,24,70,70,88,57],
'2020-04': [22,11,44,3,5,78],
'2020-05': [11,35,74,12,69,51]})
print(df)
# datawork(L2)
for i in range(0,df.shape[0]):
if i==0:
df.loc[i,'L2']=df.loc[i,'2020-01']
else:
if i!=df.shape[0]-1:
df.loc[i,'L2']=df.iloc[i-1:i+1,i:i+2].sum().sum()
if i==df.shape[0]-1:
df.loc[i,'L2']=df.iloc[i-1:i+1,i-1:i+1].sum().sum()
print(df)
# sales 2020-01 2020-02 2020-03 2020-04 2020-05 L2
#0 2020-01 24 24 64 22 11 24.0
#1 2020-02 42 42 24 11 35 132.0
#2 2020-03 18 18 70 44 74 154.0
#3 2020-04 68 68 70 3 12 187.0
#4 2020-05 24 24 88 5 69 89.0
#5 2020-06 30 30 57 78 51 203.0
答案 1 :(得分:0)
Values=f.values[:,1:]
L2=[]
RANGE=Values.shape[0]
for a in range(RANGE):
if a==0:
result=Values[a,a]
else:
if Values[a-1:a+1,a-1:a+1].shape==(2,1):
result=np.sum(Values[a-1:a+1,a-2:a])
else:
result=np.sum(Values[a-1:a+1,a-1:a+1])
L2.append(result)
print(L2)
L2 output:-->[24, 132, 154, 187, 89, 203]
f["L2"]=L2
答案 2 :(得分:0)
我尝试了另一种方法。 这个方法使用了reshape long(在python中:melt),但是我在python中应用了reshape long两次,因为df中销售和其他列的时间频率是每月而不是每天,所以我再一次reshape long使int列对应于每月日期。 (我用Stata的次数比python多,在Stata中,我只能做一次reshape很长时间,因为它有每月的时间频率,而且reshape任务比pandas,python要容易得多)
有兴趣的可以看看
# 00.module
import pandas as pd
import numpy as np
from order import order # https://stackoverflow.com/a/68464246/16478699
# 0.make a dataset
df = pd.DataFrame({'sales': ['2020-01', '2020-02', '2020-03', '2020-04', '2020-05', '2020-06'],
'2020-01': [24, 42, 18, 68, 24, 30],
'2020-02': [24, 42, 18, 68, 24, 30],
'2020-03': [64, 24, 70, 70, 88, 57],
'2020-04': [22, 11, 44, 3, 5, 78],
'2020-05': [11, 35, 74, 12, 69, 51]}
)
df.to_stata('dataset.dta', version=119, write_index=False)
print(df)
# 1.reshape long(in python: melt)
t = list(df.columns)
t.remove('sales')
df_long = df.melt(id_vars='sales', value_vars=t, var_name='var', value_name='val')
df_long['id'] = list(range(1, df_long.shape[0] + 1)) # make id for another resape long
print(df_long)
# 2.another reshape long(in python: melt, reason: make int(col name: tid) corresponding to monthly date of sales and monthly columns in df)
df_long2 = df_long.melt(id_vars=['id', 'val'], value_vars=['sales', 'var'])
df_long2['tid'] = df_long2['value'].apply(lambda x: 1 + list(df_long2.value.unique()).index(x))
print(df_long2)
# 3.back to wide form with tid(in python: pd.pivot)
df_wide = pd.pivot(df_long2, index=['id', 'val'], columns='variable', values=['value', 'tid'])
df_wide.columns = df_wide.columns.map(lambda x: x[1] if x[0] == 'value' else f'{x[0]}_{x[1]}') # change multiindex columns name into just normal columns name
df_wide = df_wide.reset_index()
print(df_wide)
# 4.make values of L2
for i in df_wide.tid_sales.unique():
if list(df_wide.tid_sales.unique()).index(i) + 1 == len(df_wide.tid_sales.unique()):
df_wide.loc[df_wide['tid_sales'] == i, 'L2'] = df_wide.loc[(((df_wide['tid_sales'] == i) | (
df_wide['tid_sales'] == i - 1)) & ((df_wide['tid_var'] == i - 1) | (
df_wide['tid_var'] == i - 2))), 'val'].sum()
else:
df_wide.loc[df_wide['tid_sales'] == i, 'L2'] = df_wide.loc[(((df_wide['tid_sales'] == i) | (
df_wide['tid_sales'] == i - 1)) & ((df_wide['tid_var'] == i) | (
df_wide['tid_var'] == i - 1))), 'val'].sum()
print(df_wide)
# 5.back to shape of df with L2(reshape wide, in python: pd.pivot)
df_final = df_wide.drop(columns=df.filter(regex='^tid')) # no more columns starting with tid needed
df_final = pd.pivot(df_final, index=['sales', 'L2'], columns='var', values='val').reset_index()
df_final = order(df_final, 'L2', f_or_l='last') # order function is made by me
print(df_final)