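Somehow I can't find a suitable solution to my problem. For each fact_date, I want to sum the Value of every pair of Units that share the same Scen. The output should look like this:
Output: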
Combination Unit_Com Scen Value_Sum Town Country
11-Apr a,b 1 28 Town A USA
11-Apr a,b 2 31 Town A USA
11-Apr a,c 1 30 Town A USA
11-Apr a,c 2 30 Town A USA
11-Apr a,d 1 31 Town A USA
11-Apr a,d 2 29 Town A USA
11-Apr b,c 1 32 Town A USA
11-Apr b,c 2 39 Town A USA
11-Apr b,d 1 33 Town A USA
11-Apr b,d 2 38 Town A USA
11-Apr c,d 1 35 Town A USA
11-Apr c,d 2 37 Town A USA
10-Apr a,b 1 28 Town A USA
10-Apr a,b 2 25 Town A USA
10-Apr a,c 1 32 Town A USA
10-Apr a,c 2 26 Town A USA
10-Apr a,d 1 38 Town A USA
10-Apr a,d 2 22 Town A USA
10-Apr b,c 1 24 Town A USA
10-Apr b,c 2 27 Town A USA
10-Apr b,d 1 30 Town A USA
10-Apr b,d 2 23 Town A USA
10-Apr c,d 1 34 Town A USA
10-Apr c,d 2 24 Town A USA
The calculation works like this:
fact_date: 11-Apr
Town: Town A
Country: USA
Unit: a
Scen(Unit a): 1
Value: 13
Unit: b
Scen(Unit b): 1
Value: 15
**Output (as shown above):**
fact_date: 11-Apr
Unit_Combo: a,b
Scen: 1
Value_Sum: 28
Town: Town A
Country: USA
This should then be done for every fact_date.
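As a quick check of the arithmetic for one row of the expected output, here is a minimal sketch that uses only the 11-Apr, Town A rows from the question's data. `value_of` is a hypothetical helper introduced just for this illustration.

```python
import pandas as pd

# Subset of the question's data: the 11-Apr rows for Town A
df = pd.DataFrame({
    'fact_date': ['11-Apr'] * 8,
    'Unit': list('aabbccdd'),
    'Town': ['Town A'] * 8,
    'Scen': [1, 2] * 4,
    'Value': [13, 11, 15, 20, 17, 19, 18, 18],
    'Country': ['USA'] * 8,
})


def value_of(frame, date, unit, scen):
    """Hypothetical helper: return the Value of the single (date, unit, scen) row."""
    mask = (frame['fact_date'] == date) & (frame['Unit'] == unit) & (frame['Scen'] == scen)
    return int(frame.loc[mask, 'Value'].iloc[0])


# Unit a and Unit b, both with Scen 1, on 11-Apr: 13 + 15
value_sum = value_of(df, '11-Apr', 'a', 1) + value_of(df, '11-Apr', 'b', 1)
print(value_sum)  # 28
```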
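Finally, I also need the combinations across Town A and Town B, e.g. a,e and so on.
Unfortunately, I don't get any combinations, and this is where I'm stuck:
Update:
I updated the code, but I'm still getting the wrong output: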
calculating date: 11-Apr
11-Apr 1,1 a,b Town A,Town A USA,USA 28
11-Apr 1,2 a,b Town A,Town A USA,USA 33
11-Apr 1,1 a,c Town A,Town A USA,USA 30
11-Apr 1,2 a,c Town A,Town A USA,USA 32
11-Apr 1,1 a,d Town A,Town A USA,USA 31
11-Apr 1,2 a,d Town A,Town A USA,USA 31
11-Apr 1,1 a,b Town A,Town A USA,USA 23
11-Apr 1,2 a,b Town A,Town A USA,USA 26
11-Apr 1,1 a,c Town A,Town A USA,USA 27
11-Apr 1,2 a,c Town A,Town A USA,USA 27
11-Apr 1,1 a,d Town A,Town A USA,USA 33
11-Apr 1,2 a,d Town A,Town A USA,USA 23
calculating date: 10-Apr
10-Apr 2,1 a,b Town A,Town A USA,USA 26
10-Apr 2,2 a,b Town A,Town A USA,USA 31
10-Apr 2,1 a,c Town A,Town A USA,USA 28
10-Apr 2,2 a,c Town A,Town A USA,USA 30
10-Apr 2,1 a,d Town A,Town A USA,USA 29
10-Apr 2,2 a,d Town A,Town A USA,USA 29
10-Apr 2,1 a,b Town A,Town A USA,USA 21
10-Apr 2,2 a,b Town A,Town A USA,USA 24
10-Apr 2,1 a,c Town A,Town A USA,USA 25
10-Apr 2,2 a,c Town A,Town A USA,USA 25
10-Apr 2,1 a,d Town A,Town A USA,USA 31
10-Apr 2,2 a,d Town A,Town A USA,USA 21
Here is the code:
import pandas as pd
df = pd.DataFrame({'fact_date': ['11-Apr','11-Apr','11-Apr','11-Apr','11-Apr','11-Apr','11-Apr','11-Apr','10-Apr','10-Apr','10-Apr','10-Apr','10-Apr','10-Apr','10-Apr','10-Apr','11-Apr','11-Apr','11-Apr','11-Apr','11-Apr','11-Apr','11-Apr','11-Apr','10-Apr','10-Apr','10-Apr','10-Apr','10-Apr','10-Apr','10-Apr','10-Apr'],
                   'Unit': ['a','a','b','b','c','c','d','d','a','a','b','b','c','c','d','d','e','e','f','f','g','g','h','h','i','i','j','j','k','k','l','l'],
                   'Town': ['Town A','Town A','Town A','Town A','Town A','Town A','Town A','Town A','Town A','Town A','Town A','Town A','Town A','Town A','Town A','Town A','Town B','Town B','Town B','Town B','Town B','Town B','Town B','Town B','Town B','Town B','Town B','Town B','Town B','Town B','Town B','Town B'],
                   'Scen': [1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2],
                   'Value': [13,11,15,20,17,19,18,18,18,12,10,13,14,14,20,10,18,17,15,19,11,14,14,17,19,10,16,10,16,19,12,11],
                   'Country': ['USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA']})
test_df = pd.DataFrame([])
cluster_names = df['fact_date'].unique()
disjoint_clusters = []
for idx, item in enumerate(cluster_names):
    # Filter the rows of this date (note: the result is never assigned)
    df[df['fact_date'] == item]
    print('calculating date: ' + str(item))
    # idx is the position of the date in cluster_names (0 or 1), but below it
    # is used as a row position in the full, unfiltered df; Scen is not compared
    for j in range(idx + 1, len(df)):
        if df.iloc[idx]['Unit'] != df.iloc[j]['Unit'] and df.iloc[idx]['Town'] == 'Town A' and df.iloc[j]['Town'] == 'Town A':
            print(item,
                  str(df.iloc[idx]['Scen']) + str(',') + str(df.iloc[j]['Scen']),
                  df.iloc[idx]['Unit'] + str(',') + df.iloc[j]['Unit'],
                  df.iloc[idx]['Town'] + str(',') + df.iloc[j]['Town'],
                  df.iloc[idx]['Country'] + str(',') + df.iloc[j]['Country'],
                  df.iloc[idx]['Value'] + df.iloc[j]['Value'])
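For comparison, here is a minimal sketch of how the loop above could be repaired while keeping its overall shape: keep the per-date frame, loop over pairs of its rows, and require equal Scen. It rebuilds the question's data compactly, collects rows into a list instead of printing them, and stays restricted to Town A as the original loop was. `day`, `rows` and `result` are illustrative names, not from the question.

```python
import pandas as pd

# The question's data, rebuilt column by column
df = pd.DataFrame({
    'fact_date': ['11-Apr'] * 8 + ['10-Apr'] * 8 + ['11-Apr'] * 8 + ['10-Apr'] * 8,
    'Unit': list('aabbccddaabbccddeeffgghhiijjkkll'),
    'Town': ['Town A'] * 16 + ['Town B'] * 16,
    'Scen': [1, 2] * 16,
    'Value': [13, 11, 15, 20, 17, 19, 18, 18, 18, 12, 10, 13, 14, 14, 20, 10,
              18, 17, 15, 19, 11, 14, 14, 17, 19, 10, 16, 10, 16, 19, 12, 11],
    'Country': ['USA'] * 32,
})

rows = []
for item in df['fact_date'].unique():
    # Keep the filtered frame and renumber it so positions refer to this date only
    day = df[df['fact_date'] == item].reset_index(drop=True)
    # Visit every unordered pair of rows (i, j) with i < j exactly once
    for i in range(len(day)):
        for j in range(i + 1, len(day)):
            r1, r2 = day.iloc[i], day.iloc[j]
            # Different units, same Scen, both in Town A
            if (r1['Unit'] != r2['Unit'] and r1['Scen'] == r2['Scen']
                    and r1['Town'] == 'Town A' and r2['Town'] == 'Town A'):
                rows.append([item, r1['Unit'] + ',' + r2['Unit'], r1['Scen'],
                             r1['Value'] + r2['Value'], r1['Town'], r1['Country']])

result = pd.DataFrame(rows, columns=['Combination', 'Unit_Com', 'Scen',
                                     'Value_Sum', 'Town', 'Country'])
print(result)
```

The rows come out grouped by the first unit of each pair rather than in the question's order, so a sort is needed if the order matters. The double loop also grows quadratically with the number of rows per date; the groupby approach in the answer avoids writing it by hand.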
Answer:
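This gives you the expected output from your question. The idea is to use groupby on the columns 'fact_date','Country','Town','Scen', and then apply combinations from itertools to each grouped dataframe to pair up the values of the 'Value','Unit' columns. You can build the result dataframe directly with a list comprehension and pd.DataFrame: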
from itertools import combinations
df_res = pd.DataFrame([list(name_g) + [val1+val2,'{},{}'.format(unit1,unit2)]
                       for name_g, df_g in df.groupby(['fact_date','Country','Town','Scen'])
                       for ((val1, unit1), (val2, unit2)) in combinations(df_g[['Value','Unit']].values,2)],
                      columns=['Combination','Country','Town','Scen','Value_Sum','Unit_Com'])
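An alternative that is not part of this answer is a self-merge: join the frame with itself on the grouping keys, then keep each unordered pair once. This is a minimal sketch, assuming unit labels are unique within each group and that comparing them as strings (`Unit_1 < Unit_2`) is an acceptable way to drop both self-pairs and mirrored duplicates.

```python
import pandas as pd

# The question's data, rebuilt column by column
df = pd.DataFrame({
    'fact_date': ['11-Apr'] * 8 + ['10-Apr'] * 8 + ['11-Apr'] * 8 + ['10-Apr'] * 8,
    'Unit': list('aabbccddaabbccddeeffgghhiijjkkll'),
    'Town': ['Town A'] * 16 + ['Town B'] * 16,
    'Scen': [1, 2] * 16,
    'Value': [13, 11, 15, 20, 17, 19, 18, 18, 18, 12, 10, 13, 14, 14, 20, 10,
              18, 17, 15, 19, 11, 14, 14, 17, 19, 10, 16, 10, 16, 19, 12, 11],
    'Country': ['USA'] * 32,
})

keys = ['fact_date', 'Country', 'Town', 'Scen']
# Every row meets every row (itself included) with the same date, country, town, Scen;
# the non-key columns Unit and Value get the suffixes _1 and _2
pairs = df.merge(df, on=keys, suffixes=('_1', '_2'))
# Keep each unordered pair once and drop self-pairs such as (a, a)
pairs = pairs[pairs['Unit_1'] < pairs['Unit_2']]

df_res = pd.DataFrame({
    'Combination': pairs['fact_date'],
    'Country': pairs['Country'],
    'Town': pairs['Town'],
    'Scen': pairs['Scen'],
    'Value_Sum': pairs['Value_1'] + pairs['Value_2'],
    'Unit_Com': pairs['Unit_1'] + ',' + pairs['Unit_2'],
}).reset_index(drop=True)
print(df_res[df_res['Town'] == 'Town A'])
```

The merge materialises all ordered pairs of each group before filtering, so it uses more intermediate memory than combinations; for groups of four units, as here, that does not matter.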
You may want to reorder the columns; to get the same output as in your question (Town A only), you can do:
print (df_res[df_res['Town'] == 'Town A'])
Combination Country Town Scen Value_Sum Unit_Com
0 10-Apr USA Town A 1 28 a,b
1 10-Apr USA Town A 1 32 a,c
2 10-Apr USA Town A 1 38 a,d
3 10-Apr USA Town A 1 24 b,c
4 10-Apr USA Town A 1 30 b,d
5 10-Apr USA Town A 1 34 c,d
6 10-Apr USA Town A 2 25 a,b
7 10-Apr USA Town A 2 26 a,c
8 10-Apr USA Town A 2 22 a,d
9 10-Apr USA Town A 2 27 b,c
10 10-Apr USA Town A 2 23 b,d
11 10-Apr USA Town A 2 24 c,d
24 11-Apr USA Town A 1 28 a,b
25 11-Apr USA Town A 1 30 a,c
26 11-Apr USA Town A 1 31 a,d
27 11-Apr USA Town A 1 32 b,c
28 11-Apr USA Town A 1 33 b,d
29 11-Apr USA Town A 1 35 c,d
30 11-Apr USA Town A 2 31 a,b
31 11-Apr USA Town A 2 30 a,c
32 11-Apr USA Town A 2 29 a,d
33 11-Apr USA Town A 2 39 b,c
34 11-Apr USA Town A 2 38 b,d
35 11-Apr USA Town A 2 37 c,d
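To match the question's layout more closely (its column order, 11-Apr first, Scen 1 and 2 adjacent for each pair), a column selection plus sort_values should be enough. This sketch rebuilds the question's data and the answer's df_res so it runs on its own; the column list and sort keys are my choice for reproducing the question's table, not something the answer specifies.

```python
from itertools import combinations

import pandas as pd

# The question's data, rebuilt column by column
df = pd.DataFrame({
    'fact_date': ['11-Apr'] * 8 + ['10-Apr'] * 8 + ['11-Apr'] * 8 + ['10-Apr'] * 8,
    'Unit': list('aabbccddaabbccddeeffgghhiijjkkll'),
    'Town': ['Town A'] * 16 + ['Town B'] * 16,
    'Scen': [1, 2] * 16,
    'Value': [13, 11, 15, 20, 17, 19, 18, 18, 18, 12, 10, 13, 14, 14, 20, 10,
              18, 17, 15, 19, 11, 14, 14, 17, 19, 10, 16, 10, 16, 19, 12, 11],
    'Country': ['USA'] * 32,
})

# The answer's groupby + combinations construction
df_res = pd.DataFrame(
    [list(name_g) + [val1 + val2, '{},{}'.format(unit1, unit2)]
     for name_g, df_g in df.groupby(['fact_date', 'Country', 'Town', 'Scen'])
     for ((val1, unit1), (val2, unit2)) in combinations(df_g[['Value', 'Unit']].values, 2)],
    columns=['Combination', 'Country', 'Town', 'Scen', 'Value_Sum', 'Unit_Com'])

# Column order used in the question
cols = ['Combination', 'Unit_Com', 'Scen', 'Value_Sum', 'Town', 'Country']
out = (df_res.loc[df_res['Town'] == 'Town A', cols]
       # '11-Apr' > '10-Apr' as strings, so descending puts 11-Apr first
       .sort_values(['Combination', 'Unit_Com', 'Scen'], ascending=[False, True, True])
       .reset_index(drop=True))
print(out.to_string(index=False))
```

Sorting the date labels as strings only works because both labels have the same width; with labels such as '9-Apr' and '10-Apr', converting to real dates first (for example with pd.to_datetime and an explicit format) would be safer.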
Edit: to do the same across Towns, you can do:
df_res = pd.DataFrame([list(name_g) + [val1+val2,'{},{}'.format(unit1,unit2), '{},{}'.format(town1,town2)]
                       for name_g, df_g in df.groupby(['fact_date','Country','Scen'])
                       for ((val1, unit1, town1), (val2, unit2, town2)) in combinations(df_g[['Value','Unit','Town']].values,2)],
                      columns=['Combination','Country','Scen','Value_Sum','Unit_Com','Town'])
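The difference is that Town is no longer one of the groupby columns; instead it is among the columns selected for combinations, with a few small changes to make it work.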
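If only the pairs that actually mix the two towns are wanted (the a,e kind of combination from the question), one option is to split the combined Town string and compare both halves. This sketch rebuilds df_res with the Town-level code from the edit; `towns` and `cross` are illustrative names.

```python
from itertools import combinations

import pandas as pd

# The question's data, rebuilt column by column
df = pd.DataFrame({
    'fact_date': ['11-Apr'] * 8 + ['10-Apr'] * 8 + ['11-Apr'] * 8 + ['10-Apr'] * 8,
    'Unit': list('aabbccddaabbccddeeffgghhiijjkkll'),
    'Town': ['Town A'] * 16 + ['Town B'] * 16,
    'Scen': [1, 2] * 16,
    'Value': [13, 11, 15, 20, 17, 19, 18, 18, 18, 12, 10, 13, 14, 14, 20, 10,
              18, 17, 15, 19, 11, 14, 14, 17, 19, 10, 16, 10, 16, 19, 12, 11],
    'Country': ['USA'] * 32,
})

# Town-level version from the answer's edit: Town is paired, not grouped
df_res = pd.DataFrame(
    [list(name_g) + [val1 + val2, '{},{}'.format(unit1, unit2), '{},{}'.format(town1, town2)]
     for name_g, df_g in df.groupby(['fact_date', 'Country', 'Scen'])
     for ((val1, unit1, town1), (val2, unit2, town2))
     in combinations(df_g[['Value', 'Unit', 'Town']].values, 2)],
    columns=['Combination', 'Country', 'Scen', 'Value_Sum', 'Unit_Com', 'Town'])

# 'Town A,Town B' -> two columns (0 and 1): 'Town A' and 'Town B'
towns = df_res['Town'].str.split(',', expand=True)
# Keep only pairs whose two units come from different towns
cross = df_res[towns[0] != towns[1]]
print(cross.head())
```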
To pick some of these combinations at random, I suggest looking at the sample method. For example, if you want 10 of them, you can do:
print (df_res.sample(n=10))
Combination Country Scen Value_Sum Unit_Com Town
7 10-Apr USA 1 24 b,c Town A,Town A
66 11-Apr USA 1 30 b,f Town A,Town B
31 10-Apr USA 2 22 a,i Town A,Town B
18 10-Apr USA 1 39 d,i Town A,Town B
72 11-Apr USA 1 28 c,g Town A,Town B
109 11-Apr USA 2 33 f,g Town B,Town B
41 10-Apr USA 2 24 c,d Town A,Town A
99 11-Apr USA 2 38 c,f Town A,Town B
84 11-Apr USA 2 31 a,b Town A,Town A
88 11-Apr USA 2 30 a,f Town A,Town B
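sample draws a different set on every call. If the same 10 combinations should come back each time (for example in a test), DataFrame.sample also accepts a random_state seed. This sketch reuses the Town-level df_res construction from the edit; the seed value 0 is arbitrary.

```python
from itertools import combinations

import pandas as pd

# The question's data, rebuilt column by column
df = pd.DataFrame({
    'fact_date': ['11-Apr'] * 8 + ['10-Apr'] * 8 + ['11-Apr'] * 8 + ['10-Apr'] * 8,
    'Unit': list('aabbccddaabbccddeeffgghhiijjkkll'),
    'Town': ['Town A'] * 16 + ['Town B'] * 16,
    'Scen': [1, 2] * 16,
    'Value': [13, 11, 15, 20, 17, 19, 18, 18, 18, 12, 10, 13, 14, 14, 20, 10,
              18, 17, 15, 19, 11, 14, 14, 17, 19, 10, 16, 10, 16, 19, 12, 11],
    'Country': ['USA'] * 32,
})

df_res = pd.DataFrame(
    [list(name_g) + [val1 + val2, '{},{}'.format(unit1, unit2), '{},{}'.format(town1, town2)]
     for name_g, df_g in df.groupby(['fact_date', 'Country', 'Scen'])
     for ((val1, unit1, town1), (val2, unit2, town2))
     in combinations(df_g[['Value', 'Unit', 'Town']].values, 2)],
    columns=['Combination', 'Country', 'Scen', 'Value_Sum', 'Unit_Com', 'Town'])

# A fixed random_state makes the draw repeatable; rows are drawn without replacement
picked = df_res.sample(n=10, random_state=0)
again = df_res.sample(n=10, random_state=0)
print(picked)
```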