I have a file containing various products, locations and Licence_ends values. I need to count how many products run out of licence in each quarter, and how many of those are available to reorder in that quarter. Sample data:
     Item  Store  Category  Licence_ends  Available_to_reorder
0  A01929  North    Office       2018 Q1                   Yes
1  A02911  South   Windows       2019 Q3                   Yes
2  B11282  North     Adobe       2019 Q2                    No
3  C73162   East    Office       2018 Q4                   Yes
4  A12817   West   Windows       2020 Q1                    No
I started writing code for this, but I got lost and am not sure of the right approach. What I am trying to achieve is the following:
    Store  Category    2018 Q1    2018 Q2  ...    2020 Q4
 0   East   Windows          0          1              24  # cumulative sum of previous quarters
 1   East    Office          1          2              11
 2   East     Adobe          1          4               6
 3   West   Windows          2          2              18
 4   West    Office          0          0               0
...
11  South     Adobe          1          0              12
12  Total       All  col.sum()  col.sum()       col.sum()
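What I am producing so far covers only the last store, separately for each category. I have tried adding lists, Series and dictionaries to an empty DataFrame, and I have tried append, add and assign, but none of them gave me what I want. Can you point me in the right direction?
I have been through most of the approaches on SO and looked in Wes McKinney's book on @SaféBooks, but I cannot get it to work. Please help. I have to finish this by Monday and I have nothing so far.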
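Answer 0 (score: 2)
Consider pivot_table with a lambda in the aggfunc argument that applies conditional logic. The example below uses random data, seeded for reproducibility, and of course adds an Open Source category.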
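To see what the lambda does in isolation, here is a minimal sketch on a five-row sample modelled on the question's data. The names `sample` and `small_pvt` are hypothetical, and `fill_value=0` is an optional addition that is not part of the answer's own code.

```python
import pandas as pd

# Hypothetical five-row sample, modelled on the question's data.
sample = pd.DataFrame({
    'Store': ['North', 'South', 'North', 'East', 'West'],
    'Category': ['Office', 'Windows', 'Adobe', 'Office', 'Windows'],
    'Licence_ends': ['2018Q1', '2019Q3', '2019Q2', '2018Q4', '2020Q1'],
    'Available_to_reorder': ['Yes', 'Yes', 'No', 'Yes', 'No'],
})

# The lambda receives each group's values as a Series. Comparing with 'Yes'
# gives booleans, and summing them counts the True entries.
small_pvt = sample.pivot_table(index='Store', columns='Licence_ends',
                               values='Available_to_reorder',
                               aggfunc=lambda s: (s == 'Yes').sum(),
                               fill_value=0)  # assumption: show empty cells as 0
print(small_pvt)
```

Each cell then holds the number of 'Yes' rows for that store and quarter; combinations with no rows at all are filled with 0 instead of NaN.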
Data
import numpy as np
import pandas as pd

np.random.seed(22)  # fixed seed so the random data is reproducible
LETTERS = list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')

# 500 random rows: an item code (letter + 4 digits), store, category, quarter and reorder flag.
# Note: np.random.randint excludes its upper bound, so only quarters 1-3 occur.
df = pd.DataFrame({'Item': ["".join(list(np.random.choice(LETTERS, 1)) +
                                    [str(np.random.randint(1000, 9000))]) for _ in range(500)],
                   'Store': [np.random.choice(['North', 'South',
                                               'East', 'West'], 1).item(0) for _ in range(500)],
                   'Category': [np.random.choice(['Office', 'Windows',
                                                  'Adobe', 'Open Source'], 1).item(0) for _ in range(500)],
                   'Licence_ends': ["Q".join([str(np.random.randint(2018, 2021))] +
                                             [str(np.random.randint(1, 4))]) for _ in range(500)],
                   'Available_to_reorder': [np.random.choice(['Yes', 'No'], 1).item(0) for _ in range(500)]},
                  columns=['Item', 'Store', 'Category', 'Licence_ends', 'Available_to_reorder'])

print(df.head(10))  # first ten rows
#     Item  Store     Category  Licence_ends  Available_to_reorder
# 0  V7276   West  Open Source        2018Q2                   Yes
# 1  M8104   West      Windows        2020Q1                    No
# 2  E6478  North  Open Source        2019Q2                    No
# 3  W5587  South  Open Source        2018Q2                   Yes
# 4  U3952  South      Windows        2019Q3                    No
# 5  E1989   East       Office        2018Q1                    No
# 6  S6646   West      Windows        2019Q2                   Yes
# 7  N7616   West        Adobe        2019Q1                   Yes
# 8  H6410   East        Adobe        2020Q2                    No
# 9  J8176   West       Office        2020Q1                   Yes
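The question writes quarters with a space (2018 Q1), while the random data here uses 2018Q2. If the real file uses the spaced form, one possible way to normalise the labels, and optionally parse them as quarterly periods, is sketched below. The `labels` values are illustrative only, and converting to periods is an optional extra that the pivot itself does not need.

```python
import pandas as pd

# Illustrative labels in both spellings (with and without a space).
labels = pd.Series(['2018 Q1', '2019Q3', '2020 Q1', '2018Q4'])

# Drop the space so every label has the form YYYYQn.
clean = labels.str.replace(' ', '', regex=False)

# Optionally parse the cleaned strings as quarterly periods,
# which sort chronologically and expose .year and .quarter.
quarters = pd.PeriodIndex(clean, freq='Q')

print(clean.tolist())
print(quarters.sort_values())
```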
Pivot table (result is a multi-indexed DataFrame)
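# Count the 'Yes' entries for each Store/Category row and Licence_ends column;
# margins=True adds a 'Total' row and column.
pvt_df = df.pivot_table(index=['Store', 'Category'], columns='Licence_ends', values='Available_to_reorder',
                        aggfunc=lambda x: sum(x == 'Yes'), margins=True, margins_name='Total')
print(pvt_df)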
# Licence_ends       2018Q1  2018Q2  2018Q3  2019Q1  2019Q2  2019Q3  2020Q1  2020Q2  2020Q3  Total
# Store Category
# East  Adobe           3.0     0.0     1.0     0.0     3.0     2.0     1.0     4.0     0.0     14
#       Office          1.0     3.0     4.0     2.0     NaN     4.0     1.0     1.0     1.0     17
#       Open Source     1.0     4.0     2.0     0.0     1.0     0.0     1.0     2.0     1.0     12
#       Windows         1.0     2.0     3.0     1.0     1.0     0.0     1.0     3.0     1.0     13
# North Adobe           3.0     4.0     1.0     1.0     1.0     1.0     3.0     0.0     2.0     16
#       Office          1.0     0.0     3.0     0.0     1.0     2.0     3.0     0.0     0.0     10
#       Open Source     3.0     1.0     0.0     1.0     1.0     2.0     2.0     1.0     2.0     13
#       Windows         2.0     2.0     5.0     0.0     2.0     2.0     1.0     1.0     3.0     18
# South Adobe           2.0     3.0     NaN     2.0     2.0     3.0     1.0     3.0     2.0     18
#       Office          4.0     3.0     1.0     2.0     NaN     2.0     3.0     2.0     2.0     19
#       Open Source     1.0     2.0     2.0     4.0     1.0     NaN     NaN     3.0     2.0     15
#       Windows         2.0     1.0     1.0     2.0     2.0     2.0     1.0     3.0     1.0     15
# West  Adobe           1.0     1.0     0.0     4.0     3.0     3.0     1.0     0.0     3.0     16
#       Office          1.0     1.0     3.0     3.0     3.0     2.0     2.0     2.0     1.0     18
#       Open Source     4.0     2.0     4.0     0.0     0.0     4.0     1.0     1.0     2.0     18
#       Windows         2.0     2.0     1.0     5.0     4.0     1.0     4.0     1.0     0.0     20
# Total                32.0    31.0    31.0    27.0    25.0    30.0    26.0    27.0    23.0    252
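The question also asks for a cumulative sum over previous quarters, which the pivot above does not compute. A minimal sketch of one way to get it follows, using a small hand-made frame (`counts`, hypothetical) in place of the pivot result. With the real pivot you would probably build it without margins, or drop the Total row and column first, so that the totals are not accumulated as well.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the pivot result: counts per Store/Category and quarter.
idx = pd.MultiIndex.from_tuples(
    [('East', 'Adobe'), ('East', 'Office'), ('West', 'Adobe')],
    names=['Store', 'Category'])
counts = pd.DataFrame(
    [[3.0, 0.0, 1.0], [1.0, np.nan, 4.0], [1.0, 1.0, 0.0]],
    index=idx, columns=['2018Q1', '2018Q2', '2018Q3'])

# A missing combination means no licences, so treat NaN as 0 before accumulating.
filled = counts.fillna(0).astype(int)

# Running total across the quarter columns, left to right.
running = filled.cumsum(axis=1)

# Build a grand-total row from the running totals and append it.
total = running.sum(axis=0).to_frame().T
total.index = pd.MultiIndex.from_tuples([('Total', 'All')], names=['Store', 'Category'])
result = pd.concat([running, total])
print(result)
```

Filling NaN with 0 before calling cumsum matters here: cumsum skips NaN by default, so an unfilled cell would stay NaN in the output instead of carrying the running total forward.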