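I have a data frame like the one shown below.
For each row where Sales > 20, I want to count how many times Inventory was > 10 in the previous 5 records.
The ideal output is: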
2018/12/26 has Sales 36 when 2 times.
2018/11/19 has Sales 34 when 2 times.
Here is how I approached it with xlrd:
import xlrd
from datetime import datetime

old_file = xlrd.open_workbook("C:\\Sales.xlsx")
the_sheet = old_file.sheet_by_name("Sales")

for row_index in range(1, the_sheet.nrows):
    Dates = the_sheet.cell(row_index, 0).value
    Inventory = the_sheet.cell(row_index, 1).value
    Sales = the_sheet.cell(row_index, 2).value
    list_of_Inventory = []
    for i in range(1,5):
        list_of_Inventory.append(the_sheet.cell(row_index - i, 1).value)
    if Sales > 20:
        print str(Dates) + " has Sales " + str(Sales) + " when " + str(sum(i > 10 for i in list_of_Inventory)) + " times."
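It does not work well.
What is the right way to solve this? Some guidance on doing it with pandas would be appreciated.
Thanks.
P.S. Here is the data.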
import pandas as pd

data = {'Date': ["2018/12/29","2018/12/26","2018/12/24","2018/12/15","2018/12/11","2018/12/8","2018/11/28","2018/11/20","2018/11/19","2018/11/11","2018/11/6","2018/11/1","2018/10/28","2018/10/11","2018/9/25","2018/9/24"],
        'Inventory': [5,5,5,22,5,25,5,15,15,5,5,15,0,22,2,10],
        'Sales': [0,36,18,0,0,17,18,17,34,16,0,0,18,18,51,18]}
df = pd.DataFrame(data)
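Answer 0 (score: 2)
I don't think you can avoid iterating over the data frame, given the details you want in the output. As long as your data is not very large, that should not be a problem. Here is another quick solution you could implement: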
for idx in df.loc[df.Sales > 20].index:
    inv = df.loc[idx-4:idx, 'Inventory'].ge(10)
    date, _, sales = df.loc[idx]
    if len(inv) >= 5:
        print(f'{date} has Sales {sales} when {inv.sum()} times')
2018/11/19 has Sales 34 when 2 times
2018/9/25 has Sales 51 when 2 times
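The loop above leans on two properties of label-based slicing with `.loc`: both end labels are included, and a start label that falls off the top of a sorted integer index is clipped rather than raising, which is why the `len(inv) >= 5` check is needed. A minimal sketch on a made-up `inventory` Series (hypothetical values, default integer index assumed):

```python
import pandas as pd

# Hypothetical toy column, not the question's data
inventory = pd.Series([5, 25, 5, 15, 15, 5], name="Inventory")

full = inventory.loc[1:5]      # labels 1, 2, 3, 4, 5: both ends are included
clipped = inventory.loc[-3:1]  # start label is off the top: only labels 0 and 1 remain

print(len(full))               # 5
print(len(clipped))            # 2
print(int(full.ge(10).sum()))  # 25, 15, 15 are >= 10, so 3
```

Note that `.ge(10)` also counts a value of exactly 10; `.gt(10)` would be the strict comparison.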
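Answer 1 (score: 1)
I think you can do this with pandas' rolling function, using a couple of "cheater" columns for the intermediate work. Note that 'HSHIC' = High Sales High Inventory Count (I needed an abbreviation). This also fits your wish to exclude the first 4 rows, because rolling leaves them out automatically.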
In [42]: df = pd.DataFrame(data)
In [43]: df
Out[43]:
          Date  Inventory  Sales
0   2018/12/29          5      0
1   2018/12/26          5     36
2   2018/12/24          5     18
3   2018/12/15          6      0
4   2018/12/11          5      0
5    2018/12/8          0     17
6   2018/11/28          5     18
7   2018/11/20         15     17
8   2018/11/19         15     34
9   2018/11/11          5     16
10   2018/11/6          5      0
11   2018/11/1         15      0
12  2018/10/28          0     18
13  2018/10/11         10     18
14   2018/9/25          2     51
15   2018/9/24         10     18
In [44]: df['High Inventory'] = df['Inventory'] > 10
In [45]: df['High Inv Cnt'] = df['High Inventory'].rolling(window=5).sum()
In [46]: df
Out[46]:
          Date  Inventory  Sales  High Inventory  High Inv Cnt
0   2018/12/29          5      0           False           NaN
1   2018/12/26          5     36           False           NaN
2   2018/12/24          5     18           False           NaN
3   2018/12/15          6      0           False           NaN
4   2018/12/11          5      0           False           0.0
5    2018/12/8          0     17           False           0.0
6   2018/11/28          5     18           False           0.0
7   2018/11/20         15     17            True           1.0
8   2018/11/19         15     34            True           2.0
9   2018/11/11          5     16           False           2.0
10   2018/11/6          5      0           False           2.0
11   2018/11/1         15      0            True           3.0
12  2018/10/28          0     18           False           2.0
13  2018/10/11         10     18           False           1.0
14   2018/9/25          2     51           False           1.0
15   2018/9/24         10     18           False           1.0
In [47]: df['HSHIC'] = df['High Inv Cnt'][df.Sales > 20]
In [48]: df
Out[48]:
          Date  Inventory  Sales  High Inventory  High Inv Cnt  HSHIC
0   2018/12/29          5      0           False           NaN    NaN
1   2018/12/26          5     36           False           NaN    NaN
2   2018/12/24          5     18           False           NaN    NaN
3   2018/12/15          6      0           False           NaN    NaN
4   2018/12/11          5      0           False           0.0    NaN
5    2018/12/8          0     17           False           0.0    NaN
6   2018/11/28          5     18           False           0.0    NaN
7   2018/11/20         15     17            True           1.0    NaN
8   2018/11/19         15     34            True           2.0    2.0
9   2018/11/11          5     16           False           2.0    NaN
10   2018/11/6          5      0           False           2.0    NaN
11   2018/11/1         15      0            True           3.0    NaN
12  2018/10/28          0     18           False           2.0    NaN
13  2018/10/11         10     18           False           1.0    NaN
14   2018/9/25          2     51           False           1.0    1.0
15   2018/9/24         10     18           False           1.0    NaN
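Note that `rolling(window=5)` as used above counts the current row plus the four rows above it. If the intent is instead the five rows below each high-sales row (the earlier dates), one possible variant is to reverse the column before rolling and then shift by one. This is only a sketch: it assumes the `data` from the question as it currently appears, and `below_cnt`, `result`, and `HighInvBelow` are names made up for illustration.

```python
import pandas as pd

data = {'Date': ["2018/12/29", "2018/12/26", "2018/12/24", "2018/12/15",
                 "2018/12/11", "2018/12/8", "2018/11/28", "2018/11/20",
                 "2018/11/19", "2018/11/11", "2018/11/6", "2018/11/1",
                 "2018/10/28", "2018/10/11", "2018/9/25", "2018/9/24"],
        'Inventory': [5, 5, 5, 22, 5, 25, 5, 15, 15, 5, 5, 15, 0, 22, 2, 10],
        'Sales': [0, 36, 18, 0, 0, 17, 18, 17, 34, 16, 0, 0, 18, 18, 51, 18]}
df = pd.DataFrame(data)

# 1 where Inventory > 10, else 0
high = df['Inventory'].gt(10).astype(int)

# Reverse the rows so that a trailing window covers the rows that sit
# below in the original order, shift by one to leave out the current
# row, then reverse back. Rows with fewer than 5 rows below become NaN.
below_cnt = high[::-1].rolling(window=5).sum().shift(1)[::-1]

# Keep high-sales rows that have a full window of 5 rows below them
mask = df['Sales'].gt(20) & below_cnt.notna()
result = df.loc[mask, ['Date', 'Sales']].assign(
    HighInvBelow=below_cnt[mask].astype(int))

for row in result.itertuples(index=False):
    print(f"{row.Date} has Sales {row.Sales} when {row.HighInvBelow} times.")
```

With this data the sketch prints the two lines given as the ideal output in the question; the 2018/9/25 row is dropped because only one row follows it.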
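Answer 2 (score: 0)
There was an error in the first version of the question (what is on the page now is correct), so let me offer a working solution using Python 2.
Thanks to @manwithfewneeds and @kantal.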
for idx in df.index[df.Sales > 20]:
    inv = df.loc[idx + 1 : idx + 5, 'Inventory'].gt(10)  # the 5 rows below, Inventory > 10
    date, _, sales = df.loc[idx]
    if len(inv) >= 5:  # skip rows with fewer than 5 rows below them
        print '%s has Sales %s when %s times.' % (date, sales, inv.sum())
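For readers on Python 3, the same loop can be sketched as a small function, using a strict `> 10` comparison as the question asks. `high_sales_report` and its parameters are hypothetical names, the data is the question's current data, and a default integer index is assumed so that `idx + 1 : idx + lookahead` addresses the rows that follow.

```python
import pandas as pd


def high_sales_report(df, sales_min=20, inv_min=10, lookahead=5):
    """Return one line per high-sales row that has `lookahead` rows below it."""
    lines = []
    for idx in df.index[df['Sales'] > sales_min]:
        # .loc slicing is label based and inclusive: rows idx+1 .. idx+lookahead
        window = df.loc[idx + 1: idx + lookahead, 'Inventory']
        if len(window) < lookahead:
            continue  # not enough rows below this one
        count = int(window.gt(inv_min).sum())
        lines.append(f"{df.at[idx, 'Date']} has Sales {df.at[idx, 'Sales']} "
                     f"when {count} times.")
    return lines


data = {'Date': ["2018/12/29", "2018/12/26", "2018/12/24", "2018/12/15",
                 "2018/12/11", "2018/12/8", "2018/11/28", "2018/11/20",
                 "2018/11/19", "2018/11/11", "2018/11/6", "2018/11/1",
                 "2018/10/28", "2018/10/11", "2018/9/25", "2018/9/24"],
        'Inventory': [5, 5, 5, 22, 5, 25, 5, 15, 15, 5, 5, 15, 0, 22, 2, 10],
        'Sales': [0, 36, 18, 0, 0, 17, 18, 17, 34, 16, 0, 0, 18, 18, 51, 18]}

report = high_sales_report(pd.DataFrame(data))
print("\n".join(report))
```

Returning the lines instead of printing them inside the loop keeps the function easy to check against the expected output.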