Counting occurrences of a condition in a DataFrame

Time: 2019-10-18 23:16:06

Tags: python pandas dataframe

I have a DataFrame with Date, Inventory and Sales columns (the data is given at the end of the question).

For each row where Sales > 20, I want to count how many times Inventory was > 10 in the previous 5 data points.

The ideal output is:

2018/12/26 has Sales 36 when 2 times.
2018/11/19 has Sales 34 when 2 times.

This is how I approached it with xlrd:

import xlrd
from datetime import datetime

old_file = xlrd.open_workbook("C:\\Sales.xlsx")
the_sheet = old_file.sheet_by_name("Sales")

for row_index in range(1, the_sheet.nrows):
    Dates = the_sheet.cell(row_index, 0).value
    Inventory = the_sheet.cell(row_index, 1).value
    Sales = the_sheet.cell(row_index, 2).value

    list_of_Inventory = []

    for i in range(1,5):
        list_of_Inventory.append(the_sheet.cell(row_index - i, 1).value)

    if Sales > 20:
        print str(Dates) + " has Sales " + str(Sales) + " when " + str(sum(i > 10 for i in list_of_Inventory)) + " times."

It doesn't work well.

What is the right way to solve this? Some guidance with pandas would be appreciated.

Thanks.

P.S. Here is the data.

import pandas as pd

data = {'Date': ["2018/12/29","2018/12/26","2018/12/24","2018/12/15","2018/12/11","2018/12/8","2018/11/28","2018/11/20","2018/11/19","2018/11/11","2018/11/6","2018/11/1","2018/10/28","2018/10/11","2018/9/25","2018/9/24"],
        'Inventory': [5,5,5,22,5,25,5,15,15,5,5,15,0,22,2,10],
        'Sales': [0,36,18,0,0,17,18,17,34,16,0,0,18,18,51,18]}

df = pd.DataFrame(data)

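3 Answers:

Answer 0: (score: 2)

I don't think you can get around iterating over the DataFrame, given the details you want in the output. As long as your data isn't very large, that shouldn't be a problem. Here is another quick solution you could implement: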

for idx in df.loc[df.Sales > 20].index:
    inv = df.loc[idx-4:idx, 'Inventory'].ge(10)
    date, _, sales = df.loc[idx]
    if len(inv) >= 5:
        print(f'{date} has Sales {sales} when {inv.sum()} times')

2018/11/19 has Sales 34 when 2 times
2018/9/25 has Sales 51 when 2 times

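Answer 1: (score: 1)

I think you can do this with the pandas rolling function, using a couple of "dummy" columns for the intermediate work. Note that 'HSHIC' = High Sales High Inventory Count (it needed an abbreviation). This actually works out nicely for your wish to exclude the first 4 rows, because rolling excludes them automatically.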

In [42]: df = pd.DataFrame(data)                                                 

In [43]: df                                                                      
Out[43]: 
          Date  Inventory  Sales
0   2018/12/29          5      0
1   2018/12/26          5     36
2   2018/12/24          5     18
3   2018/12/15          6      0
4   2018/12/11          5      0
5    2018/12/8          0     17
6   2018/11/28          5     18
7   2018/11/20         15     17
8   2018/11/19         15     34
9   2018/11/11          5     16
10   2018/11/6          5      0
11   2018/11/1         15      0
12  2018/10/28          0     18
13  2018/10/11         10     18
14   2018/9/25          2     51
15   2018/9/24         10     18

In [44]: df['High Inventory'] = df['Inventory'] > 10                             

In [45]: df['High Inv Cnt'] = df['High Inventory'].rolling(window=5).sum()       

In [46]: df                                                                      
Out[46]: 
          Date  Inventory  Sales  High Inventory  High Inv Cnt
0   2018/12/29          5      0           False           NaN
1   2018/12/26          5     36           False           NaN
2   2018/12/24          5     18           False           NaN
3   2018/12/15          6      0           False           NaN
4   2018/12/11          5      0           False           0.0
5    2018/12/8          0     17           False           0.0
6   2018/11/28          5     18           False           0.0
7   2018/11/20         15     17            True           1.0
8   2018/11/19         15     34            True           2.0
9   2018/11/11          5     16           False           2.0
10   2018/11/6          5      0           False           2.0
11   2018/11/1         15      0            True           3.0
12  2018/10/28          0     18           False           2.0
13  2018/10/11         10     18           False           1.0
14   2018/9/25          2     51           False           1.0
15   2018/9/24         10     18           False           1.0

In [47]: df['HSHIC'] = df['High Inv Cnt'][df.Sales > 20]                         

In [48]: df                                                                      
Out[48]: 
          Date  Inventory  Sales  High Inventory  High Inv Cnt  HSHIC
0   2018/12/29          5      0           False           NaN    NaN
1   2018/12/26          5     36           False           NaN    NaN
2   2018/12/24          5     18           False           NaN    NaN
3   2018/12/15          6      0           False           NaN    NaN
4   2018/12/11          5      0           False           0.0    NaN
5    2018/12/8          0     17           False           0.0    NaN
6   2018/11/28          5     18           False           0.0    NaN
7   2018/11/20         15     17            True           1.0    NaN
8   2018/11/19         15     34            True           2.0    2.0
9   2018/11/11          5     16           False           2.0    NaN
10   2018/11/6          5      0           False           2.0    NaN
11   2018/11/1         15      0            True           3.0    NaN
12  2018/10/28          0     18           False           2.0    NaN
13  2018/10/11         10     18           False           1.0    NaN
14   2018/9/25          2     51           False           1.0    1.0
15   2018/9/24         10     18           False           1.0    NaN

In [49]:    
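
The same rolling idea can be adapted to the corrected data in the question, where rows run from newest to oldest and the "previous 5 data points" are therefore the five rows below each row. The following is only a sketch under that assumption; the column name `prev_high_inv` is made up for illustration and is not part of the original answer.

```python
import pandas as pd

data = {
    "Date": ["2018/12/29", "2018/12/26", "2018/12/24", "2018/12/15",
             "2018/12/11", "2018/12/8", "2018/11/28", "2018/11/20",
             "2018/11/19", "2018/11/11", "2018/11/6", "2018/11/1",
             "2018/10/28", "2018/10/11", "2018/9/25", "2018/9/24"],
    "Inventory": [5, 5, 5, 22, 5, 25, 5, 15, 15, 5, 5, 15, 0, 22, 2, 10],
    "Sales": [0, 36, 18, 0, 0, 17, 18, 17, 34, 16, 0, 0, 18, 18, 51, 18],
}
df = pd.DataFrame(data)

# 1 where Inventory is strictly greater than 10, else 0.
high = df["Inventory"].gt(10).astype(int)

# Rolling over the reversed series sums rows i .. i+4 for each row i.
# shift(-1) then moves the window one row down, so row i ends up with the
# count over rows i+1 .. i+5 (the current row is excluded). Rows with fewer
# than five rows below them get NaN.
df["prev_high_inv"] = high[::-1].rolling(window=5).sum()[::-1].shift(-1)

# Keep high-sales rows that have a full window of five earlier data points.
result = df[df["Sales"].gt(20) & df["prev_high_inv"].notna()]

for date, sales, count in zip(result["Date"], result["Sales"], result["prev_high_inv"]):
    print(f"{date} has Sales {sales} when {int(count)} times.")
```

On the data above this selects 2018/12/26 and 2018/11/19, each with a count of 2, and skips 2018/9/25 because only one earlier row exists for it.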

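Answer 2: (score: 0)

There was an error in the first version of the question (what is on the page now is correct), so let me offer a working solution that uses Python 2.

Thanks to @manwithfewneeds and @kantal.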

for idx in df.index[df.Sales > 20]:
    inv = df.loc[idx + 1 : idx + 5, 'Inventory'].gt(10)   # the 5 rows below (earlier dates), Inventory > 10
    date, _, sales = df.loc[idx]
    if len(inv) >= 5:
        print '%s. has Sales %s. when %s. times' % (date, sales, inv.sum())
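
For readers on Python 3, the same loop might be written roughly as follows. This is a sketch rather than the answerer's code: the function name `report_high_sales` and its parameters are hypothetical, and it assumes the default integer index with rows ordered newest-first, as in the question's data.

```python
import pandas as pd

data = {
    "Date": ["2018/12/29", "2018/12/26", "2018/12/24", "2018/12/15",
             "2018/12/11", "2018/12/8", "2018/11/28", "2018/11/20",
             "2018/11/19", "2018/11/11", "2018/11/6", "2018/11/1",
             "2018/10/28", "2018/10/11", "2018/9/25", "2018/9/24"],
    "Inventory": [5, 5, 5, 22, 5, 25, 5, 15, 15, 5, 5, 15, 0, 22, 2, 10],
    "Sales": [0, 36, 18, 0, 0, 17, 18, 17, 34, 16, 0, 0, 18, 18, 51, 18],
}
df = pd.DataFrame(data)


def report_high_sales(df, sales_min=20, inv_min=10, window=5):
    """Return one report line per high-sales row that has a full window below it."""
    lines = []
    for idx in df.index[df["Sales"] > sales_min]:
        # .loc slicing is label-based and inclusive: rows idx+1 .. idx+window.
        prev = df.loc[idx + 1 : idx + window, "Inventory"]
        if len(prev) < window:
            continue  # not enough earlier data points
        count = int(prev.gt(inv_min).sum())
        row = df.loc[idx]
        lines.append(f"{row['Date']} has Sales {row['Sales']} when {count} times.")
    return lines


for line in report_high_sales(df):
    print(line)
```

Returning the lines instead of printing inside the loop keeps the function easy to check; on the question's data it yields the two lines listed as the ideal output.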