Question

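Below is the basic data I am given each month. I receive many such files for different departments, and the work has become very monotonous and repetitive.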

Month,year,sales,  
January,2017,34400,  
February,2017,35530,  
March,2017,34920,  
April,2017,35950,  
May,2017,36230,  
June,2017,36820,  
July,2017,34590,  
August,2017,36500,  
September,2017,36600,  
October,2017,37140,  
November,2017,36790,  
December,2017,43500,  
January,2018,34900,  
February,2018,37700,  
March,2018,37900,  
April,2018,38100,  
May,2018,37800,  
June,2018,38500,  
July,2018,39400,  
August,2018,39700,  
September,2018,39980,  
October,2018,40600,  
November,2018,39100,  
December,2018,46600,  
January,2019,42500,

I tried using functions such as value_counts (which, unfortunately, only gives a summary) to produce this output, and failed. (See the output below.)

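I need columns 3 and 4 to be filled in automatically (where fillna = True / False).

The third column just tells us whether the month was a profit or a loss compared with the previous month (e.g., if April is higher than March, it is a profit).
The fourth column shows the length of the current profit or loss streak, i.e., 2 or 5 consecutive months of profit. (I mean uninterrupted runs, because they lead to certain awards/recognition for the team.)
The fifth column is the highest sales figure reached within the last n months.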

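We are only allowed to use Apache OpenOffice for our work, so no Excel. However, we have permission from the IT department to install Python.

The solutions in this Link did not help me, because they group by two columns. The columns in my output depend on each other.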

import pandas as pd

df = pd.read_csv("Test_1.csv")
# Month-over-month change in sales; the first row has no previous month, so it is NaN
change = df['sales'].diff()
# This only labels each month; the streak and rolling-max columns are still missing
df['comparative_position'] = None
df.loc[change > 0, 'comparative_position'] = "Profit"
df.loc[change < 0, 'comparative_position'] = "Loss"

Month,Year,Sales,comparative_position,Months_in_P(or)L,Highest_in_12Months  
January,2016,34400,NaN,NaN,NaN  
February,2016,35530,Profit,1,NaN  
March,2016,34920,Loss,1,NaN  
April,2016,35950,Profit,1,NaN  
May,2016,36230,Profit,2,NaN  
June,2016,36820,Profit,3,NaN  
July,2016,34590,Loss,1,NaN  
August,2016,36500,Profit,1,NaN  
September,2016,36600,Profit,2,NaN  
October,2016,37140,Profit,3,NaN  
November,2016,36790,Loss,1,NaN  
December,2016,43500,Profit,1,43500  
January,2017,34900,Loss,1,43500  
February,2017,37700,Profit,1,43500  
March,2017,37900,Profit,2,43500  
April,2017,38100,Profit,3,43500  
May,2017,37800,Loss,1,43500  
June,2017,38500,Profit,1,43500  
July,2017,39400,Profit,2,43500  
August,2017,39700,Profit,3,43500  
September,2017,39980,Profit,4,43500  
October,2017,40600,Profit,5,43500  
November,2017,39100,Loss,1,43500  
December,2017,46600,Profit,1,46600  
January,2018,42500,Loss,1,46600

Answer 1

AFAIU, this should work for you:

# Compare each month with the previous one (True = sales went up).
# The first row is skipped because it has no previous month; it ends up as NaN.
up = df.sales.diff().iloc[1:] > 0
# Add column counting 'streaks' of P or L; cumcount() starts at 0, so add 1
df['streak'] = up.groupby(up.ne(up.shift()).cumsum()).cumcount() + 1
# Map True/False to string of Profit/Loss
df['P/L'] = up.map({True: 'Profit', False: 'Loss'})
# Max of the last n months, where n is 12 as in your example; you can change it to any int
df['12_max'] = df.sales.rolling(12).max()
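
For anyone who wants to try this without a file on disk, here is a minimal self-contained sketch of the same idea, run on the first 14 rows of the sample data from the question. The names `csv_text`, `up` and `run_id` are only illustrative, and `usecols` is an assumption on my part to drop the empty column that the trailing commas in the sample would otherwise create. Note that a month with exactly the same sales as the previous one is labelled "Loss" here.

```python
import io

import pandas as pd

# First 14 rows of the sample data from the question, trailing commas included
csv_text = """Month,year,sales,
January,2017,34400,
February,2017,35530,
March,2017,34920,
April,2017,35950,
May,2017,36230,
June,2017,36820,
July,2017,34590,
August,2017,36500,
September,2017,36600,
October,2017,37140,
November,2017,36790,
December,2017,43500,
January,2018,34900,
February,2018,37700,
"""

# usecols keeps only the three real columns and ignores the empty fourth one
df = pd.read_csv(io.StringIO(csv_text), usecols=["Month", "year", "sales"])

# True where sales rose versus the previous month; the first row is skipped
up = df["sales"].diff().iloc[1:] > 0
# A new run starts whenever the direction differs from the previous month
run_id = up.ne(up.shift()).cumsum()

df["P/L"] = up.map({True: "Profit", False: "Loss"})
# Position inside the current run, counted from 1
df["streak"] = up.groupby(run_id).cumcount() + 1
# Highest sales over the last 12 rows; NaN until 12 rows are available
df["12_max"] = df["sales"].rolling(12).max()

print(df.to_string(index=False))
```

If you would rather see a running maximum from the very first month instead of NaN, `rolling(12, min_periods=1)` should do that, though it no longer matches the expected output in the question.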

Pandas: simple analysis (comparison) and fillna
