Pandas: simple analysis (comparison) and fillna

Time: 2019-02-17 11:28:10

Tags: python pandas dataframe comparison

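Below is the basic data provided to me every month. I get many of these files for different departments, and the work has become very monotonous and repetitive.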

Month,year,sales,  
January,2017,34400,  
February,2017,35530,  
March,2017,34920,  
April,2017,35950,  
May,2017,36230,  
June,2017,36820,  
July,2017,34590,  
August,2017,36500,  
September,2017,36600,  
October,2017,37140,  
November,2017,36790,  
December,2017,43500,  
January,2018,34900,  
February,2018,37700,  
March,2018,37900,  
April,2018,38100,  
May,2018,37800,  
June,2018,38500,  
July,2018,39400,  
August,2018,39700,  
September,2018,39980,  
October,2018,40600,  
November,2018,39100,  
December,2018,46600,  
January,2019,42500,  

I tried some functions such as value_counts (which, unfortunately, only gives a summary) to get this output, and failed. (See the desired output below.)

I need to fill in the new columns automatically (where fillna = True / False):

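  • The third column (comparative_position) simply says whether the month is a profit or a loss compared with the previous month (e.g. if April is greater than March, it is a profit).
  • The fourth column (Months_in_P(or)L) shows the current P/L streak, i.e. 2 or 5 consecutive months of profit. (I mean consecutive months, because such streaks earn the team certain awards/recognition.)
  • The fifth column (Highest_in_12Months) is the highest sales figure reached in the last n months.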
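The three rules above can be written as a plain-Python loop, which also makes the dependency explicit: the streak is computed from the previous month's label, not from the sales figures alone. This is a minimal sketch; the function name annotate is hypothetical, and treating unchanged sales as a loss is an assumption (the question does not say how ties should be labelled).

```python
def annotate(sales, n=12):
    """Return one (label, streak, highest) tuple per month.

    label:   "Profit" if sales rose versus the previous month, else "Loss"
             (None for the first month, which has nothing to compare with)
    streak:  number of consecutive months with the same label, counted from 1
    highest: highest sales over the last n months, None until n months exist
    """
    rows = []
    prev_label = None
    run = 0
    for i, value in enumerate(sales):
        if i == 0:
            label, streak = None, None
        else:
            # Assumption: unchanged sales are counted as a loss
            label = "Profit" if value > sales[i - 1] else "Loss"
            # The streak depends on the previous month's label
            run = run + 1 if label == prev_label else 1
            streak = run
        prev_label = label
        # Window of the last n months ending at month i
        highest = max(sales[i - n + 1 : i + 1]) if i >= n - 1 else None
        rows.append((label, streak, highest))
    return rows


# First six months of 2017 from the data above, with n = 3 to keep the output short
for row in annotate([34400, 35530, 34920, 35950, 36230, 36820], n=3):
    print(row)
```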

They only allow Apache OpenOffice for our work, so no Excel, but we got permission from the IT department to install Python.

The solutions at the link didn't help me, because they group by two columns, whereas the columns in my output depend on each other.
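Grouping by existing columns cannot produce the streak, because the runs of profit or loss are not stored in any column; they have to be derived from the order of the labels. A common pandas idiom is to flag every month whose label differs from the previous month and take a cumulative sum, which gives each run its own id. A minimal sketch on hard-coded labels (February to August 2017, derived from the data above):

```python
import pandas as pd

# P/L labels for February..August 2017, derived from the sample data
pl = pd.Series(["Profit", "Loss", "Profit", "Profit", "Profit", "Loss", "Profit"])

# True wherever the label differs from the month before, i.e. a new run starts
new_run = pl.ne(pl.shift())
# Cumulative count of run starts gives every run its own id
run_id = new_run.cumsum()
# Position inside each run, counted from 1
streak = pl.groupby(run_id).cumcount() + 1

print(run_id.tolist())  # [1, 2, 3, 3, 3, 4, 5]
print(streak.tolist())  # [1, 1, 1, 2, 3, 1, 1]
```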

import pandas as pd
df = pd.read_csv("Test_1.csv")
# Change versus the previous month; NaN for the first month
diff = df['sales'].diff()
df['comparative_position'] = None
df.loc[diff > 0.0, 'comparative_position'] = "Profit"
df.loc[diff < 0.0, 'comparative_position'] = "Loss"

Month,Year,Sales,comparative_position,Months_in_P(or)L,Highest_in_12Months  
January,2017,34400,NaN,NaN,NaN  
February,2017,35530,Profit,1,NaN  
March,2017,34920,Loss,1,NaN  
April,2017,35950,Profit,1,NaN  
May,2017,36230,Profit,2,NaN  
June,2017,36820,Profit,3,NaN  
July,2017,34590,Loss,1,NaN  
August,2017,36500,Profit,1,NaN  
September,2017,36600,Profit,2,NaN  
October,2017,37140,Profit,3,NaN  
November,2017,36790,Loss,1,NaN  
December,2017,43500,Profit,1,43500  
January,2018,34900,Loss,1,43500  
February,2018,37700,Profit,1,43500  
March,2018,37900,Profit,2,43500  
April,2018,38100,Profit,3,43500  
May,2018,37800,Loss,1,43500  
June,2018,38500,Profit,1,43500  
July,2018,39400,Profit,2,43500  
August,2018,39700,Profit,3,43500  
September,2018,39980,Profit,4,43500  
October,2018,40600,Profit,5,43500  
November,2018,39100,Loss,1,43500  
December,2018,46600,Profit,1,46600  
January,2019,42500,Loss,1,46600

1 answer:

Answer 0 (score: 2)

AFAIU, this should work for you:

import pandas as pd

df = pd.read_csv("Test_1.csv")
# Difference from the previous month; NaN for the first month
diff = df.sales.diff()
# Profit if sales rose, otherwise Loss; the first month stays NaN
df['P/L'] = (diff > 0).map({True: 'Profit', False: 'Loss'}).where(diff.notna())
# Count 'streaks' of P or L: new run id whenever P/L changes, then number months 1, 2, 3, ...
runs = df['P/L'].ne(df['P/L'].shift()).cumsum()
df['streak'] = df['P/L'].groupby(runs).cumcount().add(1).where(diff.notna())
# max of last n months where n is 12, as in your example; you can change it to any int
df['12_max'] = df.sales.rolling(12).max()
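As a check, the steps of this answer can be wrapped in a function and run on the question's data. This is a sketch: the function name add_pl_columns and the inline csv_text (standing in for Test_1.csv) are hypothetical, and it makes two adjustments so that the result lines up with the expected table in the question: the first month is left blank instead of being labelled Loss, and streaks are counted from 1 (cumcount starts at 0).

```python
import io

import pandas as pd


def add_pl_columns(df, n=12):
    """Return a copy of df with P/L, streak and n-month max columns added."""
    out = df.copy()
    # Month-over-month change; NaN for the first month
    diff = out["sales"].diff()
    # Profit if sales rose, otherwise Loss; the first month stays NaN
    out["P/L"] = (diff > 0).map({True: "Profit", False: "Loss"}).where(diff.notna())
    # New run id whenever the label changes; count months 1, 2, 3, ... within each run
    runs = out["P/L"].ne(out["P/L"].shift()).cumsum()
    out["streak"] = out["P/L"].groupby(runs).cumcount().add(1).where(diff.notna())
    # Highest sales over the last n months (NaN until n months are available)
    out[f"{n}_max"] = out["sales"].rolling(n).max()
    return out


# The question's monthly data, inlined instead of reading Test_1.csv
csv_text = """Month,year,sales
January,2017,34400
February,2017,35530
March,2017,34920
April,2017,35950
May,2017,36230
June,2017,36820
July,2017,34590
August,2017,36500
September,2017,36600
October,2017,37140
November,2017,36790
December,2017,43500
January,2018,34900
February,2018,37700
March,2018,37900
April,2018,38100
May,2018,37800
June,2018,38500
July,2018,39400
August,2018,39700
September,2018,39980
October,2018,40600
November,2018,39100
December,2018,46600
January,2019,42500
"""

result = add_pl_columns(pd.read_csv(io.StringIO(csv_text)))
print(result.tail(3))
```

For the many monthly files, the same function can be called once per file and the result saved with DataFrame.to_csv(..., index=False), which OpenOffice Calc can open.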