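Below is the basic data I am given every month. I receive many department-related files, and the work has become very monotonous and repetitive.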
Month,year,sales,
January,2017,34400,
February,2017,35530,
March,2017,34920,
April,2017,35950,
May,2017,36230,
June,2017,36820,
July,2017,34590,
August,2017,36500,
September,2017,36600,
October,2017,37140,
November,2017,36790,
December,2017,43500,
January,2018,34900,
February,2018,37700,
March,2018,37900,
April,2018,38100,
May,2018,37800,
June,2018,38500,
July,2018,39400,
August,2018,39700,
September,2018,39980,
October,2018,40600,
November,2018,39100,
December,2018,46600,
January,2019,42500,
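I tried a few functions such as value_counts (which, unfortunately, only gives a summary) to produce this output, and failed. (See the desired output below.)
I need to fill columns 3 and 4 automatically (where fillna = True / False).
We are only allowed to use Apache OpenOffice for our work, so there is no Excel. However, we did get permission from the IT department to install Python.
The solutions in the linked question did not help me, because they group by two columns. The columns in my output depend on one another.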
import pandas as pd
df = pd.read_csv("Test_1.csv", "a")
df['comparative_position'] = df['sales'].diff().fillna=True
df.loc[df['comparative_position'] > 0.0, 'comparative_position'] = "Profit"
df.loc[df['comparative_position'] < 0.0, 'comparative_position'] = "Loss"
Month,Year,Sales,comparative_position,Months_in_P(or)L,Highest_in_12Months
January,2016,34400,NaN,NaN,NaN
February,2016,35530,Profit,1,NaN
March,2016,34920,Loss,1,NaN
April,2016,35950,Profit,1,NaN
May,2016,36230,Profit,2,NaN
June,2016,36820,Profit,3,NaN
July,2016,34590,Loss,1,NaN
August,2016,36500,Profit,1,NaN
September,2016,36600,Profit,2,NaN
October,2016,37140,Profit,3,NaN
November,2016,36790,Loss,1,NaN
December,2016,43500,Profit,1,43500
January,2017,34900,Loss,1,43500
February,2017,37700,Profit,1,43500
March,2017,37900,Profit,2,43500
April,2017,38100,Profit,3,43500
May,2017,37800,Loss,1,43500
June,2017,38500,Profit,1,43500
July,2017,39400,Profit,2,43500
August,2017,39700,Profit,3,43500
September,2017,39980,Profit,4,43500
October,2017,40600,Profit,5,43500
November,2017,39100,Loss,1,43500
December,2017,46600,Profit,1,46600
January,2018,42500,Loss,1,46600
Answer 0 (score: 2)
AFAIU, this should work for you:
# Get difference from previous as True / False
df['P/L'] = df.sales > df.sales.shift()
# Add column counting 'streaks' of P or L
df['streak'] = df['P/L'].groupby(df['P/L'].ne(df['P/L'].shift()).cumsum()).cumcount()
# map True/False to string of Profit/Loss
df['P/L'] = df['P/L'].map({True:'Profit', False:'Loss'})
# Rolling max over the last n months (n = 12 as in your example; any int works)
df['12_max'] = df.sales.rolling(12).max()
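For anyone who wants to try this end to end, here is a minimal, self-contained sketch that combines the steps above and uses the column names from the question's expected output. A few things in it are assumptions rather than part of the answer: the helper name `add_streak_columns` is made up for illustration, the data is inlined without the trailing commas so that `read_csv` does not create an empty extra column, the streak is counted from 1 (`cumcount()` itself starts at 0), and the first month is left as NaN because it has no previous month to compare against.

```python
import io

import numpy as np
import pandas as pd

# The question's data, inlined (trailing commas dropped) so the sketch runs on its own.
CSV_TEXT = """Month,year,sales
January,2017,34400
February,2017,35530
March,2017,34920
April,2017,35950
May,2017,36230
June,2017,36820
July,2017,34590
August,2017,36500
September,2017,36600
October,2017,37140
November,2017,36790
December,2017,43500
January,2018,34900
February,2018,37700
March,2018,37900
April,2018,38100
May,2018,37800
June,2018,38500
July,2018,39400
August,2018,39700
September,2018,39980
October,2018,40600
November,2018,39100
December,2018,46600
January,2019,42500
"""


def add_streak_columns(df, window=12):
    """Hypothetical helper: return a copy of df with the three derived columns."""
    out = df.copy()

    # Month-over-month change; NaN for the first row.
    change = out["sales"].diff()

    # +1 -> "Profit", -1 -> "Loss"; no change or no previous month stays NaN.
    label = np.sign(change).map({1.0: "Profit", -1.0: "Loss"})
    out["comparative_position"] = label

    # A new run starts whenever the label differs from the previous row.
    run_id = label.ne(label.shift()).cumsum()

    # Position within the run, counted from 1; unlabeled rows become NaN,
    # which makes this a float column.
    streak = label.groupby(run_id).cumcount() + 1
    out["Months_in_P(or)L"] = streak.where(label.notna())

    # Highest sales over the trailing window; NaN until the window is full.
    out["Highest_in_12Months"] = out["sales"].rolling(window).max()
    return out


result = add_streak_columns(pd.read_csv(io.StringIO(CSV_TEXT)))
print(result.to_string(index=False))
```

To run it against a real file, replace `io.StringIO(CSV_TEXT)` with the path to the CSV, and call `result.to_csv(...)` to write a file that OpenOffice can open. If the file keeps the trailing commas, passing `usecols=["Month", "year", "sales"]` to `read_csv` should skip the empty column.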