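I have the following DataFrame. For each cohort, I first want to compute value(year + n) / value(cohort's first year), where the first year is 2009 for the first cohort, and then take the mean of that ratio across cohorts.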
df
id
year 2009 2010 2011 2012 2013 2014 2015
cohort
2009.0 72092.0 60513.0 48797.0 40968.0 34919.0 30452.0 26961.0
2010.0 NaN 73735.0 61899.0 50263.0 42184.0 36150.0 31516.0
2011.0 NaN NaN 76809.0 64093.0 51372.0 43277.0 36994.0
2012.0 NaN NaN NaN 69776.0 57621.0 46453.0 39098.0
2013.0 NaN NaN NaN NaN 71613.0 58996.0 47657.0
2014.0 NaN NaN NaN NaN NaN 65430.0 52540.0
2015.0 NaN NaN NaN NaN NaN NaN 67121.0
2016.0 NaN NaN NaN NaN NaN NaN NaN
2017.0 NaN NaN NaN NaN NaN NaN NaN
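I'll show the math I want to perform, because my English is not good and math is a universal language :)
When 1 year has passed since each cohort's first year (n = 1):
First value needed = ((60513.0 / 72092.0) + (61899.0 / 73735.0) + (64093.0 / 76809.0) + (57621.0 / 69776.0) + (58996.0 / 71613.0) + (52540.0 / 65430.0)) / 6
When 2 years have passed (n = 2):
Second value needed = ((48797.0 / 72092.0) + (50263.0 / 73735.0) + (51372.0 / 76809.0) + (46453.0 / 69776.0) + (47657.0 / 71613.0)) / 5
When 3 years have passed (n = 3) (the last one I'll write out; I think the loop I'm after is clear from this):
Third value needed = ((40968.0 / 72092.0) + (42184.0 / 73735.0) + (43277.0 / 76809.0) + (39098.0 / 69776.0)) / 4
And so on, until the last value:
Last value = 26961.0 / 72092.0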
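The same pattern in general form, as a sketch. The symbols are introduced here for illustration only: $v_{c,y}$ is the table value for cohort $c$ in year $y$, and $r_n$ is the n-th value needed, assuming the data covers the years 2009 to 2015 as in the table above.

```latex
r_n = \frac{1}{7 - n} \sum_{c = 2009}^{2015 - n} \frac{v_{c,\,c+n}}{v_{c,\,c}},
\qquad n = 1, \dots, 6
```

For n = 1 the sum runs over the six cohorts 2009 to 2014, and for n = 6 only the 2009 cohort remains, which gives 26961.0 / 72092.0.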
Thanks in advance, and sorry for my English.
I'm trying something like this; maybe it helps.
First value:
((df1.iloc[0,1]/df1.iloc[0,0]) + (df1.iloc[1,2]/df1.iloc[1,1]) +
(df1.iloc[2,3]/df1.iloc[2,2]) + (df1.iloc[3,4]/df1.iloc[3,3]) +
(df1.iloc[4,5]/df1.iloc[4,4]) + (df1.iloc[5,6]/df1.iloc[5,5]))/6
Second value:
((df1.iloc[0,2]/df1.iloc[0,0]) + (df1.iloc[1,3]/df1.iloc[1,1]) +
(df1.iloc[2,4]/df1.iloc[2,2]) + (df1.iloc[3,5]/df1.iloc[3,3]) +
(df1.iloc[4,6]/df1.iloc[4,4]))/5
Third value:
((df1.iloc[0,3]/df1.iloc[0,0]) + (df1.iloc[1,4]/df1.iloc[1,1]) +
(df1.iloc[2,5]/df1.iloc[2,2]) + (df1.iloc[3,6]/df1.iloc[3,3]))/4
Something like this, but with a loop.
Answer 0: (score: 0)
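How about this?
----Update----
import numpy as np

def sum_with_shift(df, n):
    # Collect value(cohort year + n) / value(cohort year) for every cohort
    # that still has data n years after its first year.
    row_values = []
    for i, row in df.iterrows():
        # i is the cohort label (e.g. 2009); the columns are labelled by year
        if (i + n - 1) < df.columns.max():
            row_values += [row[i + n] / row[i]]
    if row_values:
        return np.mean(row_values)
    else:
        return 0
Passing your df with n=1:
sum_with_shift(df, 1)
averages these six ratios:
60513.0 / 72092.0
61899.0 / 73735.0
64093.0 / 76809.0
57621.0 / 69776.0
58996.0 / 71613.0
52540.0 / 65430.0
and returns approximately 0.8277.
Passing your df with n=2:
sum_with_shift(df, 2)
averages these five ratios:
48797.0 / 72092.0
50263.0 / 73735.0
51372.0 / 76809.0
46453.0 / 69776.0
47657.0 / 71613.0
and returns approximately 0.6717.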
----Update----
For reproducibility, run the following code to generate df.
df_as_json = '{"2009":{"2009":72092.0,"2010":null,"2011":null,"2012":null,"2013":null,"2014":null,"2015":null,"2016":null,"2017":null},"2010":{"2009":60513.0,"2010":73735.0,"2011":null,"2012":null,"2013":null,"2014":null,"2015":null,"2016":null,"2017":null},"2011":{"2009":48797.0,"2010":61899.0,"2011":76809.0,"2012":null,"2013":null,"2014":null,"2015":null,"2016":null,"2017":null},"2012":{"2009":40968.0,"2010":50263.0,"2011":64093.0,"2012":69776.0,"2013":null,"2014":null,"2015":null,"2016":null,"2017":null},"2013":{"2009":34919.0,"2010":42184.0,"2011":51372.0,"2012":57621.0,"2013":71613.0,"2014":null,"2015":null,"2016":null,"2017":null},"2014":{"2009":30452.0,"2010":36150.0,"2011":43277.0,"2012":46453.0,"2013":58996.0,"2014":65430.0,"2015":null,"2016":null,"2017":null},"2015":{"2009":26961.0,"2010":31516.0,"2011":36994.0,"2012":39098.0,"2013":47657.0,"2014":52540.0,"2015":67121.0,"2016":null,"2017":null}}'
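import io
import pandas as pd
# Recent pandas versions expect a file-like object rather than a raw JSON string
df = pd.read_json(io.StringIO(df_as_json))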
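A vectorized variant is also possible. This is only a sketch: it assumes the table is available as a NumPy array with cohorts as rows and years as columns, both starting at the same year, so that each cohort's first-year value sits on the main diagonal. The function name `cohort_ratio_means` and the hard-coded `values` array (copied from the table in the question, without the all-NaN 2016 and 2017 rows) are illustrative and not part of the code above.

```python
import numpy as np

def cohort_ratio_means(values):
    """Return [mean ratio for n=1, n=2, ...] for a cohort-by-year matrix."""
    size = min(values.shape)                  # use the square part of the table
    square = values[:size, :size].astype(float)
    start = np.diag(square)                   # each cohort's first-year value
    ratios = square / start[:, None]          # divide every row by its start value
    # The n-th superdiagonal holds value(cohort, first year + n) / value(cohort, first year)
    return [float(np.nanmean(np.diag(ratios, k=n))) for n in range(1, size)]

nan = np.nan
values = np.array([
    [72092, 60513, 48797, 40968, 34919, 30452, 26961],
    [nan,   73735, 61899, 50263, 42184, 36150, 31516],
    [nan,   nan,   76809, 64093, 51372, 43277, 36994],
    [nan,   nan,   nan,   69776, 57621, 46453, 39098],
    [nan,   nan,   nan,   nan,   71613, 58996, 47657],
    [nan,   nan,   nan,   nan,   nan,   65430, 52540],
    [nan,   nan,   nan,   nan,   nan,   nan,   67121],
])

means = cohort_ratio_means(values)
for n, value in enumerate(means, start=1):
    print(n, value)
```

This returns one mean per value of n (six for this table), and the last entry is simply 26961.0 / 72092.0 because only the 2009 cohort has data six years after its start.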
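Answer 1: (score: 0)
So it looks like you're trying to iterate from the first year (column) of the table to the last. The math is essentially the same at every step; the only thing that changes is how far the year you're currently on is from each cohort's starting year. It looks like you need a loop: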
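numcols = 7  # Number of year columns (2009 to 2015); set this to match your table
for year in range(0, numcols - 1):
    n = year + 1         # years elapsed since each cohort's first year
    count = numcols - n  # number of cohorts that have a value n years later
    total = 0            # named total to avoid shadowing the built-in sum()
    for x in range(year, numcols - 1):
        # Row x - year is the cohort; its first-year value sits on the diagonal
        total += df1.iloc[x - year, 1 + x] / df1.iloc[x - year, x - year]
    print("Answer for n = {} is: {}".format(n, total / count))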
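To check the index arithmetic, here is a small sketch on a made-up 3x3 table where the ratios are easy to verify by hand. The helper name `mean_ratio_after` and the `toy` frame are hypothetical and only for illustration; the sketch assumes, as above, that row k and column k refer to the same year.

```python
import numpy as np
import pandas as pd

def mean_ratio_after(df1, n):
    """Mean of value(cohort, first year + n) / value(cohort, first year), by position."""
    numcols = df1.shape[1]
    # Only the first numcols - n cohorts have a value n years after their start
    ratios = [
        df1.iloc[row, row + n] / df1.iloc[row, row]
        for row in range(numcols - n)
    ]
    return float(np.mean(ratios))

# Made-up data: every cohort halves each year
toy = pd.DataFrame(
    [[100.0, 50.0, 25.0],
     [np.nan, 200.0, 100.0],
     [np.nan, np.nan, 400.0]],
    index=[2009, 2010, 2011],
    columns=[2009, 2010, 2011],
)

toy_results = {n: mean_ratio_after(toy, n) for n in range(1, toy.shape[1])}
print(toy_results)  # {1: 0.5, 2: 0.25}
```

For n = 1 the two ratios are 50/100 and 100/200, and for n = 2 the only ratio is 25/100, which matches the printed result.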