Python Pandas: bottom-up calculation within groups

Time: 2018-09-12 00:43:20

Tags: pandas pandas-groupby

Suppose I have the following dataset:

import pandas as pd

df = pd.DataFrame({"A": range(1, 10), "B": range(5, 14),
                   "Group": [1, 1, 2, 2, 2, 2, 3, 3, 3], "C": [0, 0, 10, 0, 0, 16, 0, 0, 22],
                   "last": [0, 1, 0, 0, 0, 1, 0, 0, 1],
                   "Want": [19.25, 8, 91.6, 71.05, 45.85, 16, 104.95, 65.8, 22]})

For the last observation in each group, I have the following code:

def calculate(df):
    if (df["last"] == 1):
        value = df.loc["A"] + df.loc["B"]
    else:
        # For all other observations in each group, the row value is calculated as follows:
        value = (df.loc[i-1, "C"] + 3 * df.loc[i, "A"] + 1.65 * df.loc[i, "B"])
    return value

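Here is what I am trying to do, in plain English. For the last observation in each group, column C equals the sum of columns A and B.

For all other observations, I want to compute the value of column C "from the bottom up" using the "else statement" above (which is only a proof of concept).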

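To clarify further, these are the Excel formulas used to compute the Want column for Group 2: F4="F5+(3*A4)+(1.65*B4)", F5="F6+(3*A5)+(1.65*B5)", F6="F7+(3*A6)+(1.65*B6)", F7="A7+B7". The calculation has a somewhat "recursive" nature, which is why I thought of a for loop.

I would really appreciate a solution that stays consistent with the first if statement, rather than one where the function returns a DataFrame or something similar, so that I can call the function with the following code: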

value = something

Thanks for your help.
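For reference, the Excel formulas for Group 2 can be sketched as a plain loop that starts at the last row of the group and walks upward. This is only an illustration of the recurrence, not a proposed solution: the lists A and B hold the Group 2 values from the dataset, and the name values is made up for this sketch.

```python
# Group 2 rows from the question (Excel rows 4 to 7).
A = [3, 4, 5, 6]
B = [7, 8, 9, 10]

values = [0.0] * len(A)

# Last row of the group (F7 = A7 + B7).
values[-1] = float(A[-1] + B[-1])

# Walk upward: each row adds 3*A + 1.65*B to the value of the row below it
# (F6 = F7 + 3*A6 + 1.65*B6, and so on).
for i in range(len(A) - 2, -1, -1):
    values[i] = values[i + 1] + 3 * A[i] + 1.65 * B[i]

print([round(v, 2) for v in values])
```

Up to floating-point rounding, the result matches the Want values listed for Group 2 in the dataset.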

3 answers:

Answer 0: (score: 1)

It seems you need np.where and shift:

np.where(df.last==1,df.A+df.B,df.A+1.65*df.B+df.C.shift())
Out[199]: array([  nan, 11.9 , 14.55, 27.2 , 19.85, 22.5 , 41.15, 27.8 , 30.45])
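For readers unfamiliar with shift, here is a minimal sketch of what it does on column C from the question. The column names C_prev and C_next_in_group are made up for illustration. Note that a plain shift() looks at the previous row, whereas the Excel formulas in the question refer to the row below, which corresponds to shift(-1) within each group.

```python
import pandas as pd

df = pd.DataFrame({
    "Group": [1, 1, 2, 2, 2, 2, 3, 3, 3],
    "C": [0, 0, 10, 0, 0, 16, 0, 0, 22],
})

# shift() moves values down one row, so each row sees the previous row's C
# (NaN on the very first row).
df["C_prev"] = df["C"].shift()

# shift(-1) within each group moves values up one row, so each row sees the
# next row's C from the same group (NaN on the last row of each group).
df["C_next_in_group"] = df.groupby("Group")["C"].shift(-1)

print(df)
```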

Answer 1: (score: 0)

You can use df.assign, np.where, and df.rolling to achieve your goal.

import pandas as pd
import numpy as np
df = pd.DataFrame({"A":range(1,10), "B":range(5,14), "Group": 
[1,1,2,2,2,2,3,3,3],"C":[0,0,10,0,0,16,0,0,22], \
               "last":[0,1,0,0,0,1,0,0,1],
               "Want": [19.25,8,91.6,71.05,45.85,16,104.95,65.8,22]}).sort_index(ascending = False)
df = df.assign(FakeC = df['A'] + df['B']) #you can comment out this line then replace FakeC with A+B for the following expressions
df = df.assign(value = np.where(df['last'] == 1, df['A'] + df['B'], df['FakeC'].rolling(2, min_periods=1).sum() - df['FakeC'] + 3 * df['A'] + 1.65 * df['B']))
df = df.assign(final = np.where(df['last'] == 1, df['value'], df['value'].rolling(2, min_periods=1).sum() -  df['FakeC'].rolling(2, min_periods=1).sum() + df['FakeC']))
print(df)

Answer 2: (score: 0)

Here is my attempt. I tried to keep it as straightforward as possible.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "A":range(1,10), 
    "B":range(5,14), 
    "Group": [1,1,2,2,2,2,3,3,3],
    "C":[0,0,10,0,0,16,0,0,22], 
    "want":[19.25,8,91.6,71.05,45.85,16,104.95,65.8,22],
    "last":[0,1,0,0,0,1,0,0,1]})

# Determine where the last-in-group value applies.
condition = df['Group'] != df['Group'].shift(-1)

# Calculate the alternative data.
alternate_formula = (3 * df['A'] + 1.65 * df['B'])

# Calculate C as either the 'last-in-group' function or the alternative function.
df['C'] = np.where(condition, df['A'] + df['B'], alternate_formula)

# Reverse the row order, group by the 'Group' column, take a cumulative sum
# within each group, and then re-sort back to the original order.
df['C'] = df.sort_index(ascending=False).groupby('Group')['C'].cumsum().sort_index()

print(df)

I think a vectorized approach is generally preferable to an iterative approach such as "apply".
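As a rough check of that claim, the sketch below compares an explicit bottom-up loop with the reversed cumulative sum. The helper name bottom_up_loop is made up for this comparison, and it assumes the rows of each group are contiguous, as they are in the question's data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": range(1, 10),
    "B": range(5, 14),
    "Group": [1, 1, 2, 2, 2, 2, 3, 3, 3],
    "want": [19.25, 8, 91.6, 71.05, 45.85, 16, 104.95, 65.8, 22],
})

def bottom_up_loop(frame):
    """Hypothetical helper: explicit bottom-up loop, for comparison only."""
    out = pd.Series(np.nan, index=frame.index)
    running = 0.0
    prev_group = None
    # Visit rows from the bottom of the frame to the top.
    for idx in reversed(frame.index):
        row = frame.loc[idx]
        if row["Group"] != prev_group:
            # First row seen for a group when walking upward is its last row.
            running = row["A"] + row["B"]
        else:
            running += 3 * row["A"] + 1.65 * row["B"]
        out.loc[idx] = running
        prev_group = row["Group"]
    return out

# Vectorized version: per-row increments, then a reversed cumulative sum per group.
is_last = df["Group"] != df["Group"].shift(-1)
step = np.where(is_last, df["A"] + df["B"], 3 * df["A"] + 1.65 * df["B"])
vectorized = (
    pd.Series(step, index=df.index)
    .iloc[::-1]
    .groupby(df["Group"])
    .cumsum()
    .sort_index()
)

looped = bottom_up_loop(df)
print(np.allclose(looped, vectorized), np.allclose(vectorized, df["want"]))
```

The two agree because the recurrence only ever adds a per-row term to the value of the row below it, so unrolling it gives a running sum taken from the bottom of each group.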