Dataframe conditional column: subtract until zero

Asked: 2019-04-27 08:28:59

Tags: python pandas dataframe conditional

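This differs from the usual "subtract until 0" questions because it depends on another column. The question is about creating a conditional column.

The dataframe consists of three columns.

The 'quantity' column tells you how much to add or subtract.

The 'in' column tells you when to add (1) and when subtraction is triggered (-1).

The 'cumulative_in' column tells you how much you currently have.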

+----------+----+---------------+
| quantity | in | cumulative_in |
+----------+----+---------------+
|        5 |  0 |               |
|        1 |  0 |               |
|        3 |  1 |             3 |
|        4 |  1 |             7 |
|        2 |  1 |             9 |
|        1 |  0 |               |
|        1 |  0 |               |
|        3 |  0 |               |
|        1 | -1 |               |
|        2 |  0 |               |
|        1 |  0 |               |
|        2 |  0 |               |
|        3 |  0 |               |
|        3 |  0 |               |
|        1 |  0 |               |
|        3 |  0 |               |
+----------+----+---------------+

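Whenever the 'in' column equals -1, I want to create a column 'out' (0/1), starting from the next row, that says to keep subtracting until 'cumulative_in' reaches 0. Doing this by hand:

The 'out' column tells you when to keep subtracting.

The 'cumulative_subtracted' column tells you how much has been subtracted so far.

I subtract 'cumulative_subtracted' from 'cumulative_in' until it reaches 0, so the output looks like this: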

+----------+----+---------------+-----+-----------------------+
| quantity | in | cumulative_in | out | cumulative_subtracted |
+----------+----+---------------+-----+-----------------------+
|        5 |  0 |               |     |                       |
|        1 |  0 |               |     |                       |
|        3 |  1 |             3 |     |                       |
|        4 |  1 |             7 |     |                       |
|        2 |  1 |             9 |     |                       |
|        1 |  0 |               |     |                       |
|        1 |  0 |               |     |                       |
|        3 |  0 |               |     |                       |
|        1 | -1 |               |     |                       |
|        2 |  0 |             7 |   1 |                     2 |
|        1 |  0 |             6 |   1 |                     3 |
|        2 |  0 |             4 |   1 |                     5 |
|        3 |  0 |             1 |   1 |                     8 |
|        3 |  0 |             0 |   1 |                     9 |
|        1 |  0 |               |     |                       |
|        3 |  0 |               |     |                       |
+----------+----+---------------+-----+-----------------------+
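The arithmetic in the table above can be checked with a small sketch. It assumes a single -1 trigger and a starting balance of 9 (the last 'cumulative_in' value before the trigger). The variable names are illustrative only and are not part of the question.

```python
import numpy as np

balance = 9  # last cumulative_in value before the -1 row
# Quantities on the rows that follow the -1 row.
quantity = np.array([2, 1, 2, 3, 3, 1, 3])

# Running total we would like to subtract, capped at the available balance.
cumulative_subtracted = np.minimum(quantity.cumsum(), balance)
remaining = balance - cumulative_subtracted

print(cumulative_subtracted[:5])  # rows where subtraction is still active
print(remaining[:5])
```

The first five entries correspond to the five rows marked out = 1 in the table. After that the balance is exhausted and the cap keeps the values constant.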

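2 Answers:

Answer 0 (score: 1)

I couldn't find a vectorized solution, and I'd love to see one. Going row by row, though, the problem isn't hard. I hope your dataframe isn't too large!

First, set up the data.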

import numpy as np
import pandas as pd

data = {
    "quantity": [5, 1, 3, 4, 2, 1, 1, 3, 1, 2, 1, 2, 3, 3, 1, 3],
    "in": [0, 0, 1, 1, 1, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0],
    # Only the three rows with in == 1 start with a running balance.
    "cumulative_in": [np.nan, np.nan, 3, 7, 9] + [np.nan] * 11,
}

Then set up the dataframe and the extra columns. I use NaN for 'out', but 0 is easier to work with for 'cumulative_subtracted'.

df=pd.DataFrame(data)
df['out'] = np.nan
df['cumulative_subtracted'] = 0

Set the initial variables.

last_in = 0.
reduce = False

Then, unfortunately, walk through the dataframe row by row.

for i in df.index:
    # Outside a reduction, remember the most recent running balance.
    if not np.isnan(df.at[i, "cumulative_in"]) and not reduce:
        last_in = df.at[i, "cumulative_in"]
    # A -1 starts the reduction on the following row.
    elif df.at[i, "in"] == -1:
        reduce = True
    # While reducing, subtract this row's quantity from the balance.
    elif reduce:
        df.at[i, "out"] = 1
        # i - 1 assumes the default integer index (0, 1, 2, ...).
        previous = df.at[i - 1, "cumulative_subtracted"]
        if df.at[i, "quantity"] <= last_in:
            last_in -= df.at[i, "quantity"]
            df.at[i, "cumulative_in"] = last_in
            df.at[i, "cumulative_subtracted"] = previous + df.at[i, "quantity"]
        else:
            # Not enough left: subtract only the remainder and stop.
            df.at[i, "cumulative_in"] = 0
            df.at[i, "cumulative_subtracted"] = previous + last_in
            last_in = 0
            reduce = False

This works for the given data, and hopefully for other datasets as well.

print(df)

    quantity  in  cumulative_in  out  cumulative_subtracted
0          5   0            NaN  NaN                      0
1          1   0            NaN  NaN                      0
2          3   1            3.0  NaN                      0
3          4   1            7.0  NaN                      0
4          2   1            9.0  NaN                      0
5          1   0            NaN  NaN                      0
6          1   0            NaN  NaN                      0
7          3   0            NaN  NaN                      0
8          1  -1            NaN  NaN                      0
9          2   0            7.0  1.0                      2
10         1   0            6.0  1.0                      3
11         2   0            4.0  1.0                      5
12         3   0            1.0  1.0                      8
13         3   0            0.0  1.0                      9
14         1   0            NaN  NaN                      0
15         3   0            NaN  NaN                      0
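For the narrow case shown in the question, the same result can probably be reached without a loop. The following is only a sketch, not the answer author's method, and it relies on two assumptions: there is exactly one -1 trigger, and every in == 1 row comes before it. The helper names (`trigger`, `after`, `balance`, `wanted`, `active`) are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "quantity": [5, 1, 3, 4, 2, 1, 1, 3, 1, 2, 1, 2, 3, 3, 1, 3],
    "in": [0, 0, 1, 1, 1, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0],
})

# Running balance, shown only on the rows that add to it.
added = df["quantity"].where(df["in"].eq(1), 0).cumsum()
df["cumulative_in"] = added.where(df["in"].eq(1))

# Assumption: a single -1 trigger, with all in == 1 rows before it.
trigger = df["in"].eq(-1)
after = trigger.cumsum().shift(fill_value=0).gt(0)  # rows strictly after the trigger
balance = df.loc[df["in"].eq(1), "quantity"].sum()

# Amount we would like to subtract, capped at the available balance.
wanted = df["quantity"].where(after, 0).cumsum()
subtracted = wanted.clip(upper=balance)

# A row is active while the balance was not yet exhausted on the previous row.
active = after & subtracted.shift(fill_value=0).lt(balance)

df["out"] = active.astype(int).where(active)
df["cumulative_subtracted"] = subtracted.where(active, 0)
df.loc[active, "cumulative_in"] = balance - subtracted[active]

print(df)
```

With several triggers, or with additions arriving while a subtraction is still running, the cap would have to be applied per group, and the loop is likely the safer choice.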

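Answer 1 (score: 0)

It isn't clear to me what should happen when the amount being subtracted hasn't reached zero yet and another 1 appears in the 'in' column.

However, here is a rough solution for the simple case: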

import pandas as pd
import numpy as np

size = 20

df = pd.DataFrame(
    {
        "quantity": np.random.randint(1, 6, size),
        "in": np.full(size, np.nan),
    }
)

# These are just to place a random 1 and -1 into 'in', not important
df.loc[np.random.choice(df.iloc[:size//3, :].index, 1), 'in'] = 1
df.loc[np.random.choice(df.iloc[size//3:size//2, :].index, 1), 'in'] = -1
df.loc[np.random.choice(df.iloc[size//2:, :].index, 1), 'in'] = 1

# Forward-fill each 1/-1 marker into the missing values that follow it,
# up to the next 1/-1 entry.
df.loc[:, 'in'] = df['in'].ffill()

# Calculates the cumulative sum with a negative value for subtractions
df["cum_in"] = (df["quantity"] * df['in']).cumsum()

# Subtraction indicator and cumulative column
df['out'] = (df['in'] == -1).astype(int)
df["cumulative_subtracted"] = df.loc[df['in'] == -1, 'quantity'].cumsum()

# Remove values when the 'cum_in' turns to negative
df.loc[
    df["cum_in"] < 0 , ["in", "cum_in", "out", "cumulative_subtracted"]
] = np.nan


print(df)
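The forward-fill step is what turns the sparse 1/-1 markers into runs that the multiplication can use. A minimal illustration with made-up values (the `markers` series is hypothetical, not taken from the answer):

```python
import numpy as np
import pandas as pd

# Sparse markers: a 1 at position 1 and a -1 at position 4.
markers = pd.Series([np.nan, 1, np.nan, np.nan, -1, np.nan])

# Each missing value takes the most recent marker above it.
# Values before the first marker stay missing.
filled = markers.ffill()

print(filled.tolist())
```

Because every row after a marker inherits its sign, this approach treats all rows between a 1 and the next -1 as additions, which differs from the question's data, where 'in' is 0 on rows that neither add nor subtract.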