Python pandas conditional cumulative sum

Time: 2017-01-02 02:44:14

Tags: python-3.x pandas dataframe ipython

Consider my dataframe df

data  data_binary  sum_data
  2       1            1
  5       0            0
  1       1            1
  4       1            2
  3       1            3
  10      0            0
  7       0            0
  3       1            1

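I want to calculate the cumulative sum of the 1s within each group of consecutive data_binary values.

The first group of 1s contains a single 1, so sum_data has just a 1. The second group of 1s, however, contains three 1s, so sum_data is [1, 2, 3].

I have tried np.where(df['data_binary'] == 1, df['data_binary'].cumsum(), 0), but that returns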

array([1, 0, 2, 3, 4, 0, 0, 5])

which is not what I want.
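To make the problem reproducible, here is a minimal sketch that rebuilds the sample frame and repeats the np.where attempt. The DataFrame construction is an assumption (the question only shows the printed table). It shows that the plain cumulative sum keeps counting across the zero rows instead of restarting.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "data": [2, 5, 1, 4, 3, 10, 7, 3],
    "data_binary": [1, 0, 1, 1, 1, 0, 0, 1],
    "sum_data": [1, 0, 1, 2, 3, 0, 0, 1],  # the desired result
})

# The attempt from the question: cumsum() never resets at the zero rows,
# so np.where only blanks them out while the count keeps growing.
attempt = np.where(df["data_binary"] == 1, df["data_binary"].cumsum(), 0)

print(attempt.tolist())          # what the attempt produces
print(df["sum_data"].tolist())   # what is wanted
```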

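3 Answers:

Answer 0: (score: 12)

You want to take the cumulative sum of data_binary and subtract from it the most recent cumulative sum at a row where data_binary was zero.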

b = df.data_binary
c = b.cumsum()
c.sub(c.mask(b != 0).ffill(), fill_value=0).astype(int)

0    1
1    0
2    1
3    2
4    3
5    0
6    0
7    1
Name: data_binary, dtype: int64

Explanation

Let's start by looking at each step side by side

cols = ['data_binary', 'cumulative_sum', 'nan_non_zero', 'forward_fill', 'final_result']
print(pd.concat([
        b, c,
        c.mask(b != 0),
        c.mask(b != 0).ffill(),
        c.sub(c.mask(b != 0).ffill(), fill_value=0).astype(int)
    ], axis=1, keys=cols))


   data_binary  cumulative_sum  nan_non_zero  forward_fill  final_result
0            1               1           NaN           NaN             1
1            0               1           1.0           1.0             0
2            1               2           NaN           1.0             1
3            1               3           NaN           1.0             2
4            1               4           NaN           1.0             3
5            0               4           4.0           4.0             0
6            0               4           4.0           4.0             0
7            1               5           NaN           4.0             1

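The problem with cumulative_sum is that rows where data_binary is zero do not reset the sum, and that is the motivation for this solution. How do we "reset" the sum when data_binary is zero? Easy! I take the cumulative sum only at the rows where data_binary is zero and forward fill those values. When I subtract this from the cumulative sum, I have effectively reset the sum.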
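The reset idea described above can be wrapped in a small reusable function. This is only a sketch: the name run_length_cumsum and the intermediate variable names are made up for illustration, and the Series is rebuilt by hand from the question's data_binary column.

```python
import pandas as pd

def run_length_cumsum(b: pd.Series) -> pd.Series:
    """Cumulative sum of b that restarts after every zero (hypothetical helper)."""
    c = b.cumsum()
    # Keep the running total only at the zero rows, then carry it forward.
    last_total_at_zero = c.mask(b != 0).ffill()
    # Before the first zero there is nothing to subtract, hence fill_value=0.
    return c.sub(last_total_at_zero, fill_value=0).astype(int)

b = pd.Series([1, 0, 1, 1, 1, 0, 0, 1], name="data_binary")
result = run_length_cumsum(b)
print(result.tolist())
```

Because the subtraction works on the whole column at once, no Python-level loop or groupby is needed.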

Answer 1: (score: 7)

I think you can use groupby with DataFrameGroupBy.cumsum on a Series. First compare data_binary with its shifted version for inequality (!=), then build the groups from that comparison with cumsum. Finally, use mask to set the result to 0 in the rows where data_binary is 0.

print (df.data_binary.ne(df.data_binary.shift()).cumsum())
0    1
1    2
2    3
3    3
4    3
5    4
6    4
7    5
Name: data_binary, dtype: int32

df['sum_data1'] = df.data_binary.groupby(df.data_binary.ne(df.data_binary.shift()).cumsum()).cumsum()
df['sum_data1'] = df['sum_data1'].mask(df.data_binary == 0, 0)
print (df)
   data  data_binary  sum_data  sum_data1
0     2            1         1          1
1     5            0         0          0
2     1            1         1          1
3     4            1         2          2
4     3            1         3          3
5    10            0         0          0
6     7            0         0          0
7     3            1         1          1
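The groupby approach in this answer can be sketched as a self-contained snippet. The function name grouped_cumsum and the variable run_id are assumptions introduced here for readability; the pandas calls are the same ones the answer uses.

```python
import pandas as pd

def grouped_cumsum(b: pd.Series) -> pd.Series:
    """Cumulative sum within runs of equal consecutive values (hypothetical helper)."""
    # A new run starts wherever the value differs from the previous row.
    run_id = b.ne(b.shift()).cumsum()
    # Sum inside each run. For 0/1 data the zero runs already sum to 0;
    # the mask keeps that explicit.
    return b.groupby(run_id).cumsum().mask(b == 0, 0)

b = pd.Series([1, 0, 1, 1, 1, 0, 0, 1], name="data_binary")
run_id = b.ne(b.shift()).cumsum()
print(run_id.tolist())
print(grouped_cumsum(b).tolist())
```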

Answer 2: (score: 0)

If you want piRSquared's excellent answer as a single command:

df['sum_data'] = df[['data_binary']].apply(
    lambda x: x.cumsum().sub(x.cumsum().mask(x != 0).ffill(), fill_value=0).astype(int), 
    axis=0)

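Note that the double square brackets on the right-hand side are needed to get a one-column DataFrame instead of a Series, so that apply can be used with the axis argument (which is not available when apply is used on a Series).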
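As a possible alternative to the double-bracket workaround, a Series can be passed through the same lambda with Series.pipe, which simply calls the function on the whole Series. This is not part of the answer; the column name sum_data_pipe and the hand-built frame are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"data_binary": [1, 0, 1, 1, 1, 0, 0, 1]})

# Double brackets: a one-column DataFrame, so apply hands the column to the lambda.
df["sum_data"] = df[["data_binary"]].apply(
    lambda x: x.cumsum().sub(x.cumsum().mask(x != 0).ffill(), fill_value=0).astype(int),
    axis=0)

# Single brackets: a Series; pipe passes the whole Series to the lambda at once.
df["sum_data_pipe"] = df["data_binary"].pipe(
    lambda x: x.cumsum().sub(x.cumsum().mask(x != 0).ffill(), fill_value=0).astype(int))

print(df)
```

Both columns should hold the same values; pipe only avoids the detour through a one-column DataFrame.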