Pandas: merge consecutive rows based on a condition

Date: 2020-10-23 23:02:26

Tags: python pandas

My question is similar to the one below, but the answers there do not quite work for my case:

merge rows pandas dataframe based on condition

Given the following pandas DataFrame:

+---------+-----------------+-----------------+
| SECTION | TEXT            | NUMBER_OF_WORDS |
+---------+-----------------+-----------------+
| ONE     | lots   of text… | 55              |
+---------+-----------------+-----------------+
| ONE     | word1           | 1               |
+---------+-----------------+-----------------+
| ONE     | lots   of text… | 151             |
+---------+-----------------+-----------------+
| ONE     | word2           | 1               |
+---------+-----------------+-----------------+
| ONE     | word3           | 1               |
+---------+-----------------+-----------------+
| ONE     | word4           | 1               |
+---------+-----------------+-----------------+
| TWO     | lots   of text… | 523             |
+---------+-----------------+-----------------+
| TWO     | lots   of text… | 123             |
+---------+-----------------+-----------------+
| TWO     | word4           | 1               |
+---------+-----------------+-----------------+

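If the value in the NUMBER_OF_WORDS column is 1, the row must be merged into the row above it, provided both rows have the same SECTION value. The TEXT values are joined with ", " and the NUMBER_OF_WORDS values are summed.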

So the final result should look like this:

+---------+--------------------------------------+-----------------+
| SECTION | TEXT                                 | NUMBER_OF_WORDS |
+---------+--------------------------------------+-----------------+
| ONE     | lots   of text…, word1               | 56              |
+---------+--------------------------------------+-----------------+
| ONE     | lots   of text…, word2, word3, word4 | 154             |
+---------+--------------------------------------+-----------------+
| TWO     | lots   of text…                      | 523             |
+---------+--------------------------------------+-----------------+
| TWO     | lots   of text…, word4               | 124             |
+---------+--------------------------------------+-----------------+

Here is my code. It runs, but it does not produce the result I want.

df.groupby(['SECTION', (df.NUMBER_OF_WORDS.shift(1) == 1)], as_index=False, sort=False).agg({'TEXT': lambda x: ', '.join(x), 'NUMBER_OF_WORDS': lambda x: sum(x)})
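One likely reason the attempt above falls short: its second grouping key is a boolean Series, so within each SECTION it has only two distinct values and cannot tell separate runs of rows apart. The sketch below rebuilds the sample table (the long-text cells are placeholders) and inspects that key. The name prev_is_one is introduced here for illustration only.

```python
import pandas as pd

# Sample data from the question; the long-text cells are placeholders.
df = pd.DataFrame({
    "SECTION": ["ONE"] * 6 + ["TWO"] * 3,
    "TEXT": ["lots of text…", "word1", "lots of text…", "word2", "word3",
             "word4", "lots of text…", "lots of text…", "word4"],
    "NUMBER_OF_WORDS": [55, 1, 151, 1, 1, 1, 523, 123, 1],
})

# The key used in the attempt: "was the previous row a one-word row?"
prev_is_one = df["NUMBER_OF_WORDS"].shift(1).eq(1).rename("prev_is_one")
print(prev_is_one.tolist())
# [False, False, True, False, True, True, True, False, False]

# Grouping on (SECTION, boolean) pools every row that shares the same pair,
# wherever it sits in the frame, instead of following consecutive runs.
totals = df.groupby(["SECTION", prev_is_one], sort=False)["NUMBER_OF_WORDS"].sum()
print(totals)
```

For section ONE this yields totals of 57 and 153 rather than the desired 56 and 154, because word2 is pooled with the first block of text instead of the second.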

Update

This is BEN_YO's answer, but it seems to contain a minor typo. To keep this question useful for future readers, I have slightly amended his answer:

s = df['NUMBER_OF_WORDS'].ne(1).cumsum()
out = df.groupby(s).agg({'SECTION': 'first','TEXT': lambda x: ', '.join(x),'NUMBER_OF_WORDS': lambda x: sum(x)})
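To see why this works, it helps to look at the intermediate key. The sketch below applies the same idea to the sample table; the long-text cells are placeholders, and the rename to group_id plus the final reset_index are additions made here for readability, not part of the answer.

```python
import pandas as pd

# Sample data from the question; the long-text cells are placeholders.
df = pd.DataFrame({
    "SECTION": ["ONE"] * 6 + ["TWO"] * 3,
    "TEXT": ["lots of text…", "word1", "lots of text…", "word2", "word3",
             "word4", "lots of text…", "lots of text…", "word4"],
    "NUMBER_OF_WORDS": [55, 1, 151, 1, 1, 1, 523, 123, 1],
})

# Every row with more than one word starts a new group; one-word rows
# inherit the group number of the row above them.
s = df["NUMBER_OF_WORDS"].ne(1).cumsum().rename("group_id")
print(s.tolist())  # [1, 1, 2, 2, 2, 2, 3, 4, 4]

out = (
    df.groupby(s)
      .agg({"SECTION": "first", "TEXT": ", ".join, "NUMBER_OF_WORDS": "sum"})
      .reset_index(drop=True)
)
print(out)
```

On this data the result has four rows with NUMBER_OF_WORDS of 56, 154, 523 and 124, matching the expected table.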

1 Answer:

Answer 0: (score: 1)

Let's try groupby with cumsum
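The cumsum grouping key from the question's update looks only at NUMBER_OF_WORDS, so a one-word row that happens to open a new SECTION would be merged into the last row of the previous section. A possible variant, sketched below under the assumption that such a row should stay on its own, also starts a new group whenever SECTION changes. The data and the names starts_group and group_id are made up for illustration.

```python
import pandas as pd

# Hypothetical data: section TWO opens with a one-word row.
df = pd.DataFrame({
    "SECTION": ["ONE", "ONE", "TWO", "TWO", "TWO"],
    "TEXT": ["lots of text…", "word1", "word2", "lots of text…", "word3"],
    "NUMBER_OF_WORDS": [55, 1, 1, 120, 1],
})

# A new group starts on any multi-word row OR whenever SECTION differs
# from the row above (the first row always differs from the missing value).
starts_group = (
    df["NUMBER_OF_WORDS"].ne(1)
    | df["SECTION"].ne(df["SECTION"].shift())
)
group_id = starts_group.cumsum().rename("group_id")
print(group_id.tolist())  # [1, 1, 2, 3, 3]

out = (
    df.groupby(group_id)
      .agg({"SECTION": "first", "TEXT": ", ".join, "NUMBER_OF_WORDS": "sum"})
      .reset_index(drop=True)
)
print(out)
```

Here word2 stays in its own row under TWO instead of being appended to the last ONE row. On the sample table from the question, where no section begins with a one-word row, this variant gives the same result as the plain cumsum key.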