I have some data, as follows:
+---------+--------+----------+------------+-------+-----+
| Comment | Reason | Location | Date | Event | Key |
+---------+--------+----------+------------+-------+-----+
| a | c | i2 | 2019-03-02 | 1 | a |
| a | b | i2 | 2019-03-02 | 1 | a |
| c | b | i2 | 2019-03-02 | 1 | a |
| c | d | i2 | 2019-03-04 | 1 | a |
| a | c | i2 | 2019-03-15 | 2 | b |
| a | b | i9 | 2019-02-22 | 2 | c |
| c | b | i9 | 2019-03-10 | 3 | d |
| c | d | i9 | 2019-03-10 | 3 | d |
| a | c | s8 | 2019-04-22 | 1 | e |
| a | b | s8 | 2019-04-25 | 1 | e |
| c | b | s8 | 2019-04-28 | 1 | e |
| c | d | t14 | 2019-05-13 | 3 | f |
+---------+--------+----------+------------+-------+-----+
Now, I don't actually have the Key column yet. A new Key is created whenever Location or Event (or both) changes. For each Key, I want to count the rows where Comment is a or Reason is b (or both). Do I need to apply groupby twice, once for each set of conditions, or is there another way?
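For reference, the sample above can be reconstructed as a DataFrame roughly like this (a sketch; the Key column is deliberately left out because it does not exist yet):

import pandas as pd

# Sample data from the table above; Key is omitted because it has not been built yet.
df = pd.DataFrame({
    'Comment':  ['a', 'a', 'c', 'c', 'a', 'a', 'c', 'c', 'a', 'a', 'c', 'c'],
    'Reason':   ['c', 'b', 'b', 'd', 'c', 'b', 'b', 'd', 'c', 'b', 'b', 'd'],
    'Location': ['i2', 'i2', 'i2', 'i2', 'i2', 'i9', 'i9', 'i9', 's8', 's8', 's8', 't14'],
    'Date':     ['2019-03-02', '2019-03-02', '2019-03-02', '2019-03-04',
                 '2019-03-15', '2019-02-22', '2019-03-10', '2019-03-10',
                 '2019-04-22', '2019-04-25', '2019-04-28', '2019-05-13'],
    'Event':    [1, 1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 3],
})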
Answer (score: 2)
Use the shift and cumsum trick on both columns:
df[['Location', 'Event']].ne(df[['Location', 'Event']].shift()).any(axis=1).cumsum()
0 1
1 1
2 1
3 1
4 2
5 3
6 4
7 4
8 5
9 5
10 5
11 6
dtype: int64
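To see why this works: shift() aligns each row with the one above it, ne() flags the cells where Location or Event differs from the previous row, any(axis=1) collapses the two flags into a single "key changed here" flag, and cumsum() turns those flags into a running group id. A minimal sketch on a toy frame (the name toy is illustrative, not from the answer):

import pandas as pd

# The key should change at rows 2 and 3 of this toy frame.
toy = pd.DataFrame({'Location': ['i2', 'i2', 'i9', 'i9'],
                    'Event':    [1, 1, 1, 2]})

changed = toy.ne(toy.shift())    # True where a value differs from the row above
new_key = changed.any(axis=1)    # True if either column changed (row 0 is always True)
print(new_key.cumsum())          # 1, 1, 2, 3 -- running key id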
If you need letters, map the result to the corresponding ASCII character:
(df[['Location', 'Event']]
.ne(df[['Location', 'Event']].shift())
.any(axis=1)
.cumsum()
.add(96)
.map(chr))
0 a
1 a
2 a
3 a
4 b
5 c
6 d
7 d
8 e
9 e
10 e
11 f
dtype: object
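The .add(96) works because chr(97) is 'a', so group 1 maps to a, group 2 to b, and so on (which covers at most 26 groups). If you prefer to avoid ASCII arithmetic, one assumed alternative is to index into string.ascii_lowercase:

import string

group_id = (df[['Location', 'Event']]
            .ne(df[['Location', 'Event']].shift())
            .any(axis=1)
            .cumsum())

# Group 1 -> 'a', group 2 -> 'b', ...; still limited to 26 groups.
labels = group_id.map(lambda k: string.ascii_lowercase[k - 1])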
Putting it all together:
cols = ['Location', 'Event']
keys = df[cols].ne(df[cols].shift()).any(axis=1).cumsum().map(lambda x: chr(x + 96))
df['Key'] = keys
df
Comment Reason Location Date Event Key
0 a c i2 2019-03-02 1 a
1 a b i2 2019-03-02 1 a
2 c b i2 2019-03-02 1 a
3 c d i2 2019-03-04 1 a
4 a c i2 2019-03-15 2 b
5 a b i9 2019-02-22 2 c
6 c b i9 2019-03-10 3 d
7 c d i9 2019-03-10 3 d
8 a c s8 2019-04-22 1 e
9 a b s8 2019-04-25 1 e
10 c b s8 2019-04-28 1 e
11 c d t14 2019-05-13 3 f
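If you need this key-building step more than once, it may be convenient to wrap it in a small helper; add_change_key below is a hypothetical name, not part of pandas or the original answer:

def add_change_key(frame, cols, col_name='Key'):
    """Label runs of rows where none of `cols` changes with letters a, b, c, ..."""
    group_id = frame[cols].ne(frame[cols].shift()).any(axis=1).cumsum()
    out = frame.copy()
    out[col_name] = group_id.add(96).map(chr)
    return out

df = add_change_key(df, ['Location', 'Event'])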
And, to get the counts per Key:
df.eval('Comment == "a" or Reason == "b"').groupby(keys).sum()
a 3.0
b 1.0
c 1.0
d 1.0
e 3.0
f 0.0
dtype: float64