I have data like the following, and I want to create a column named "Month":
+---------+------------------+------+------+
| Name | Task | Team | Date |
+---------+------------------+------+------+
| John | Market study | A | 1 |
+---------+------------------+------+------+
| Michael | Customer service | B | 1 |
+---------+------------------+------+------+
| Joanna | Accounting | C | 1 |
+---------+------------------+------+------+
| John | Accounting | B | 2 |
+---------+------------------+------+------+
| Michael | Customer service | A | 2 |
+---------+------------------+------+------+
| Joanna | Market study | C | 2 |
+---------+------------------+------+------+
| John | Customer service | C | 1 |
+---------+------------------+------+------+
| Michael | Market study | A | 1 |
+---------+------------------+------+------+
| Joanna | Customer service | B | 1 |
+---------+------------------+------+------+
| John | Market study | A | 2 |
+---------+------------------+------+------+
| Michael | Customer service | B | 2 |
+---------+------------------+------+------+
| Joanna | Accounting | C | 2 |
+---------+------------------+------+------+
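Basically, I have date information, but the date does not say which month it belongs to. What I do know is that the first time a date occurs it belongs to month 1, and the second time it occurs it belongs to month 2. For example, date 1 occurs 3 times in a row and is then interrupted by date 2. So the first 3 rows with date 1 belong to month 1, and the next 3 rows with date 1 (after the date 2 rows) belong to month 2. The same applies to date 2. So I want a result like this: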
+---------+------------------+------+------+---------+
| Name | Task | Team | Date | Month |
+---------+------------------+------+------+---------+
| John | Market study | A | 1 | Month 1 |
+---------+------------------+------+------+---------+
| Michael | Customer service | B | 1 | Month 1 |
+---------+------------------+------+------+---------+
| Joanna | Accounting | C | 1 | Month 1 |
+---------+------------------+------+------+---------+
| John | Accounting | B | 2 | Month 1 |
+---------+------------------+------+------+---------+
| Michael | Customer service | A | 2 | Month 1 |
+---------+------------------+------+------+---------+
| Joanna | Market study | C | 2 | Month 1 |
+---------+------------------+------+------+---------+
| John | Customer service | C | 1 | Month 2 |
+---------+------------------+------+------+---------+
| Michael | Market study | A | 1 | Month 2 |
+---------+------------------+------+------+---------+
| Joanna | Customer service | B | 1 | Month 2 |
+---------+------------------+------+------+---------+
| John | Market study | A | 2 | Month 2 |
+---------+------------------+------+------+---------+
| Michael | Customer service | B | 2 | Month 2 |
+---------+------------------+------+------+---------+
| Joanna | Accounting | C | 2 | Month 2 |
+---------+------------------+------+------+---------+
I can't think of any approach other than writing loops. Thanks, everyone.
Answer 0 (score: 1)
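If I understand the question correctly, you can do the following. Create a mask s that puts each run of consecutive equal Date values into its own group. From s, create a mask s1 that holds the position of each value within its group. Then group by s1 and Date, and use cumcount and map to build the desired output: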
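# s: run id that increases whenever Date differs from the previous row
s = df.Date.ne(df.Date.shift()).cumsum()
# s1: 0-based position of each row within its run
s1 = df.Date.groupby(s).cumcount()
# number the occurrences of each (position, Date) pair and label them "Month n"
df['Month'] = df.groupby([s1, 'Date']).Name.cumcount().add(1).map(lambda x: 'Month '+str(x))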
Out[897]:
Name Task Team Date Month
0 John Market-study A 1 Month 1
1 Michael Customer-service B 1 Month 1
2 Joanna Accounting C 1 Month 1
3 John Accounting B 2 Month 1
4 Michael Customer-service A 2 Month 1
5 Joanna Market-study C 2 Month 1
6 John Customer-service C 1 Month 2
7 Michael Market-study A 1 Month 2
8 Joanna Customer-service B 1 Month 2
9 John Market-study A 2 Month 2
10 Michael Customer-service B 2 Month 2
11 Joanna Accounting C 2 Month 2
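To make the two masks easier to follow, here is a minimal self-contained sketch that rebuilds the sample data from the question and keeps the intermediate values. The variable names month_no and month_by_rank are my own and do not appear in the answer. The last step, which dense-ranks the run ids within each Date, is a variation I am adding for comparison, not part of the answer above.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "Name": ["John", "Michael", "Joanna"] * 4,
    "Task": ["Market study", "Customer service", "Accounting",
             "Accounting", "Customer service", "Market study",
             "Customer service", "Market study", "Customer service",
             "Market study", "Customer service", "Accounting"],
    "Team": ["A", "B", "C", "B", "A", "C", "C", "A", "B", "A", "B", "C"],
    "Date": [1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2],
})

# s: run id, incremented each time Date differs from the previous row.
s = df["Date"].ne(df["Date"].shift()).cumsum()

# s1: 0-based position of each row inside its run.
s1 = df["Date"].groupby(s).cumcount()

# Approach from the answer: for every (position, Date) pair,
# number the occurrences in order of appearance, starting at 1.
month_no = df.groupby([s1, "Date"])["Name"].cumcount().add(1)
df["Month"] = "Month " + month_no.astype(str)

# Variation (assumption, not from the answer): dense-rank the run ids
# within each Date, so the n-th run of a given Date gets number n.
month_by_rank = s.groupby(df["Date"]).rank(method="dense").astype(int)

print(df)
```

On this sample both month_no and month_by_rank give 1 for the first six rows and 2 for the last six. As far as I can tell they can differ when runs of the same Date have different lengths: with the (position, Date) grouping, rows at positions that no earlier run reached start again at 1, while the rank-based variation numbers whole runs.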