Question

I have data like the following and want to create a column named "Month":

+---------+------------------+------+------+
| Name    | Task             | Team | Date |
+---------+------------------+------+------+
| John    | Market study     | A    | 1    |
+---------+------------------+------+------+
| Michael | Customer service | B    | 1    |
+---------+------------------+------+------+
| Joanna  | Accounting       | C    | 1    |
+---------+------------------+------+------+
| John    | Accounting       | B    | 2    |
+---------+------------------+------+------+
| Michael | Customer service | A    | 2    |
+---------+------------------+------+------+
| Joanna  | Market study     | C    | 2    |
+---------+------------------+------+------+
| John    | Customer service | C    | 1    |
+---------+------------------+------+------+
| Michael | Market study     | A    | 1    |
+---------+------------------+------+------+
| Joanna  | Customer service | B    | 1    |
+---------+------------------+------+------+
| John    | Market study     | A    | 2    |
+---------+------------------+------+------+
| Michael | Customer service | B    | 2    |
+---------+------------------+------+------+
| Joanna  | Accounting       | C    | 2    |
+---------+------------------+------+------+

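So basically, I have date information, but the date does not include the month it belongs to. However, I know that the first time a date occurs it belongs to month 1, and the second time it occurs it belongs to month 2. For example, date 1 occurs 3 times and is then interrupted by date 2, so the first 3 occurrences of date 1 belong to month 1 and its next 3 occurrences belong to month 2. So I would like a result like this: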

+---------+------------------+------+------+---------+
| Name    | Task             | Team | Date | Month   |
+---------+------------------+------+------+---------+
| John    | Market study     | A    | 1    | Month 1 |
+---------+------------------+------+------+---------+
| Michael | Customer service | B    | 1    | Month 1 |
+---------+------------------+------+------+---------+
| Joanna  | Accounting       | C    | 1    | Month 1 |
+---------+------------------+------+------+---------+
| John    | Accounting       | B    | 2    | Month 1 |
+---------+------------------+------+------+---------+
| Michael | Customer service | A    | 2    | Month 1 |
+---------+------------------+------+------+---------+
| Joanna  | Market study     | C    | 2    | Month 1 |
+---------+------------------+------+------+---------+
| John    | Customer service | C    | 1    | Month 2 |
+---------+------------------+------+------+---------+
| Michael | Market study     | A    | 1    | Month 2 |
+---------+------------------+------+------+---------+
| Joanna  | Customer service | B    | 1    | Month 2 |
+---------+------------------+------+------+---------+
| John    | Market study     | A    | 2    | Month 2 |
+---------+------------------+------+------+---------+
| Michael | Customer service | B    | 2    | Month 2 |
+---------+------------------+------+------+---------+
| Joanna  | Accounting       | C    | 2    | Month 2 |
+---------+------------------+------+------+---------+

I have no idea how to do this other than with some loops. Thanks, everyone.

Answer 1

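If I understand the question correctly, you can do the following: create a mask s that puts each run of consecutive equal values into its own group. Within s, create a mask s1 that numbers each value inside its group. Then group by s1 and Date, and use cumcount and map to build the desired output: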
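To make the two masks concrete, here is a minimal sketch that rebuilds only the Date column from the question and prints the intermediate values. The `df` construction is an assumption for illustration and is not part of the original answer.

```python
import pandas as pd

# Only the Date column from the question; the other columns do not affect the masks.
df = pd.DataFrame({"Date": [1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2]})

# s: a new group id starts every time Date differs from the previous row.
s = df["Date"].ne(df["Date"].shift()).cumsum()

# s1: position of each row inside its run of consecutive equal dates.
s1 = df["Date"].groupby(s).cumcount()

print(s.tolist())   # [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
print(s1.tolist())  # [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]
```

Grouping by (s1, Date) then pairs up the rows that sit at the same position within a run of the same date, so a cumulative count over each pair gives the occurrence number.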

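# s: run id that increments whenever Date differs from the previous row
s = df.Date.ne(df.Date.shift()).cumsum()
# s1: position of each row within its run
s1 = df.Date.groupby(s).cumcount()

# Count how many times each (position, Date) pair has occurred so far
df['Month'] = df.groupby([s1, 'Date']).Name.cumcount().add(1).map(lambda x: 'Month '+str(x))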

Out[897]:
       Name              Task Team  Date    Month
0      John      Market-study    A     1  Month 1
1   Michael  Customer-service    B     1  Month 1
2    Joanna        Accounting    C     1  Month 1
3      John        Accounting    B     2  Month 1
4   Michael  Customer-service    A     2  Month 1
5    Joanna      Market-study    C     2  Month 1
6      John  Customer-service    C     1  Month 2
7   Michael      Market-study    A     1  Month 2
8    Joanna  Customer-service    B     1  Month 2
9      John      Market-study    A     2  Month 2
10  Michael  Customer-service    B     2  Month 2
11   Joanna        Accounting    C     2  Month 2
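
As far as I can tell, the approach above depends on the runs of each date having the same length, as they do in the sample: if a later run were longer than the earlier ones, its extra rows would start counting from 1 again. One possible alternative, which is not from the original answer and is only a sketch on a reduced copy of the sample data, is to number the runs of each date directly with a dense rank of the run ids. The names `run_id` and `month_no` are made up for this example.

```python
import pandas as pd

# Reduced copy of the sample data (Task and Team are not needed here).
df = pd.DataFrame({
    "Name": ["John", "Michael", "Joanna"] * 4,
    "Date": [1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2],
})

# Run id: increments whenever Date differs from the previous row.
run_id = df["Date"].ne(df["Date"].shift()).cumsum()

# Within each Date value, dense-rank the run ids: the first run of a date
# becomes 1, the second run of the same date becomes 2, and so on.
month_no = run_id.groupby(df["Date"]).rank(method="dense").astype(int)

df["Month"] = "Month " + month_no.astype(str)
print(df)
```

Because the rank is computed over run ids rather than row positions, the length of each run does not enter the result.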
