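How to merge 2 rows based on different column values

Time: 2019-05-08 10:47:41

Tags: python pandas

I am new to Python. I am using pandas, and the data below is loaded as a DataFrame with 3 fields: Task, Status_From and Status_To.

If the Status_To of one row is the same as the Status_From of the next row, those two rows should be merged, within the same Task.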

+------+-------------+-----------+
| Task | Status_From | Status_To |
+------+-------------+-----------+
| AAA  | 31-Aug-18   | 04-Sep-18 |
| BBB  | 21-Jun-18   | 21-Jun-18 |
| BBB  | 21-Jun-18   | 29-Jun-18 |
| BBB  | 29-Jun-18   | 29-Jun-18 |
| CCC  | 20-Aug-18   | 20-Aug-18 |
| CCC  | 24-Aug-18   | 24-Aug-18 |
| CCC  | 24-Aug-18   | 01-Sep-18 |
| DDD  | 06-Jul-18   | 06-Jul-18 |
| EEE  | 18-May-18   | 18-May-18 |
| FFF  | 01-Aug-18   | 01-Aug-18 |
| GGG  | 20-Apr-18   | 23-Apr-18 |
| GGG  | 23-Apr-18   | 23-Apr-18 |
| HHH  | 22-Jan-18   | 23-Jan-18 |
| HHH  | 23-Jan-18   | 23-Jan-18 |
| HHH  | 23-Jan-18   | 30-Jan-18 |
+------+-------------+-----------+
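
To reproduce the table above, a DataFrame can be built along these lines. This is only a minimal sketch: the variable name df is an assumption (it is the name the answer's code uses), and the dates are kept as plain strings rather than parsed.

```python
import pandas as pd

# Rows copied from the table above; dates are left as plain strings.
rows = [
    ("AAA", "31-Aug-18", "04-Sep-18"),
    ("BBB", "21-Jun-18", "21-Jun-18"),
    ("BBB", "21-Jun-18", "29-Jun-18"),
    ("BBB", "29-Jun-18", "29-Jun-18"),
    ("CCC", "20-Aug-18", "20-Aug-18"),
    ("CCC", "24-Aug-18", "24-Aug-18"),
    ("CCC", "24-Aug-18", "01-Sep-18"),
    ("DDD", "06-Jul-18", "06-Jul-18"),
    ("EEE", "18-May-18", "18-May-18"),
    ("FFF", "01-Aug-18", "01-Aug-18"),
    ("GGG", "20-Apr-18", "23-Apr-18"),
    ("GGG", "23-Apr-18", "23-Apr-18"),
    ("HHH", "22-Jan-18", "23-Jan-18"),
    ("HHH", "23-Jan-18", "23-Jan-18"),
    ("HHH", "23-Jan-18", "30-Jan-18"),
]

# df is an assumed name, chosen to match the answer's code.
df = pd.DataFrame(rows, columns=["Task", "Status_From", "Status_To"])
print(df)
```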

Expected output:

+------+-------------+-----------+
| Task | Status_From | Status_To |
+------+-------------+-----------+
| AAA  | 31-Aug-18   | 04-Sep-18 |
| BBB  | 21-Jun-18   | 29-Jun-18 |
| CCC  | 20-Aug-18   | 20-Aug-18 |
| CCC  | 24-Aug-18   | 01-Sep-18 |
| DDD  | 06-Jul-18   | 06-Jul-18 |
| EEE  | 18-May-18   | 18-May-18 |
| FFF  | 01-Aug-18   | 01-Aug-18 |
| GGG  | 20-Apr-18   | 23-Apr-18 |
| HHH  | 22-Jan-18   | 30-Jan-18 |
+------+-------------+-----------+

I tried a "for" loop with "if" conditions, but that did not work. Is there a simple way to do this?

1 answer:

Answer 0: (score: 2)

Assuming your data is already sorted, you can use cumsum() to set up groups, find the last Status_To of each group, and then apply drop_duplicates().

# g numbers the chains within each Task: it increases whenever
# Status_From differs from the previous row's Status_To
df1 = df.assign(
    g=df.groupby('Task').apply(lambda x: (x.Status_From != x.Status_To.shift()).cumsum()).reset_index(level=0, drop=True)
)

The output of df1 is:

#   Task Status_From  Status_To  g
#0   AAA   31-Aug-18  04-Sep-18  1
#1   BBB   21-Jun-18  21-Jun-18  1
#2   BBB   21-Jun-18  29-Jun-18  1
#3   BBB   29-Jun-18  29-Jun-18  1
#4   CCC   20-Aug-18  20-Aug-18  1
#5   CCC   24-Aug-18  24-Aug-18  2
#6   CCC   24-Aug-18  01-Sep-18  2
#7   DDD   06-Jul-18  06-Jul-18  1
#8   EEE   18-May-18  18-May-18  1
#9   FFF   01-Aug-18  01-Aug-18  1
#10  GGG   20-Apr-18  23-Apr-18  1
#11  GGG   23-Apr-18  23-Apr-18  1
#12  HHH   22-Jan-18  23-Jan-18  1
#13  HHH   23-Jan-18  23-Jan-18  1
#14  HHH   23-Jan-18  30-Jan-18  1

Then, use transform:

# give every row the last Status_To of its (Task, g) chain,
# then keep only the first row of each chain
df1['Status_To'] = df1.groupby(['Task', 'g']).Status_To.transform('last')
df1 = df1.drop_duplicates(['Task','g']).drop('g', axis=1)

The new output then matches the expected output shown in the question.
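
For reference, here is a self-contained sketch of the same cumsum() grouping idea. It is a variation, not the exact code above: it uses a per-Task shift() instead of apply(), and a named aggregation instead of transform() plus drop_duplicates(). The function name merge_chained_rows and the small sample frame are made up for illustration, and the rows are assumed to be sorted within each Task already.

```python
import pandas as pd

def merge_chained_rows(df):
    """Collapse consecutive rows of a Task whose Status_To equals the
    next row's Status_From (hypothetical helper, assumes sorted input)."""
    # Status_To of the previous row within the same Task (NaN for the first row).
    prev_to = df.groupby("Task")["Status_To"].shift()
    # A row starts a new chain when it does not continue the previous row.
    starts_chain = df["Status_From"] != prev_to
    # Running count of chain starts per Task gives the chain id g.
    g = starts_chain.astype(int).groupby(df["Task"]).cumsum().rename("g")
    # Keep the first Status_From and the last Status_To of every chain.
    return (
        df.groupby(["Task", g], sort=False)
          .agg(Status_From=("Status_From", "first"),
               Status_To=("Status_To", "last"))
          .reset_index()
          .drop(columns="g")
    )

# A few rows taken from the question's table.
sample = pd.DataFrame(
    [
        ("AAA", "31-Aug-18", "04-Sep-18"),
        ("CCC", "20-Aug-18", "20-Aug-18"),
        ("CCC", "24-Aug-18", "24-Aug-18"),
        ("CCC", "24-Aug-18", "01-Sep-18"),
        ("HHH", "22-Jan-18", "23-Jan-18"),
        ("HHH", "23-Jan-18", "23-Jan-18"),
        ("HHH", "23-Jan-18", "30-Jan-18"),
    ],
    columns=["Task", "Status_From", "Status_To"],
)

out = merge_chained_rows(sample)
print(out)
```

Unlike the drop_duplicates() version, this variant returns a fresh 0-based index rather than the original row labels, which may or may not matter for your use case.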