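I am new to Python. I am using pandas and have the data below as a DataFrame with three fields: Task, Status_From, and Status_To.
If the Status_To of one row is the same as the Status_From of the next row, the two rows should be merged, separately for each Task.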
+------+-------------+-----------+
| Task | Status_From | Status_To |
+------+-------------+-----------+
| AAA | 31-Aug-18 | 04-Sep-18 |
| BBB | 21-Jun-18 | 21-Jun-18 |
| BBB | 21-Jun-18 | 29-Jun-18 |
| BBB | 29-Jun-18 | 29-Jun-18 |
| CCC | 20-Aug-18 | 20-Aug-18 |
| CCC | 24-Aug-18 | 24-Aug-18 |
| CCC | 24-Aug-18 | 01-Sep-18 |
| DDD | 06-Jul-18 | 06-Jul-18 |
| EEE | 18-May-18 | 18-May-18 |
| FFF | 01-Aug-18 | 01-Aug-18 |
| GGG | 20-Apr-18 | 23-Apr-18 |
| GGG | 23-Apr-18 | 23-Apr-18 |
| HHH | 22-Jan-18 | 23-Jan-18 |
| HHH | 23-Jan-18 | 23-Jan-18 |
| HHH | 23-Jan-18 | 30-Jan-18 |
+------+-------------+-----------+
Expected output:
+------+-------------+-----------+
| Task | Status_From | Status_To |
+------+-------------+-----------+
| AAA | 31-Aug-18 | 04-Sep-18 |
| BBB | 21-Jun-18 | 29-Jun-18 |
| CCC | 20-Aug-18 | 20-Aug-18 |
| CCC | 24-Aug-18 | 01-Sep-18 |
| DDD | 06-Jul-18 | 06-Jul-18 |
| EEE | 18-May-18 | 18-May-18 |
| FFF | 01-Aug-18 | 01-Aug-18 |
| GGG | 20-Apr-18 | 23-Apr-18 |
| HHH | 22-Jan-18 | 30-Jan-18 |
+------+-------------+-----------+
I tried a "for" loop with "if" conditions, but that did not work. Is there a simple way to do this?
Answer 0 (score: 2)
Assuming your data is already sorted, you can use cumsum() to set up groups, find the last Status_To of each group, and then call drop_duplicates().
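First, build a group key g that increases whenever a row does not continue the previous row of the same Task:
df1 = df.assign(
    g=df.groupby('Task').apply(lambda x: (x.Status_From != x.Status_To.shift()).cumsum()).reset_index(level=0, drop=True)
)
The output of df1 is: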
# Task Status_From Status_To g
#0 AAA 31-Aug-18 04-Sep-18 1
#1 BBB 21-Jun-18 21-Jun-18 1
#2 BBB 21-Jun-18 29-Jun-18 1
#3 BBB 29-Jun-18 29-Jun-18 1
#4 CCC 20-Aug-18 20-Aug-18 1
#5 CCC 24-Aug-18 24-Aug-18 2
#6 CCC 24-Aug-18 01-Sep-18 2
#7 DDD 06-Jul-18 06-Jul-18 1
#8 EEE 18-May-18 18-May-18 1
#9 FFF 01-Aug-18 01-Aug-18 1
#10 GGG 20-Apr-18 23-Apr-18 1
#11 GGG 23-Apr-18 23-Apr-18 1
#12 HHH 22-Jan-18 23-Jan-18 1
#13 HHH 23-Jan-18 23-Jan-18 1
#14 HHH 23-Jan-18 30-Jan-18 1
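To see why g only increments where a chain breaks, it may help to look at a single task in isolation. The following is a minimal sketch for the three CCC rows only; the names ccc and starts_new_run are illustrative and do not come from the answer.

```python
import pandas as pd

# The three CCC rows from the question (dates kept as plain strings).
ccc = pd.DataFrame({
    "Status_From": ["20-Aug-18", "24-Aug-18", "24-Aug-18"],
    "Status_To": ["20-Aug-18", "24-Aug-18", "01-Sep-18"],
})

# A row starts a new run when its Status_From differs from the previous
# row's Status_To. The first row is compared against a missing value,
# so it always starts a run.
starts_new_run = ccc["Status_From"] != ccc["Status_To"].shift()

# The running sum of those flags gives every run its own label.
g = starts_new_run.cumsum()
print(g.tolist())
```

The second CCC row does not continue the first (24-Aug-18 versus 20-Aug-18), so it gets a new label, while the third row continues the second and keeps it.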
Then, using transform, set Status_To to the last value within each group and drop the duplicate rows:
df1['Status_To'] = df1.groupby(['Task', 'g']).Status_To.transform('last')
df1 = df1.drop_duplicates(['Task','g']).drop('g', axis=1)
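For reference, here is a self-contained sketch that rebuilds the question's data and checks the result against the expected output. It is a variation on the steps above, not the answer's exact code: it assumes the rows are already sorted, uses groupby().shift() instead of apply(), numbers the runs across the whole frame, and collapses each run with a named aggregation rather than transform() plus drop_duplicates(). The names rows, new_run, run_id and merged are illustrative.

```python
import pandas as pd

rows = [
    ("AAA", "31-Aug-18", "04-Sep-18"),
    ("BBB", "21-Jun-18", "21-Jun-18"),
    ("BBB", "21-Jun-18", "29-Jun-18"),
    ("BBB", "29-Jun-18", "29-Jun-18"),
    ("CCC", "20-Aug-18", "20-Aug-18"),
    ("CCC", "24-Aug-18", "24-Aug-18"),
    ("CCC", "24-Aug-18", "01-Sep-18"),
    ("DDD", "06-Jul-18", "06-Jul-18"),
    ("EEE", "18-May-18", "18-May-18"),
    ("FFF", "01-Aug-18", "01-Aug-18"),
    ("GGG", "20-Apr-18", "23-Apr-18"),
    ("GGG", "23-Apr-18", "23-Apr-18"),
    ("HHH", "22-Jan-18", "23-Jan-18"),
    ("HHH", "23-Jan-18", "23-Jan-18"),
    ("HHH", "23-Jan-18", "30-Jan-18"),
]
df = pd.DataFrame(rows, columns=["Task", "Status_From", "Status_To"])

# Previous row's Status_To within the same Task (missing for the first
# row of every Task, so that row always starts a new run).
prev_to = df.groupby("Task")["Status_To"].shift()
new_run = df["Status_From"] != prev_to

# Because every Task starts with a flagged row, a single running sum
# yields run labels that are unique across the whole frame.
run_id = new_run.cumsum()

# Collapse each run: keep the first Status_From and the last Status_To.
merged = df.groupby(run_id, sort=False).agg(
    Task=("Task", "first"),
    Status_From=("Status_From", "first"),
    Status_To=("Status_To", "last"),
).reset_index(drop=True)

print(merged)
```

The dates are compared as strings here, as in the question. If the real data has inconsistent date formatting, converting both columns with pd.to_datetime before comparing would likely be safer.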