Keep only the right rows when the subsequent row meets a condition

Time: 2017-07-11 18:21:20

Tags: python mysql sql r group-by

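I would like to know how to keep a row only when the subsequent row in its group meets a certain condition. The following data illustrates what I am trying to achieve:

ID EMPTY       DATE
 1     Y 03/01/2017
 1     Y 02/01/2017
 1     N 01/01/2017
 2     Y 03/01/2017
 3     N 03/01/2017
 4     Y 03/01/2017
 4     N 03/01/2017
 4     Y 03/01/2017
 4     Y 03/01/2017

The data is sorted by ID ascending and DATE descending.

Each ID has zero or one row with Purchased = 'N', but can have zero, one, or more rows with Purchased = 'Y'.

I want to track the date on which the EMPTY status changes.

Output:

ID EMPTY       DATE
 1     Y 02/01/2017
 1     N 01/01/2017
 2     Y 01/01/2017
 3     N 03/01/2017
 4     Y 03/01/2017
 4     N 03/01/2017

I want to keep all rows with EMPTY = 'N'.

I can use Python or SQL to do this, so solutions in either or both languages are welcome!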
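
The constraint stated in the question (at most one 'N' row per ID) can be checked before any filtering is attempted. The following is a minimal sketch in plain Python, assuming the sample rows are held as (ID, EMPTY, DATE) tuples; the names `rows` and `n_rows_per_id` are made up for illustration and do not come from the post.

```python
from collections import Counter

# Sample rows from the question as (ID, EMPTY, DATE) tuples,
# already sorted by ID ascending and DATE descending.
rows = [
    (1, "Y", "03/01/2017"),
    (1, "Y", "02/01/2017"),
    (1, "N", "01/01/2017"),
    (2, "Y", "03/01/2017"),
    (3, "N", "03/01/2017"),
    (4, "Y", "03/01/2017"),
    (4, "N", "03/01/2017"),
    (4, "Y", "03/01/2017"),
    (4, "Y", "03/01/2017"),
]


def n_rows_per_id(rows):
    """Count the rows with EMPTY == 'N' for each ID (hypothetical helper)."""
    return Counter(row_id for row_id, empty, _ in rows if empty == "N")


counts = n_rows_per_id(rows)
# Any ID with more than one 'N' row would violate the stated constraint.
violations = {row_id: n for row_id, n in counts.items() if n > 1}
print(violations)  # {} for the sample data
```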

4 Answers:

Answer 0 (score: 2)

If you are interested in using R:
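library(dplyr)
df %>%
      # next.empty holds the EMPTY value of the following row (NA on the last row)
      mutate(next.empty = lead(EMPTY, 1)) %>%
      # keep rows whose status differs from the next row; the comparison is not grouped by ID
      filter(next.empty != EMPTY)  %>%
      select(-next.empty)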


#  ID EMPTY       DATE
#1  1     Y 02/01/2017
#2  1     N 01/01/2017
#3  2     Y 03/01/2017
#4  3     N 03/01/2017
#5  4     Y 03/01/2017
#6  4     N 03/01/2017
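
For readers who prefer Python, the same "compare each row with the next one" idea can be sketched without any libraries. This is an illustrative translation under assumptions, not the answerer's code: the rows are assumed to be (ID, EMPTY, DATE) tuples, and `keep_changes` is a hypothetical helper name. Like the R pipeline, it compares across ID boundaries and drops the last row, which has no successor.

```python
# Sample rows from the question as (ID, EMPTY, DATE) tuples.
rows = [
    (1, "Y", "03/01/2017"),
    (1, "Y", "02/01/2017"),
    (1, "N", "01/01/2017"),
    (2, "Y", "03/01/2017"),
    (3, "N", "03/01/2017"),
    (4, "Y", "03/01/2017"),
    (4, "N", "03/01/2017"),
    (4, "Y", "03/01/2017"),
    (4, "Y", "03/01/2017"),
]


def keep_changes(rows):
    """Keep each row whose EMPTY value differs from the row after it.

    zip() stops at the shorter input, so the last row is never paired
    with a successor and is dropped, mirroring the NA from lead() in R.
    """
    return [cur for cur, nxt in zip(rows, rows[1:]) if cur[1] != nxt[1]]


kept = keep_changes(rows)
for row in kept:
    print(row)
```

On the sample data this yields the same six rows as the R output above.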

Data:

df <- structure(list(ID = c(1L, 1L, 1L, 2L, 3L, 4L, 4L, 4L, 4L), EMPTY = structure(c(2L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L), .Label = c("N", "Y"), class = "factor"), 
DATE = structure(c(3L, 2L, 1L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("01/01/2017", 
"02/01/2017", "03/01/2017"), class = "factor")), .Names = c("ID", 
"EMPTY", "DATE"), class = "data.frame", row.names = c(NA, -9L))

Answer 1 (score: 1)

One approach in R, using dplyr:

library(dplyr)
df1 %>% 
  group_by(ID) %>%  
  filter(n()==1 |(cumsum(cumsum(EMPTY == "N"))<2 & !duplicated(EMPTY)) )
# A tibble: 6 x 3
# Groups:   ID [4]
#     ID EMPTY       DATE
#  <int> <chr>      <chr>
#1     1     Y 03/01/2017
#2     1     N 01/01/2017
#3     2     Y 03/01/2017
#4     3     N 03/01/2017
#5     4     Y 03/01/2017
#6     4     N 03/01/2017

Data

df1 <- structure(list(ID = c(1L, 1L, 1L, 2L, 3L, 4L, 4L, 4L, 4L), EMPTY = c("Y", 
 "Y", "N", "Y", "N", "Y", "N", "Y", "Y"), DATE = c("03/01/2017", 
"02/01/2017", "01/01/2017", "03/01/2017", "03/01/2017", "03/01/2017", 
"03/01/2017", "03/01/2017", "03/01/2017")), .Names = c("ID", 
 "EMPTY", "DATE"), class = "data.frame", row.names = c(NA, -9L
 ))
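
The grouped filter in this answer can be hard to read at first, so here is a rough plain-Python sketch of what it appears to compute per ID: a row is kept if its group has a single row, or if the double running count of 'N' rows is still below 2 and the row is the first occurrence of its EMPTY value in the group. The tuple layout and the helper names `filter_group` and `filter_rows` are assumptions made for illustration.

```python
from itertools import accumulate, groupby

# Sample rows as (ID, EMPTY, DATE) tuples, sorted by ID.
rows = [
    (1, "Y", "03/01/2017"),
    (1, "Y", "02/01/2017"),
    (1, "N", "01/01/2017"),
    (2, "Y", "03/01/2017"),
    (3, "N", "03/01/2017"),
    (4, "Y", "03/01/2017"),
    (4, "N", "03/01/2017"),
    (4, "Y", "03/01/2017"),
    (4, "Y", "03/01/2017"),
]


def filter_group(group):
    """Apply the per-ID condition to the list of rows of one ID."""
    if len(group) == 1:  # corresponds to n() == 1
        return list(group)
    is_n = [1 if empty == "N" else 0 for _, empty, _ in group]
    # Equivalent of cumsum(cumsum(EMPTY == "N"))
    double_cumsum = list(accumulate(accumulate(is_n)))
    seen = set()
    kept = []
    for row, total in zip(group, double_cumsum):
        first_of_value = row[1] not in seen  # corresponds to !duplicated(EMPTY)
        seen.add(row[1])
        if total < 2 and first_of_value:
            kept.append(row)
    return kept


def filter_rows(rows):
    """Group consecutive rows by ID and filter each group."""
    result = []
    for _, group in groupby(rows, key=lambda row: row[0]):
        result.extend(filter_group(list(group)))
    return result


filtered = filter_rows(rows)
for row in filtered:
    print(row)
```

As in the tibble shown above, ID 1 keeps its first 'Y' row (03/01/2017) rather than the 'Y' row directly before the 'N' row.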

Answer 2 (score: 1)

In my experience this is a much neater task in R, but since you are looking for a Python solution:

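import pandas as pd
# id, empty and date are assumed to be equal-length lists holding the three columns
data = {'id': id, 'empty': empty, 'date': date}   # renamed from dict to avoid shadowing the built-in
df1 = pd.DataFrame(data)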

Once the data is loaded into a pandas DataFrame by whatever method you choose:

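# Shift 'empty' up by one row so that each row sees the value of the row after it
df1['empty_+1'] = df1['empty'].shift(-1)
# True where the status differs from the next row; the last row has no successor, so it is always True
df1['check'] = df1['empty'] != df1['empty_+1']
df1.loc[df1['check']]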

Answer 3 (score: 0)

In MySQL, one approach is:

1) Add an auto-increment row id to the table

 ALTER TABLE table1 ADD row_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY;

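2) Left join the table to itself, shifted by one row

3) Add the selection conditions: (i) the current row has 'N' in Empty, or (ii) the current row has 'Y' in Empty but the next row has 'N' in Empty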

SELECT a.ID, a.Empty, a.Day 
FROM table1 a 
LEFT JOIN table1 b ON a.row_id + 1 = b.row_id
WHERE a.Empty = 'N' or (a.Empty = 'Y' and b.Empty = 'N')

RESULT

ID  Empty   Day
1   Y   02/01/2017
1   N   01/01/2017
2   Y   03/01/2017
3   N   03/01/2017
4   Y   03/01/2017
4   N   03/01/2017

Data

CREATE TABLE table1 (ID int, EMPTY varchar(255), DAY varchar(255));
INSERT table1 VALUES (1,'Y','03/01/2017'),(1,'Y','02/01/2017'),(1,'N','01/01/2017'),(2,'Y','03/01/2017'),(3,'N','03/01/2017'),(4,'Y','03/01/2017'),(4,'N','03/01/2017'),(4,'Y','03/01/2017'),(4,'Y','03/01/2017');
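
The self-join can also be tried without a MySQL server. The sketch below swaps in SQLite through Python's built-in `sqlite3` module, which is a substitution and not what the answer uses: instead of adding an AUTO_INCREMENT column, it relies on SQLite's implicit `rowid`, which follows insertion order for a freshly created table. The `ORDER BY a.rowid` clause is an addition to make the output order explicit.

```python
import sqlite3

# Sample rows in the same order as the INSERT statement above.
rows = [
    (1, "Y", "03/01/2017"),
    (1, "Y", "02/01/2017"),
    (1, "N", "01/01/2017"),
    (2, "Y", "03/01/2017"),
    (3, "N", "03/01/2017"),
    (4, "Y", "03/01/2017"),
    (4, "N", "03/01/2017"),
    (4, "Y", "03/01/2017"),
    (4, "Y", "03/01/2017"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (ID INTEGER, EMPTY TEXT, DAY TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)

# Join each row (a) to the row inserted right after it (b), then keep
# 'N' rows and 'Y' rows that are immediately followed by an 'N' row.
query = """
SELECT a.ID, a.EMPTY, a.DAY
FROM table1 a
LEFT JOIN table1 b ON a.rowid + 1 = b.rowid
WHERE a.EMPTY = 'N' OR (a.EMPTY = 'Y' AND b.EMPTY = 'N')
ORDER BY a.rowid
"""
result = conn.execute(query).fetchall()
conn.close()

for row in result:
    print(row)
```

On the sample data this returns the same six rows as the RESULT table above.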