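I would like to know how I can keep rows only when the subsequent row in a group meets a certain condition. The following data illustrate what I want to achieve:
ID EMPTY DATE
1 Y 03/01/2017
1 Y 02/01/2017
1 N 01/01/2017
2 Y 03/01/2017
3 N 03/01/2017
4 Y 03/01/2017
4 N 03/01/2017
4 Y 03/01/2017
4 Y 03/01/2017
The data are sorted by ID ascending and DATE descending.
The same ID has either one or zero rows with EMPTY = 'N', but can have zero, one, or many rows with EMPTY = 'Y'.
I want to track the date when the EMPTY status changed; I would like to keep all the rows with EMPTY = 'N'.
Output:
ID EMPTY DATE
1 Y 02/01/2017
1 N 01/01/2017
2 Y 03/01/2017
3 N 03/01/2017
4 Y 03/01/2017
4 N 03/01/2017
I can use R or sql to do this, so solutions in either or both languages are welcome!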
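Answer 0 (score: 2)
If you are indeed interested in using R:
library(dplyr)
df %>%
  # lead() gives the EMPTY value of the next row (NA for the last row);
  # note that the comparison runs over the whole data frame, not per ID
  mutate(next.empty = lead(EMPTY)) %>%
  # keep rows whose EMPTY differs from the next row's; the NA comparison drops the last row
  filter(next.empty != EMPTY) %>%
  select(-next.empty)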
# ID EMPTY DATE
#1 1 Y 02/01/2017
#2 1 N 01/01/2017
#3 2 Y 03/01/2017
#4 3 N 03/01/2017
#5 4 Y 03/01/2017
#6 4 N 03/01/2017
Data:
df <- structure(list(ID = c(1L, 1L, 1L, 2L, 3L, 4L, 4L, 4L, 4L), EMPTY = structure(c(2L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L), .Label = c("N", "Y"), class = "factor"),
DATE = structure(c(3L, 2L, 1L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("01/01/2017",
"02/01/2017", "03/01/2017"), class = "factor")), .Names = c("ID",
"EMPTY", "DATE"), class = "data.frame", row.names = c(NA, -9L))
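Answer 1 (score: 1)
One approach with dplyr in R:
library(dplyr)
df1 %>%
  group_by(ID) %>%
  # keep single-row IDs; otherwise keep the first occurrence of each EMPTY value,
  # up to and including the first "N" row of the ID
  # note: for ID 1 this keeps the first Y row (03/01/2017) rather than the Y row directly above the N row
  filter(n() == 1 | (cumsum(cumsum(EMPTY == "N")) < 2 & !duplicated(EMPTY)))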
# A tibble: 6 x 3
# Groups: ID [4]
# ID EMPTY DATE
# <int> <chr> <chr>
#1 1 Y 03/01/2017
#2 1 N 01/01/2017
#3 2 Y 03/01/2017
#4 3 N 03/01/2017
#5 4 Y 03/01/2017
#6 4 N 03/01/2017
df1 <- structure(list(ID = c(1L, 1L, 1L, 2L, 3L, 4L, 4L, 4L, 4L), EMPTY = c("Y",
"Y", "N", "Y", "N", "Y", "N", "Y", "Y"), DATE = c("03/01/2017",
"02/01/2017", "01/01/2017", "03/01/2017", "03/01/2017", "03/01/2017",
"03/01/2017", "03/01/2017", "03/01/2017")), .Names = c("ID",
"EMPTY", "DATE"), class = "data.frame", row.names = c(NA, -9L
))
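Answer 2 (score: 1)
In my experience this is a neater task in R, but since you are looking for a Python solution:
import pandas as pd
ids = [1, 1, 1, 2, 3, 4, 4, 4, 4]
empty = ['Y', 'Y', 'N', 'Y', 'N', 'Y', 'N', 'Y', 'Y']
date = ['03/01/2017', '02/01/2017', '01/01/2017', '03/01/2017', '03/01/2017', '03/01/2017', '03/01/2017', '03/01/2017', '03/01/2017']
data = {'id': ids, 'empty': empty, 'date': date}  # named data to avoid shadowing the built-in dict
df1 = pd.DataFrame(data)
Once the data is loaded into a pandas DataFrame by whatever method you choose:
# 'empty' value of the next row, aligned with the current row
lag = list(df1.loc[1:, 'empty'])
lag.append(df1['empty'].iloc[-1])  # pad with the last value so the list matches the row count and the final row is not flagged
df1['empty_+1'] = lag
df1['check'] = df1['empty'] != df1['empty_+1']
df1.loc[df1['check']]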
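A possible group-aware variant, in case the comparison should never cross from one ID to the next: the shifted comparison above runs over the whole frame, so a change between the last row of one ID and the first row of the next also counts. This is only a sketch in plain Python (no pandas). The helper name keep_change_rows is made up for illustration, and keeping the lone row of a single-row ID is an assumption based on the sample output in the question.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question: (ID, EMPTY, DATE), sorted by ID asc, DATE desc.
rows = [
    (1, "Y", "03/01/2017"),
    (1, "Y", "02/01/2017"),
    (1, "N", "01/01/2017"),
    (2, "Y", "03/01/2017"),
    (3, "N", "03/01/2017"),
    (4, "Y", "03/01/2017"),
    (4, "N", "03/01/2017"),
    (4, "Y", "03/01/2017"),
    (4, "Y", "03/01/2017"),
]


def keep_change_rows(rows):
    """Hypothetical helper: keep 'N' rows, the row directly before an 'N'
    within the same ID, and the only row of a single-row ID (assumption)."""
    kept = []
    # groupby relies on the rows already being sorted by ID
    for _, group in groupby(rows, key=itemgetter(0)):
        group = list(group)
        for i, row in enumerate(group):
            # EMPTY value of the next row in the same ID, or None at the end
            next_empty = group[i + 1][1] if i + 1 < len(group) else None
            if len(group) == 1 or row[1] == "N" or next_empty == "N":
                kept.append(row)
    return kept


result = keep_change_rows(rows)
for row in result:
    print(*row)
```

On the sample data this keeps six rows, the same ones listed in the question's output.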
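Answer 3 (score: 0)
In MySQL, one approach is:
1) Add an auto-increment row id to the table
ALTER TABLE table1 ADD row_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY;
2) Left join the table to itself, shifted by one row
3) Add the selection conditions: (i) the current row has 'N' in Empty, or (ii) the current row has 'Y' in Empty but the next row has 'N' in Empty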
SELECT a.ID, a.Empty, a.Day
FROM table1 a
LEFT JOIN table1 b ON a.row_id + 1 = b.row_id
WHERE a.Empty = 'N' or (a.Empty = 'Y' and b.Empty = 'N')
RESULT
ID Empty Day
1 Y 02/01/2017
1 N 01/01/2017
2 Y 03/01/2017
3 N 03/01/2017
4 Y 03/01/2017
4 N 03/01/2017
DATA
CREATE TABLE table1 (ID int, EMPTY varchar(255), DAY varchar(255));
INSERT table1 VALUES (1,'Y','03/01/2017'),(1,'Y','02/01/2017'),(1,'N','01/01/2017'),(2,'Y','03/01/2017'),(3,'N','03/01/2017'),(4,'Y','03/01/2017'),(4,'N','03/01/2017'),(4,'Y','03/01/2017'),(4,'Y','03/01/2017');
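The three steps above can be tried end to end with Python's built-in sqlite3 module. Note this sketch runs on SQLite rather than MySQL, and it adds two things that are not in the answer: the join is restricted to rows of the same ID, and IDs with a single row are kept through an explicit count (an assumption based on the sample output in the question). Table and column names follow the answer.

```python
import sqlite3

# Sample rows from the question: (ID, EMPTY, DAY), in the original sort order.
rows = [
    (1, "Y", "03/01/2017"),
    (1, "Y", "02/01/2017"),
    (1, "N", "01/01/2017"),
    (2, "Y", "03/01/2017"),
    (3, "N", "03/01/2017"),
    (4, "Y", "03/01/2017"),
    (4, "N", "03/01/2017"),
    (4, "Y", "03/01/2017"),
    (4, "Y", "03/01/2017"),
]

conn = sqlite3.connect(":memory:")

# Step 1: a table with an auto-increment row id (SQLite syntax, not MySQL).
conn.execute(
    "CREATE TABLE table1 ("
    "row_id INTEGER PRIMARY KEY AUTOINCREMENT, ID INT, EMPTY TEXT, DAY TEXT)"
)
conn.executemany("INSERT INTO table1 (ID, EMPTY, DAY) VALUES (?, ?, ?)", rows)

# Steps 2 and 3: self-join shifted by one row, limited to the same ID,
# then keep 'N' rows, rows followed by an 'N', and single-row IDs (assumption).
query = """
SELECT a.ID, a.EMPTY, a.DAY
FROM table1 a
LEFT JOIN table1 b
       ON b.row_id = a.row_id + 1 AND b.ID = a.ID
WHERE a.EMPTY = 'N'
   OR b.EMPTY = 'N'
   OR (SELECT COUNT(*) FROM table1 c WHERE c.ID = a.ID) = 1
ORDER BY a.row_id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(*row)
```

With the join limited to the same ID, the row for ID 2 is no longer kept just because the first row of ID 3 happens to be 'N'; it is kept by the single-row condition instead.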