Subset consecutive row pairs that match a string condition

Time: 2018-04-18 07:31:25

Tags: r

I have a data frame with 3 columns and several hundred rows. One column (type) contains one of three strings: "Open", "Close" or "Cancel".

    type    unique_id   group
1   Open    11468329881 g_2
2   Close   11468329881 g_2
3   Open    23254429881 g_3
4   Cancel  23254429881 g_3
5   Open    32550829881 g_4
6   Close   32550829881 g_4
7   Open    43254429881 g_5
8   Close   43254429881 g_5
9   Open    52627629881 g_6
10  Close   52627629881 g_6
11  Open    62747029881 g_7
12  Close   62747029881 g_7
13  Open    2499619881  g_8
14  Close   2499619881  g_8
15  Open    32975019881 g_9
16  Close   32975019881 g_9
17  Open    42975119881 g_10
18  Cancel  42975119881 g_10
19  Open    53560019881 g_11
20  Open    53560019881 g_11
21  Open    62521619881 g_12
22  Close   62521619881 g_12
23  Open    72663719881 g_13
24  Close   72663719881 g_13
25  Open    82663819881 g_14
26  Close   82663819881 g_14
27  Open    92747019881 g_15
28  Open    92747019881 g_15
29  Open    1499629881  g_15
30  Close   1499629881  g_15

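I want to loop over each group (e.g. g_1, g_2) and subset its rows if the order of the type values is "Open", "Close" or "Open", "Cancel". Any other order should be ignored.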

For example, g_2 should be subset to

    type    unique_id   group
1   Open    11468329881 g_2
2   Close   11468329881 g_2

g_11 should be ignored, because the order is "Open", "Open"

g_15 should be subset to

    type    unique_id   group
29  Open    1499629881  g_15
30  Close   1499629881  g_15

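Any help would be greatly appreciated.

Edit: I apologize if I was not clear earlier. For the sample given below, the suggested solution does not work for g_8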

Open    21921312463 g_1
Close   21921312463 g_1
Open    31032312463 g_2
Close   31032312463 g_2
Open    41032212463 g_3
Close   41032212463 g_3
Open    51032312463 g_4
Close   51032312463 g_4
Open    61032212463 g_5
Close   61032212463 g_5
Open    71032312463 g_6
Close   71032312463 g_6
Open    81032212463 g_7
Close   81032212463 g_7
Open    21921312463 g_8
Open    21921312463 g_8
Close   21921312463 g_8
Open    31032312463 g_9
Close   31032312463 g_9
Open    41032212463 g_10
Close   41032212463 g_10
Open    51032312463 g_11
Close   51032312463 g_11
Open    61032212463 g_12
Close   61032212463 g_12
Open    71032312463 g_13
Close   71032312463 g_13
Open    81032212463 g_14
Close   81032212463 g_14

I would like g_8 to be filtered to give

Open    21921312463 g_8
Close   21921312463 g_8

and the first row in the group to be ignored.

1 answer:

Answer 0: (score: 4)

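After grouping by 'group', filter the rows by checking whether all of the elements of the vector c("Open", "Close") or (|) all of the elements of c("Open", "Cancel") are present (%in%) in the 'type' column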

library(dplyr)
df1 %>% 
  group_by(group) %>% 
  #group_by(group, unique_id) %>%
  filter(all(c("Open", "Close") %in% type)| all(c("Open", "Cancel") %in% type))

If the grouping variables should also include 'unique_id', change the group_by call to group_by(group, unique_id)
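To make the membership check above easy to try out, here is a minimal sketch. The data frame is a hypothetical miniature built from three groups of the question's sample (g_2, g_3, g_11), and the result name res1 is only illustrative.

```r
library(dplyr)

# Hypothetical miniature of the question's data:
# g_2 is Open/Close, g_3 is Open/Cancel, g_11 is Open/Open.
df1 <- data.frame(
  type      = c("Open", "Close", "Open", "Cancel", "Open", "Open"),
  unique_id = c("11468329881", "11468329881", "23254429881",
                "23254429881", "53560019881", "53560019881"),
  group     = c("g_2", "g_2", "g_3", "g_3", "g_11", "g_11"),
  stringsAsFactors = FALSE
)

# Keep a group only if it contains both "Open" and "Close",
# or both "Open" and "Cancel".
res1 <- df1 %>%
  group_by(group) %>%
  filter(all(c("Open", "Close") %in% type) |
         all(c("Open", "Cancel") %in% type)) %>%
  ungroup()

print(res1)
```

Note that %in% only tests membership, not order or position, so a group such as "Open", "Open", "Close" passes this check and is returned with all three rows.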

Update

Based on the updated dataset and the new logic, we check whether the value following an "Open" is "Close" or "Cancel"

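df2 %>% 
  group_by(group, unique_id) %>%
  # position of the first "Open" directly followed by "Close"/"Cancel" (NA if none)
  mutate(ind = which(type == "Open" & lead(type) %in% c("Close", "Cancel"))[1]) %>% 
  # drop groups that have no such pair
  filter(!is.na(ind)) %>% 
  # keep that "Open" row and the row right after it
  slice(ind[1]:(ind[1] + 1)) %>% 
  select(-ind)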
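As an alternative sketch (not the code from the answer), the same "check the next value" idea can be written as a single filter that keeps an "Open" row when the next row is "Close" or "Cancel", and keeps a "Close"/"Cancel" row when the previous row is "Open". Unlike the slice version, this keeps every such consecutive pair in a group rather than only the first one. The data frame below is a hypothetical excerpt of the edited sample (g_7, g_8, g_9), and res2 is an illustrative name.

```r
library(dplyr)

# Hypothetical excerpt of the edited sample: g_8 is Open, Open, Close.
df2 <- data.frame(
  type      = c("Open", "Close", "Open", "Open", "Close", "Open", "Close"),
  unique_id = c("81032212463", "81032212463", "21921312463", "21921312463",
                "21921312463", "31032312463", "31032312463"),
  group     = c("g_7", "g_7", "g_8", "g_8", "g_8", "g_9", "g_9"),
  stringsAsFactors = FALSE
)

closing <- c("Close", "Cancel")

res2 <- df2 %>%
  group_by(group, unique_id) %>%
  filter(
    # an "Open" that is directly followed by a closing row
    (type == "Open" & dplyr::lead(type) %in% closing) |
    # a closing row that directly follows an "Open"
    (type %in% closing & dplyr::lag(type) == "Open")
  ) %>%
  ungroup()

print(res2)
```

For g_8 this drops the first "Open" and keeps the second "Open" together with the "Close" that follows it, which matches the output requested in the question's edit.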