Question

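I have a data frame df with Date, Group and Gap_Days columns. For each group, I want to select all the rows whose Gap_Days is consecutively 1, starting from the latest (maximum) date and working backwards. Once a row with Gap_Days not equal to 1 is reached, that row and all earlier rows in the group are ignored. For reproducibility, I have created the current df and the expected df ...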

Answer 1

The only difference between my first comment and what works now is the introduction of grouping to the question.

Base R, with cumall() from dplyr (tidyverse):

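library(dplyr)  # cumall() is exported by dplyr
# Within each group, keep only the trailing run of rows where Gap_Days == 1
do.call("rbind", by(df, df$Group, FUN=function(d) d[rev(cumall(rev(d$Gap_Days == 1))),]))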
#            Date Group Gap_Days
# a.1  2018-10-15     a        1
# a.2  2018-10-16     a        1
# a.3  2018-10-17     a        1
# b.7  2018-10-18     b        1
# b.8  2018-10-19     b        1
# c.13 2018-10-27     c        1
# c.14 2018-10-28     c        1
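
To see why the double rev() isolates the trailing run, here is a minimal base-R sketch. The data frame is made up for illustration and is not the asker's data. It assumes rows are sorted by Date within each group and that Gap_Days has no NA values, and it swaps cumall() for cumprod(), which behaves the same way on an NA-free logical vector. The helper name trailing_ones is hypothetical.

```r
# Hypothetical sample data, sorted by Date within each Group (an assumption)
df <- data.frame(
  Date = as.Date("2018-10-10") + 0:7,
  Group = rep(c("a", "b"), each = 4),
  Gap_Days = c(1, 3, 1, 1, 2, 1, 5, 1)
)

# TRUE for rows in the trailing run of x == 1, FALSE elsewhere
trailing_ones <- function(x) {
  # Reverse so the latest row comes first; cumprod() stays 1 until the first
  # non-1 value and is 0 from then on; reverse again to restore row order
  rev(cumprod(rev(x == 1)) == 1)
}

# ave() applies the helper within each group and returns a vector aligned with df
keep <- as.logical(ave(df$Gap_Days, df$Group, FUN = trailing_ones))
result <- df[keep, ]
print(result)  # rows 3 and 4 (group a) and row 8 (group b)
```

In group a the gap of 3 on the second row cuts off everything before it, so only the last two rows survive. In group b only the final row does.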

Answer 2

Here is one approach with the tidyverse

library(dplyr)
library(data.table)
df %>% 
   group_by(grp = rleid(Gap_Days), 
   ind = any(Date == max(.data$Date))) %>% 
   ungroup %>% 
   filter(grp == max(grp) & ind) %>% 
   select(-ind, -grp)
# A tibble: 3 x 2
#   Date       Gap_Days
#  <date>        <dbl>
#1 2018-10-19        1
#2 2018-10-20        1
#3 2018-10-21        1
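
The pipeline above uses rleid() from data.table to give each run of identical Gap_Days values its own id. As a rough illustration of what those ids look like, the base-R sketch below rebuilds them with rle() on a made-up vector. The names gap_days and run_id are hypothetical.

```r
# Hypothetical Gap_Days values, already ordered by date (an assumption)
gap_days <- c(1, 1, 3, 1, 1, 1)

# rle() finds the runs; repeating each run's index by its length gives run ids
runs <- rle(gap_days)
run_id <- rep(seq_along(runs$lengths), runs$lengths)
run_id  # 1 1 2 3 3 3

# Positions that belong to the final run, i.e. the one ending at the latest date
last_run <- which(run_id == max(run_id))
last_run  # 4 5 6
```

Filtering on the largest run id picks the last run, whatever its value. Any check that the run consists of 1s has to be applied separately.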

If the Date column is already ordered, then we only need to check for the 1s in Gap_Days

i1 <- inverse.rle(within.list(rle(df$Gap_Days == 1), 
           values[lengths < max(lengths) & values] <- FALSE))
df[i1,, drop = FALSE]
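
Note that the rle() snippet above keeps the longest run of 1s, which need not be the run that ends at the latest date. If the two can differ in your data, a variant along the following lines keeps only the final run. This is a sketch on a made-up, date-ordered vector for a single group, not the answerer's code.

```r
# Hypothetical Gap_Days values for one group, ordered by date (an assumption)
gap_days <- c(1, 1, 1, 2, 1, 1)

r <- rle(gap_days == 1)
n_runs <- length(r$values)

# Keep a run only if it is the last one and it is a run of 1s
r$values <- seq_len(n_runs) == n_runs & r$values

# Expand the run-level flags back to one flag per row
keep <- inverse.rle(r)
which(keep)  # 5 6
```

On this vector the longest-run version would select positions 1 to 3 instead, because the first run of 1s is the longer one.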

How to select consecutive rows in an R data frame based on a condition?

2 answers: