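How to find consecutive repeated values in a column

Date: 2017-08-30 05:25:13

Tags: r

Hello, I have a df with "var" and "value" columns. How can I find/compute the "output" column, grouped by var, which flags a value once it has appeared more than 2 times in the value column?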
var = c("A","A","A","A","A","B","B","B","B","B")
value = c(22,1,1,1,1,31,21,1,1,1)
df = data.frame(var, value)

output = c("non_rep","non_rep","non_rep","rep","rep","non_rep","non_rep","non_rep","non_rep","rep")

Expected output:

var value   output
A   22  non_rep
A   1   non_rep
A   1   non_rep
A   1   rep
A   1   rep
B   31  non_rep
B   21  non_rep
B   1   non_rep
B   1   non_rep
B   1   rep

Thanks in advance

4 answers:

Answer 0: (score: 5)

Group by both columns, then mark every value after the first two as "rep":

df$output <- ifelse(ave(df$value, df[c("var","value")], FUN=seq_along) > 2, "rep", "non_rep")
#   var value  output
#1    A    22 non_rep
#2    A     1 non_rep
#3    A     1 non_rep
#4    A     1     rep
#5    A     1     rep
#6    B    31 non_rep
#7    B    21 non_rep
#8    B     1 non_rep
#9    B     1 non_rep
#10   B     1     rep

A dplyr translation would also be possible.
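A minimal sketch of what such a dplyr translation could look like, assuming the question's sample data. This is not the answerer's own code; it numbers the rows inside each (var, value) group with row_number() and flags everything past the second row. The name res is only used here for illustration.

```r
library(dplyr)

# Sample data from the question
var   <- c("A","A","A","A","A","B","B","B","B","B")
value <- c(22,1,1,1,1,31,21,1,1,1)
df    <- data.frame(var, value)

# Within each (var, value) group, the first two rows are "non_rep",
# every later row is "rep"
res <- df %>%
  group_by(var, value) %>%
  mutate(output = ifelse(row_number() > 2, "rep", "non_rep")) %>%
  ungroup()

res
```

Like the ave() version, this counts occurrences per (var, value) pair across the whole data frame, so it only matches "consecutive" repeats when equal pairs are adjacent, as they are in the sample data.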

Answer 1: (score: 3)

If a (var, value) pair can occur more than once and each run needs to be treated as a separate group, you can use data.table's rleid function to build the grouping:

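library(dplyr)  # provides %>%, group_by(), mutate(), row_number()

# Sample data extended so that the pair (A, 22) shows up again in a later run
var = c("A","A","A","A","A","B","B","B","B","B", "A", "A", "A")
value = c(22,1,1,1,1,31,21,1,1,1, 22, 22, 22)
df = data.frame(var, value)

# rleid() gives each run of identical consecutive (var, value) pairs its own id
df$group = data.table::rleid(df$var, df$value)

df %>% 
    group_by(group) %>% 
    mutate(output = ifelse(row_number() > 2, "rep", "non_rep"))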
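To see why the run-based grouping matters, here is a small base R sketch on the same extended data. It is an illustration only: rle() and sequence() stand in for data.table::rleid, and the names key, run_pos, grouped_pos, run_output and grouped_output are made up for this example.

```r
var   <- c("A","A","A","A","A","B","B","B","B","B","A","A","A")
value <- c(22,1,1,1,1,31,21,1,1,1,22,22,22)

# Position of each row inside its run of identical consecutive (var, value) pairs
key     <- paste(var, value)
run_pos <- sequence(rle(key)$lengths)

# Position of each row inside its (var, value) group, ignoring adjacency
grouped_pos <- ave(seq_along(value), var, value, FUN = seq_along)

run_output     <- ifelse(run_pos > 2, "rep", "non_rep")
grouped_output <- ifelse(grouped_pos > 2, "rep", "non_rep")

# Rows where the two approaches disagree
which(run_output != grouped_output)
```

The two results differ only at row 12: it is the second row of the late (A, 22) run, but the third occurrence of (A, 22) overall, so plain grouping already labels it "rep".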

Answer 2: (score: 2)

A dplyr solution that seems to work, at least for your sample data:

library(dplyr)

df %>% 
  group_by(var, value) %>% 
  mutate(output = ifelse(lag(value, n = 2) != value | is.na(lag(value, n = 2)),
                         "non_rep", "rep")) %>%
  ungroup()


# A tibble: 10 x 3
     var value  output
   <chr> <dbl>   <chr>
 1     A    22 non_rep
 2     A     1 non_rep
 3     A     1 non_rep
 4     A     1     rep
 5     A     1     rep
 6     B    31 non_rep
 7     B    21 non_rep
 8     B     1 non_rep
 9     B     1 non_rep
10     B     1     rep

Answer 3: (score: 1)

We can use data.table:

library(data.table)
setDT(df)[,  output := if(.N > 2) rep(c("non_rep", "rep"), 
         c(2, .N-2)) else "non_rep" , .(var, value)]
df
#    var value  output
# 1:   A    22 non_rep
# 2:   A     1 non_rep
# 3:   A     1 non_rep
# 4:   A     1     rep
# 5:   A     1     rep
# 6:   B    31 non_rep
# 7:   B    21 non_rep
# 8:   B     1 non_rep
# 9:   B     1 non_rep
#10:   B     1     rep