Select the rows with the "closest" values in each subgroup using dplyr

Asked: 2017-07-21 20:46:05

Tags: r group-by dplyr purrr

I am learning R and I am trying to find a way to select rows from a data.frame.

For each ID, I want to select one "pre" row and one "post" row such that the difference between their abs(d_days) values is as small as possible, i.e. so that the pre-ref and ref-post intervals are similar. Each interval should be at least 1 year.

I am not looking for the maximum or minimum difference, but for the closest / most similar intervals that are > 1 year.

My test data frame looks like this:

"ID" "date" "d_days" "group"
"00377698" 2006-11-15 -1006 "pre"
"00377698" 2009-08-16 -1 "pre"
"00377698" 2009-08-17 0 "ref"
"00377698" 2009-08-24 7 "post"
"00377698" 2009-09-03 17 "post"
"00377698" 2009-10-09 53 "post"
"00377698" 2010-02-26 193 "post"
"00377698" 2010-08-27 375 "post"
"00377698" 2010-11-26 466 "post"
"00377698" 2011-08-24 737 "post"
"00540688" 2009-06-26 -1664 "pre"
"00540688" 2009-08-20 -1609 "pre"
"00540688" 2009-11-20 -1517 "pre"
"00540688" 2010-11-17 -1155 "pre"
"00540688" 2011-12-07 -770 "pre"
"00540688" 2014-01-09 -6 "pre"
"00540688" 2014-01-15 0 "ref"
"00540688" 2014-01-20 5 "post"
"00540688" 2014-03-05 49 "post"
"00540688" 2015-04-29 469 "post"
"00540688" 2015-09-30 623 "post"
"00540688" 2016-05-13 849 "post"

My attempt:

data.frame %>%
  group_by(ID,group) %>%
  filter (group=="pre"| group=="post" & abs(d_days > 365)) %>%
  summarise(b = nth(abs(d_days[1]), which.max(abs(d_days[2]))))

I tried something like the roll=nearest idea.

I also tried "R - merge dataframes on matching A, B and *closest* C?".

I also tried "find value closest to x by group in dplyr", but since I am not looking for values close to a specific value, only for the "closest" values in two subgroups, it did not work out.

Unfortunately, I could not get what I want:

"ID" "date" "d_days" "group"
"00377698" 2006-11-15 -1006 "pre"
"00377698" 2009-08-17 0 "ref"
"00377698" 2011-08-24 737 "post"
"00540688" 2011-12-07 -770 "pre"
"00540688" 2014-01-15 0 "ref"
"00540688" 2015-09-30 894 "post"


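Thanks a lot for your help!

1 Answer:

Answer 0: (score: 0)

This is the best I could come up with. It picks the value closest to 0 in each group, which solves half of the problem. Close, but not complete.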

df %>%
  filter(group != "ref",
         abs(d_days) > 365) %>% 
  group_by(ID, group) %>%
  arrange(ID, date) %>%
  filter(abs(d_days - 0) == min(abs(d_days - 0)))

      ID       date d_days  group
   <int>     <fctr>  <int> <fctr>
1 377698 2006-11-15  -1006    pre
2 377698 2010-08-27    375   post
3 540688 2011-12-07   -770    pre
4 540688 2015-04-29    469   post
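
The remaining half (matching the pre and post intervals to each other instead of to 0) could be approached by pairing every eligible "pre" row with every eligible "post" row of the same ID and keeping the pair with the smallest gap. The following is only a sketch under that reading of the question, not a tested general solution. The names candidates, pre, post, gap and closest_pairs are made up for illustration, and the sample data is retyped from the question with dates kept as plain strings.

```r
library(dplyr)

# Sample data retyped from the question (dates left as character strings).
df <- data.frame(
  ID = rep(c("00377698", "00540688"), times = c(10, 12)),
  date = c("2006-11-15", "2009-08-16", "2009-08-17", "2009-08-24",
           "2009-09-03", "2009-10-09", "2010-02-26", "2010-08-27",
           "2010-11-26", "2011-08-24",
           "2009-06-26", "2009-08-20", "2009-11-20", "2010-11-17",
           "2011-12-07", "2014-01-09", "2014-01-15", "2014-01-20",
           "2014-03-05", "2015-04-29", "2015-09-30", "2016-05-13"),
  d_days = c(-1006, -1, 0, 7, 17, 53, 193, 375, 466, 737,
             -1664, -1609, -1517, -1155, -770, -6, 0, 5, 49, 469, 623, 849),
  group = c(rep("pre", 2), "ref", rep("post", 7),
            rep("pre", 6), "ref", rep("post", 5)),
  stringsAsFactors = FALSE
)

# Keep only pre/post rows that are more than one year from the reference.
candidates <- df %>%
  filter(group != "ref", abs(d_days) > 365)

pre <- candidates %>%
  filter(group == "pre") %>%
  select(ID, pre_date = date, pre_days = d_days)

post <- candidates %>%
  filter(group == "post") %>%
  select(ID, post_date = date, post_days = d_days)

# base::merge() builds every pre/post combination within each ID.
closest_pairs <- merge(pre, post, by = "ID") %>%
  mutate(gap = abs(abs(pre_days) - abs(post_days))) %>%
  group_by(ID) %>%
  filter(gap == min(gap)) %>%   # ties, if any, are all kept
  ungroup()

print(closest_pairs)
```

On the sample data this selects -1006 / 737 for the first ID and -770 / 849 (2016-05-13) for the second. The expected output in the question lists 894 on 2015-09-30 for the second ID, a value that does not appear in the sample data. The result is in wide form, one row per ID. If the original long layout is needed, the selected dates can be used to filter df again. Because all combinations are built per ID, this may become slow for IDs with very many rows.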