distinct (dplyr) not working well - unique observations based on a criterion

Time: 2016-05-25 02:49:30

Tags: r duplicates unique dplyr

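I have data with some duplicate records, some of which should not be there (mark and recov should occur only once per band; recap can occur several times). I want to keep a single observation per band where variable == "mark", and keep the rest of the data ("recap" and "recov") as it is.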

I used dplyr, grouping the data by band and then selecting the unique records where variable == "mark". This is my code:

uniq <- df %>%group_by(band)  %>% distinct(variable=="mark")

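I found that it does not work correctly: when checking some observations, other rows with variable == "recap" have been removed (for example, band = 113749924 is missing the recap from 1993, and likewise band = 113728509 is missing one recap).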

Here is a sample of the data:

structure(list(band = c(113728501L, 113728502L, 113728503L, 113728504L, 
113728505L, 113728505L, 113728506L, 113728506L, 113728507L, 113728508L, 
113728509L, 113728509L, 113728509L, 113728509L, 113728510L, 113728510L, 
113729709L, 113729709L, 113729709L, 113729710L, 113729711L, 113729712L, 
113729713L, 113729714L, 113729715L, 113729716L, 113729717L, 113729718L, 
113729719L, 113729720L, 113729720L, 113729721L, 113729722L, 113729723L, 
113729724L, 113729725L, 113729726L, 113729727L, 113729728L, 113729729L, 
113729730L, 113729731L, 113729732L, 113729733L, 113729733L, 113729733L, 
113729734L, 113729735L, 113729735L, 113729735L, 113729914L, 113729914L, 
113729914L, 113729914L, 113729915L, 113729916L, 113729917L, 113729918L, 
113729919L, 113729920L, 113729921L, 113729922L, 113729923L, 113729924L, 
113729925L, 113729926L, 113729927L, 113729928L, 113729929L, 113749923L, 
113749924L, 113749924L, 113749924L), variable = structure(c(1L, 
1L, 1L, 1L, 1L, 3L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 
3L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 2L, 1L, 1L, 3L, 
2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 3L, 2L), .Label = c("mark", "recap", 
"recov"), class = "factor"), year = c(1994L, 1994L, 1994L, 1994L, 
1994L, 2012L, 1994L, 1999L, 1994L, 1994L, 1994L, 1994L, 2002L, 
2003L, 1994L, 1996L, 1994L, 2002L, 1998L, 1994L, 1994L, 1994L, 
1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1995L, 
1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 
1994L, 1994L, 1994L, 1994L, 2002L, 2001L, 1994L, 1994L, 1999L, 
1998L, 1994L, 1994L, 1999L, 2005L, 1994L, 1994L, 1994L, 1994L, 
1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 
1994L, 1994L, 1991L, 1991L, 1994L, 1993L)), .Names = c("band", 
"variable", "year"), class = "data.frame", row.names = c(NA, 
-73L))

In the end, I would like to get something like this (for example, for band 113749924):

band      year variable
113749924 1991 mark
113749924 1993 recap
113749924 1994 recov

Could you help me find the error or suggest alternative code?

Thanks a lot!

1 answer:

Answer 0 (score: 1)
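
As a small sketch of why the call in the question drops rows: on a grouped data frame, distinct() keeps one row per combination of the grouping variable and the expression, so each band keeps at most one row where variable == "mark" is TRUE and one where it is FALSE. The toy data frame and the column name is_mark below are made up for illustration, and the example assumes a current dplyr version, where .keep_all = TRUE is needed to keep the remaining columns.

```r
library(dplyr)

# Made-up toy data (not the asker's data): one band with three events.
toy <- data.frame(
  band     = c(1L, 1L, 1L),
  variable = c("mark", "recov", "recap"),
  year     = c(1991L, 1994L, 1993L),
  stringsAsFactors = FALSE
)

# One row is kept per (band, is_mark) combination, so the band keeps
# its "mark" row and only the first non-"mark" row.
collapsed <- toy %>%
  group_by(band) %>%
  distinct(is_mark = variable == "mark", .keep_all = TRUE) %>%
  ungroup()

print(collapsed)
```

Here the "recap" row is lost because it shares is_mark == FALSE with the earlier "recov" row, which matches the missing 1993 recap reported for band 113749924.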

One option is to group_by 'band', filter the rows where 'variable' is 'mark', get the distinct rows, and then bind them (bind_rows) with the rows of the dataset where 'variable' is not 'mark'.

df %>%
    group_by(band) %>%
    filter(variable=="mark") %>%
    ungroup() %>%
    distinct() %>%
    bind_rows(., filter(df, variable!="mark")) %>%
    arrange(band) %>%
    data.frame

        band variable year
1  113728501     mark 1994
2  113728502     mark 1994
3  113728503     mark 1994
4  113728504     mark 1994
5  113728505     mark 1994
6  113728505    recov 2012
7  113728506     mark 1994
8  113728506    recap 1999
9  113728507     mark 1994
10 113728508     mark 1994
11 113728509     mark 1994 ###only one mark.
12 113728509    recap 2002
13 113728509    recap 2003
14 113728510     mark 1994
15 113728510    recap 1996
16 113729709     mark 1994
17 113729709    recov 2002
18 113729709    recap 1998
19 113729710     mark 1994
20 113729711     mark 1994
21 113729712     mark 1994
22 113729713     mark 1994
23 113729714     mark 1994
24 113729715     mark 1994
25 113729716     mark 1994
26 113729717     mark 1994
27 113729718     mark 1994
28 113729719     mark 1994
29 113729720     mark 1994
30 113729720    recov 1995
31 113729721     mark 1994
32 113729722     mark 1994
33 113729723     mark 1994
34 113729724     mark 1994
35 113729725     mark 1994
36 113729726     mark 1994
37 113729727     mark 1994
38 113729728     mark 1994
39 113729729     mark 1994
40 113729730     mark 1994
41 113729731     mark 1994
42 113729732     mark 1994
43 113729733     mark 1994
44 113729733    recov 2002
45 113729733    recap 2001
46 113729734     mark 1994
47 113729735     mark 1994
48 113729735    recov 1999
49 113729735    recap 1998
50 113729914     mark 1994
51 113729914    recap 1999
52 113729914    recap 2005
53 113729915     mark 1994
54 113729916     mark 1994
55 113729917     mark 1994
56 113729918     mark 1994
57 113729919     mark 1994
58 113729920     mark 1994
59 113729921     mark 1994
60 113729922     mark 1994
61 113729923     mark 1994
62 113729924     mark 1994
63 113729925     mark 1994
64 113729926     mark 1994
65 113729927     mark 1994
66 113729928     mark 1994
67 113729929     mark 1994
68 113749923     mark 1991
69 113749924     mark 1991
70 113749924    recov 1994
71 113749924    recap 1993

Another option is to group_by 'band' and 'variable', then create a logical condition that is TRUE where row_number() is greater than 1 and 'variable' is 'mark', negate it (!), and filter the rows.
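
A minimal sketch of this second option, following the description above and run on a made-up toy data frame (the names toy and second_option are hypothetical, not from the question):

```r
library(dplyr)

# Made-up toy data (not the asker's data): band 1 has a duplicated "mark" row.
toy <- data.frame(
  band     = c(1L, 1L, 1L, 1L, 2L),
  variable = c("mark", "mark", "recap", "recap", "mark"),
  year     = c(1994L, 1994L, 2002L, 2003L, 1994L),
  stringsAsFactors = FALSE
)

# Within each band/variable group, drop every "mark" row after the first;
# "recap" and "recov" rows are never dropped by this condition.
second_option <- toy %>%
  group_by(band, variable) %>%
  filter(!(row_number() > 1 & variable == "mark")) %>%
  ungroup()

print(second_option)
```

Because the condition only ever matches 'mark' rows, repeated 'recap' rows within a band survive, which is what the call in the question failed to preserve. Note that this keeps the first 'mark' row per band regardless of its year, whereas the first option keeps every 'mark' row that differs in any column.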