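I have data with some repeated records, and some of them should not be there (mark and recov should appear only once per band; recap can appear several times). I want to select the unique observations where variable=="mark", based on the values in the column band, and keep the rest of the data for "recap" and "recov".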
I used dplyr to group the data by band and then select the unique records where the column variable=="mark". Here is my code:
uniq <- df %>% group_by(band) %>% distinct(variable == "mark")
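I found that it does not work properly: when I look up certain observations, other rows with variable=="recap" have been removed (for example, in band=113749924 the recap from 1993 is missing, and likewise band=113728509 is missing one recap).
Here is a sample of the data: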
structure(list(band = c(113728501L, 113728502L, 113728503L, 113728504L,
113728505L, 113728505L, 113728506L, 113728506L, 113728507L, 113728508L,
113728509L, 113728509L, 113728509L, 113728509L, 113728510L, 113728510L,
113729709L, 113729709L, 113729709L, 113729710L, 113729711L, 113729712L,
113729713L, 113729714L, 113729715L, 113729716L, 113729717L, 113729718L,
113729719L, 113729720L, 113729720L, 113729721L, 113729722L, 113729723L,
113729724L, 113729725L, 113729726L, 113729727L, 113729728L, 113729729L,
113729730L, 113729731L, 113729732L, 113729733L, 113729733L, 113729733L,
113729734L, 113729735L, 113729735L, 113729735L, 113729914L, 113729914L,
113729914L, 113729914L, 113729915L, 113729916L, 113729917L, 113729918L,
113729919L, 113729920L, 113729921L, 113729922L, 113729923L, 113729924L,
113729925L, 113729926L, 113729927L, 113729928L, 113729929L, 113749923L,
113749924L, 113749924L, 113749924L), variable = structure(c(1L,
1L, 1L, 1L, 1L, 3L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L,
3L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 2L, 1L, 1L, 3L,
2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 3L, 2L), .Label = c("mark", "recap",
"recov"), class = "factor"), year = c(1994L, 1994L, 1994L, 1994L,
1994L, 2012L, 1994L, 1999L, 1994L, 1994L, 1994L, 1994L, 2002L,
2003L, 1994L, 1996L, 1994L, 2002L, 1998L, 1994L, 1994L, 1994L,
1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1995L,
1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L,
1994L, 1994L, 1994L, 1994L, 2002L, 2001L, 1994L, 1994L, 1999L,
1998L, 1994L, 1994L, 1999L, 2005L, 1994L, 1994L, 1994L, 1994L,
1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L,
1994L, 1994L, 1991L, 1991L, 1994L, 1993L)), .Names = c("band",
"variable", "year"), class = "data.frame", row.names = c(NA,
-73L))
In the end, I would like to get something like this (for example, for band 113749924):
band year variable
113749924 1991 mark
113749924 1993 recap
113749924 1994 recov
Could you help me find the error or suggest alternative code?
Thank you very much!
Answer 0 (score: 1)
One option is to group_by 'band', filter the rows where 'variable' is 'mark', get the distinct rows, and then bind them (bind_rows) with the filtered dataset where 'variable' is not 'mark'.
library(dplyr)

df %>%
  group_by(band) %>%
  filter(variable == "mark") %>%
  ungroup() %>%
  distinct() %>%
  bind_rows(., filter(df, variable != "mark")) %>%
  arrange(band) %>%
  data.frame
band variable year
1 113728501 mark 1994
2 113728502 mark 1994
3 113728503 mark 1994
4 113728504 mark 1994
5 113728505 mark 1994
6 113728505 recov 2012
7 113728506 mark 1994
8 113728506 recap 1999
9 113728507 mark 1994
10 113728508 mark 1994
11 113728509 mark 1994 ###only one mark.
12 113728509 recap 2002
13 113728509 recap 2003
14 113728510 mark 1994
15 113728510 recap 1996
16 113729709 mark 1994
17 113729709 recov 2002
18 113729709 recap 1998
19 113729710 mark 1994
20 113729711 mark 1994
21 113729712 mark 1994
22 113729713 mark 1994
23 113729714 mark 1994
24 113729715 mark 1994
25 113729716 mark 1994
26 113729717 mark 1994
27 113729718 mark 1994
28 113729719 mark 1994
29 113729720 mark 1994
30 113729720 recov 1995
31 113729721 mark 1994
32 113729722 mark 1994
33 113729723 mark 1994
34 113729724 mark 1994
35 113729725 mark 1994
36 113729726 mark 1994
37 113729727 mark 1994
38 113729728 mark 1994
39 113729729 mark 1994
40 113729730 mark 1994
41 113729731 mark 1994
42 113729732 mark 1994
43 113729733 mark 1994
44 113729733 recov 2002
45 113729733 recap 2001
46 113729734 mark 1994
47 113729735 mark 1994
48 113729735 recov 1999
49 113729735 recap 1998
50 113729914 mark 1994
51 113729914 recap 1999
52 113729914 recap 2005
53 113729915 mark 1994
54 113729916 mark 1994
55 113729917 mark 1994
56 113729918 mark 1994
57 113729919 mark 1994
58 113729920 mark 1994
59 113729921 mark 1994
60 113729922 mark 1994
61 113729923 mark 1994
62 113729924 mark 1994
63 113729925 mark 1994
64 113729926 mark 1994
65 113729927 mark 1994
66 113729928 mark 1994
67 113729929 mark 1994
68 113749923 mark 1991
69 113749924 mark 1991
70 113749924 recov 1994
71 113749924 recap 1993
Another option is to group_by 'band' and 'variable', then create a logical condition where row_number() is greater than 1 and 'variable' is 'mark', negate it (!), and filter the rows.
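The code for this second option did not survive in the text above, so the following is only a sketch reconstructed from the description, not necessarily the exact code that was posted. It uses a small hypothetical subset of the question's data (two bands) so that it runs on its own; the final arrange(band, year) is an added assumption to match the row order the question asks for.

```r
library(dplyr)

# Hypothetical subset of the question's data: two bands, one with a repeated "mark"
df <- data.frame(
  band = c(113728509L, 113728509L, 113728509L, 113728509L,
           113749924L, 113749924L, 113749924L),
  variable = factor(c("mark", "mark", "recap", "recap", "mark", "recov", "recap"),
                    levels = c("mark", "recap", "recov")),
  year = c(1994L, 1994L, 2002L, 2003L, 1991L, 1994L, 1993L)
)

res <- df %>%
  group_by(band, variable) %>%
  # within each band/variable group, drop every "mark" row after the first
  filter(!(row_number() > 1 & variable == "mark")) %>%
  ungroup() %>%
  # assumption: sort by year within band, as in the desired output
  arrange(band, year) %>%
  as.data.frame()

print(res)
```

Note that this keeps the first "mark" row per band regardless of its year, whereas the distinct() approach only removes rows that are identical in every column. Neither version removes repeated "recov" rows.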