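I have a data.frame in R that I want to subset based on two conditions. First, a row's value in `a` should not be duplicated. Second, if it is duplicated, only the rows where b == 1 should be returned (in the sample code below, the rows whose `b` starts with "ws"). Instead of the expected five rows, I get all seven rows of this sample df back. What is the reason?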
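Edit: Sorry, the loop does work. I had just forgotten the `df$b` in front of the `[i]`. The second part of the question, how this could be optimized, is still open ;)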
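# duplicated2() is not part of base R; it is assumed here to be a helper
# that flags every occurrence of a repeated value (first and later ones).
duplicated2 <- function(x) duplicated(x) | duplicated(x, fromLast = TRUE)

a <- c(rep("A", 2), "B", rep("C", 2), "D", "E")
b <- c("ws_12", "dr_12", "ws_12", "ws_12", "dr_12", "ws_12", "dr_12")
df <- data.frame(a, b)

result <- data.frame()
for (i in seq_along(df$a)) {
  if (duplicated2(df$a)[i] == FALSE) {
    # value of `a` occurs only once: keep the row
    result <- rbind(result, df[i, ])
  } else if (duplicated2(df$a)[i] == TRUE && substring(df$b, 1, 2)[i] == "ws") {
    # repeated value of `a`: keep only the rows whose `b` starts with "ws"
    result <- rbind(result, df[i, ])
  }
}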
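On the optimization question: the loop recomputes `duplicated2(df$a)` on every iteration and grows `result` with `rbind`, and both get slow as the data grows. A minimal vectorized sketch of the same two tests follows. It assumes `duplicated2()` flags every occurrence of a repeated value (the question does not show its definition), so the helper is defined here under that assumption.

```r
# Sample data from the question
a <- c(rep("A", 2), "B", rep("C", 2), "D", "E")
b <- c("ws_12", "dr_12", "ws_12", "ws_12", "dr_12", "ws_12", "dr_12")
df <- data.frame(a, b, stringsAsFactors = FALSE)

# Assumption: duplicated2() flags *every* occurrence of a repeated value,
# i.e. a forward and a backward pass of base duplicated() combined.
duplicated2 <- function(x) duplicated(x) | duplicated(x, fromLast = TRUE)

# Compute both tests once for the whole column instead of once per row
dup   <- duplicated2(df$a)
is_ws <- substring(df$b, 1, 2) == "ws"

# Keep rows whose `a` is unique, plus repeated rows whose `b` starts with "ws".
# (The loop's "dup & is_ws" reduces to "is_ws" once "!dup" is OR-ed in.)
keep   <- !dup | is_ws
result <- df[keep, ]
print(result)
```

Subsetting once with a logical vector avoids both the repeated `duplicated2()` calls and the repeated copying done by `rbind` inside the loop.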
I'm new to programming and to R, so I may be making some basic mistakes. Can this also be done in a simpler way?
Answer 0 (score: 0)
By default, negating `duplicated` selects the first row it sees for each value. So, to achieve your result, we can `order` the rows on `b` and then remove the duplicates with `duplicated`.
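A minimal sketch of this idea follows. The sort key is an assumption: it uses whether `b` starts with "ws" (sorting on `b` itself in decreasing order would also put the "ws_12" rows first for this sample). Because `order()` leaves ties in their original order, `!duplicated()` then keeps the "ws" row for each repeated value of `a`.

```r
a <- c(rep("A", 2), "B", rep("C", 2), "D", "E")
b <- c("ws_12", "dr_12", "ws_12", "ws_12", "dr_12", "ws_12", "dr_12")
df <- data.frame(a, b, stringsAsFactors = FALSE)

# By default duplicated() flags only the 2nd, 3rd, ... occurrence,
# so !duplicated() keeps the first row it sees for each value of `a`.
print(!duplicated(df$a))  # -> TRUE FALSE TRUE TRUE FALSE TRUE TRUE

# Assumed sort key: FALSE (starts with "ws") sorts before TRUE,
# so the "ws" rows come first; ties keep their original order.
ord <- df[order(substring(df$b, 1, 2) != "ws"), ]

# Now the first row seen for each `a` is its "ws" row, if it has one
result <- ord[!duplicated(ord$a), ]

# Restore the original row order using the preserved row names
result <- result[order(as.integer(rownames(result))), ]
print(result)
```

Unlike the loop, this keeps exactly one row per value of `a`: a repeated value with no "ws" row keeps its first row instead of being dropped, and a value with several "ws" rows keeps only one of them. For the sample data both approaches return the same five rows.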