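I would like to use dplyr to determine which observations in a data frame meet the following condition: within a Group, the sum of Var2 for observations where Var1 == "good" is greater than the sum of Var2 for observations where Var1 == "bad". Here is a toy data frame:
library(dplyr)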
set.seed(seed = 10)
df <- data.frame("Id" = 1:12,
                 "Group" = paste(sapply(toupper(letters[1:3]), rep, times = 4, simplify = TRUE)),
                 "Var1" = sample(rep(c("good", "bad"), times = 1000), size = 12),
                 "Var2" = sample(rep(1:10, times = 1000), size = 12))
print(df)
   Id Group Var1 Var2
1   1     A good    6
2   2     A  bad    9
3   3     A good   10
4   4     A good    7
5   5     B  bad    9
6   6     B  bad    1
7   7     B  bad    6
8   8     B good    6
9   9     C good    1
10 10     C  bad    8
11 11     C good    4
12 12     C  bad    2
So far, I have determined that I should use some combination of group_by(), summarise(), and filter(), but I can't seem to wrap my head around a good way to do this. Here is what I have come up with so far:
keepers <- df %>%
  group_by(Group, Var1) %>%
  summarise(Total = sum(Var2)) %>%
  print()
Source: local data frame [6 x 3]
Groups: Group [?]

  Group  Var1 Total
  (chr) (chr) (int)
1     A   bad     9
2     A  good    23
3     B   bad    16
4     B  good     6
5     C   bad    10
6     C  good     5
What next steps should I take? Ultimately, the analysis should return "A", because it is the only Group where the Total for good observations is greater than the Total for bad observations.
Answer 0 (score: 3)
How about using spread (from tidyr) to reshape the totals before applying filter:
> library(tidyr)
> df %>% group_by(Group, Var1) %>%
+   summarise(Total = sum(Var2)) %>%
+   spread(Var1, Total) %>%
+   filter(good > bad)
Source: local data frame [1 x 3]

  Group bad good
1     A   9   23
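For readers on current package versions, here is a minimal sketch of the same reshape-then-filter idea using pivot_wider(), which the tidyr documentation recommends over the superseded spread(). It assumes tidyr >= 1.0.0 and dplyr >= 1.0.0 (for the .groups argument). The data frame is hard-coded from the printed output in the question, because sample() under set.seed(10) returns different draws in R 3.6.0 and later.

```r
library(dplyr)
library(tidyr)

# Toy data copied from the printed data frame in the question
# (hard-coded on purpose; see the note on sample() above).
df <- data.frame(
  Id    = 1:12,
  Group = rep(c("A", "B", "C"), each = 4),
  Var1  = c("good", "bad", "good", "good",
            "bad",  "bad", "bad",  "good",
            "good", "bad", "good", "bad"),
  Var2  = c(6, 9, 10, 7, 9, 1, 6, 6, 1, 8, 4, 2)
)

# One row per Group, with one column of totals per level of Var1
totals <- df %>%
  group_by(Group, Var1) %>%
  summarise(Total = sum(Var2), .groups = "drop") %>%
  pivot_wider(names_from = Var1, values_from = Total)

# Keep only the groups whose "good" total exceeds the "bad" total
keepers <- totals %>% filter(good > bad)

as.character(keepers$Group)  # "A"
```

If a Group had no "good" or no "bad" rows, its missing total would be NA and filter() would drop that row; passing values_fill = 0 to pivot_wider() is one way to treat a missing total as zero instead.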
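Answer 1 (score: 2)
A similar option with data.table. We convert the 'data.frame' to a 'data.table' (setDT(df)), group by 'Group' and 'Var1', get the sum of 'Var2', reshape from 'long' to 'wide', and filter the rows where 'good' is greater than 'bad'.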
library(data.table)
dcast(setDT(df)[, sum(Var2), by = .(Group, Var1)],
      Group ~ Var1, value.var = 'V1')[good > bad]
#   Group bad good
#1:     A   9   23
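As a variation on the data.table approach, the sketch below computes both totals per Group with conditional sums, so no long-to-wide reshape is needed. This is an alternative to the dcast() call in the answer, not part of it. The names dt and res are illustrative, and the data are hard-coded from the printed data frame in the question.

```r
library(data.table)

# Toy data copied from the printed data frame in the question.
# Building a new data.table avoids setDT(), which would convert an
# existing data.frame in place (by reference).
dt <- data.table(
  Group = rep(c("A", "B", "C"), each = 4),
  Var1  = c("good", "bad", "good", "good",
            "bad",  "bad", "bad",  "good",
            "good", "bad", "good", "bad"),
  Var2  = c(6, 9, 10, 7, 9, 1, 6, 6, 1, 8, 4, 2)
)

# For each Group, sum Var2 separately over the "good" and "bad" rows,
# then keep the groups where the "good" total is larger.
res <- dt[, .(good = sum(Var2[Var1 == "good"]),
              bad  = sum(Var2[Var1 == "bad"])),
          by = Group][good > bad]

res$Group  # "A"
```

With this form, a Group that lacks one of the two levels gets a total of 0 for it rather than NA, since sum() over an empty vector is 0.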