我有一张这样的表:
df <- data.frame(P1 = c(1,0,0,0,0,0,"A"),
P2 = c(0,-2,1,2,1,0,"A"),
P3 = c(-1,2,0,2,1,0,"B"),
P4 = c(2,0,-1,0,-1,0,"B"),
Names = c("G1","G2","G3","G1","G2","G3","Group"),
stringsAsFactors = FALSE)
哪个成为
Names P1 P2 P3 P4
G1 1 0 -1 2
G2 0 -2 2 0
G3 0 1 0 -1
G1 0 2 2 0
G2 0 1 1 -1
G3 0 0 0 0
Group A A B B
此处,A
和B
是P1, P2, P3, P4
的分组变量。
我想为Ids
(G1
,G2
...),Group
(A
,B
)建立意外事件,以及Var
(-2,-1,0,1,2
)表,例如:
Id Group Var Count
G1 A -2 0
G1 A -1 0
G1 A 0 1
G1 A 1 1
G1 A 2 0
G1 B -2 0
G1 B -1 1
G1 B 0 0
G1 B 1 0
G1 B 2 1
G2 A -2 1
G2 A -1 0
G2 A 0 1
...
有没有办法在没有使用大量循环的情况下在R中执行此操作?
答案 0 :(得分:1)
假设您要对P1
&amp; P2
列为A
和P3
&amp; P4
列为B
,您可以使用data.table
- 包来接近它:
library(data.table)
DT <- melt(melt(setDT(df),
measure.vars = list(c(2,3),c(4,5)),
value.name = c("A","B")),
id = 1, measure.vars = 3:4, variable.name = 'group'
)[order(Id,group)][, val2 := value]
DT[CJ(Id = Id, group = group, value = value, unique = TRUE)
, on = .(Id, group, value)
][, .(counts = sum(!is.na(val2))), by = .(Id, group, value)]
导致:
Id group value counts 1: G1 A -2 0 2: G1 A -1 0 3: G1 A 0 2 4: G1 A 1 1 5: G1 A 2 1 6: G1 B -2 0 7: G1 B -1 1 8: G1 B 0 1 9: G1 B 1 0 10: G1 B 2 2 11: G2 A -2 1 12: G2 A -1 0 13: G2 A 0 2 14: G2 A 1 1 15: G2 A 2 0 16: G2 B -2 0 17: G2 B -1 1 18: G2 B 0 1 19: G2 B 1 1 20: G2 B 2 1 21: G3 A -2 0 22: G3 A -1 0 23: G3 A 0 3 24: G3 A 1 1 25: G3 A 2 0 26: G3 B -2 0 27: G3 B -1 1 28: G3 B 0 3 29: G3 B 1 0 30: G3 B 2 0
使用过的数据:
df <- read.table(text="Id P1 P2 P3 P4
G1 1 0 -1 2
G2 0 -2 2 0
G3 0 1 0 -1
G1 0 2 2 0
G2 0 1 1 -1
G3 0 0 0 0", header=TRUE, stringsAsFactors = FALSE)
请注意,我省略了&#39;组&#39; -row,因为您在评论中说明这些只是为P1
- P4
列指出哪些组属于
答案 1 :(得分:1)
使用
library(tidyverse)
df <- read.table(text="Id P1 P2 P3 P4
G1 1 0 -1 2
G2 0 -2 2 0
G3 0 1 0 -1
G1 0 2 2 0
G2 0 1 1 -1
G3 0 0 0 0", header=TRUE, stringsAsFactors = FALSE)
我们重新整理表格并重新编码P*
中的group
个变量。
然后,我们计算并完成丢失的案例。导致:
df %>%
gather(P1, P2, P3, P4, key = "p", value = "v") %>%
mutate(group = ifelse(p %in% c("P1", "P2"), "A", "B")) %>%
group_by(Id, group, v) %>%
summarise(Count = n()) %>%
ungroup() %>%
complete(Id, group, v, fill = list("Count" = 0))
如果你不需要输出中的所有组合,只需使用:
df %>%
gather(P1, P2, P3, P4, key = "p", value = "v") %>%
mutate(group = ifelse(p %in% c("P1", "P2"), "A", "B")) %>%
group_by(Id, group, v) %>%
summarise(Count = n())
# A tibble: 17 x 4
# Groups: Id, group [?]
Id group v Count
<chr> <chr> <int> <int>
1 G1 A 0 2
2 G1 A 1 1
3 G1 A 2 1
4 G1 B -1 1
5 G1 B 0 1
6 G1 B 2 2
7 G2 A -2 1
8 G2 A 0 2
9 G2 A 1 1
10 G2 B -1 1
11 G2 B 0 1
12 G2 B 1 1
13 G2 B 2 1
14 G3 A 0 3
15 G3 A 1 1
16 G3 B -1 1
17 G3 B 0 3