I have data like this:
df<- structure(list(`1` = structure(c(3L, 3L, 4L, 3L, 2L, 2L, 3L,
3L, 4L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 3L, 3L, 4L,
4L, 4L, 2L), .Label = c("Het", "Het1-Het2", "Homo", "No"), class = "factor"),
`2` = structure(c(4L, 5L, 4L, 5L, 4L, 4L, 4L, 5L, 4L, 4L,
4L, 5L, 5L, 5L, 5L, 4L, 5L, 3L, 3L, 1L, 4L, 5L, 5L, 5L, 4L,
2L), .Label = c("Het", "Het1-Het2", "Het2", "Homo", "No"), class = "factor"),
`3` = structure(c(3L, 4L, 4L, 4L, 3L, 3L, 3L, 4L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 3L, 4L, 3L, 3L, 4L,
2L), .Label = c("Het", "Het1-Het2", "Homo", "No"), class = "factor")), class = "data.frame", row.names = c(NA,
-26L))
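I am trying to see the differences between the 3 columns: for example, how many No values are in the first column but not in the second or third. The same goes for Het and the other strings.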
Answer 0 (score: 1)
We can use the table() function and sort by frequency:
out <- data.frame(table(df))
out[order(out$Freq, decreasing = TRUE), ]  # partial output shown
X1 X2 X3 Freq
55 Homo Homo Homo 5
60 No No Homo 5
79 Homo No No 4
9 Het Het2 Het 2
54 Het1-Het2 Homo Homo 2
56 No Homo Homo 2
59 Homo No Homo 2
76 No Homo No 2
1 Het Het Het 1
26 Het1-Het2 Het1-Het2 Het1-Het2 1
2 Het1-Het2 Het Het 0
3 Homo Het Het 0
...
For example, a Freq of 5 in the first row means there are 5 cases where Homo is observed in X1, X2 and X3.
Similarly, the Freq of 4 in the third row means there are 4 cases where X1 is Homo, X2 is No and X3 is No.
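Because table() crosses every level of every column, most of the 80 rows in out have a Freq of 0. A minimal sketch of dropping those rows follows. The names lab4, lab5 and seen are introduced here for illustration only, and df is rebuilt from the integer codes and labels given in the question:

```r
# Rebuild the question's data from its integer codes and labels.
lab4 <- c("Het", "Het1-Het2", "Homo", "No")
lab5 <- c("Het", "Het1-Het2", "Het2", "Homo", "No")
df <- data.frame(
  `1` = factor(lab4[c(3, 3, 4, 3, 2, 2, 3, 3, 4, 3, 3, 3, 3,
                      4, 4, 4, 4, 1, 1, 1, 3, 3, 4, 4, 4, 2)], levels = lab4),
  `2` = factor(lab5[c(4, 5, 4, 5, 4, 4, 4, 5, 4, 4, 4, 5, 5,
                      5, 5, 4, 5, 3, 3, 1, 4, 5, 5, 5, 4, 2)], levels = lab5),
  `3` = factor(lab4[c(3, 4, 4, 4, 3, 3, 3, 4, 3, 3, 3, 3, 3,
                      3, 3, 3, 3, 1, 1, 1, 3, 4, 3, 3, 4, 2)], levels = lab4),
  check.names = FALSE  # keep the column names as 1, 2, 3
)

# One row per combination of levels: 4 * 5 * 4 = 80 rows.
out <- data.frame(table(df))

# Keep only the combinations that actually occur, most frequent first.
seen <- out[out$Freq > 0, ]
seen <- seen[order(seen$Freq, decreasing = TRUE), ]
print(seen)
```

The remaining frequencies add up to the 26 rows of df, which is a quick check that no observation was lost.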
Answer 1 (score: 1)
With dplyr, you can filter for the values you need:
library(dplyr)  # provides %>%, filter() and tally()
df %>%
  filter(`1` == "No",
         `2` != "No" & `3` != "No")
1 2 3
1 No Homo Homo
2 No Homo Homo
or
filter(df, `1` == "No", `2` != "No" & `3` != "No")
Use tally to count the matching rows:
df %>%
filter(`1` == "No",
`2` != "No" & `3` != "No") %>%
tally()
n
1 2
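Of course, @Luis's solution is simpler (and preferable in my book) once you modify it to match your condition, i.e. & instead of | for columns 2 and 3. That modification assumes I have read your request correctly: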
df[df$`1` == "No" & (df$`2` != "No" & df$`3` != "No"),]
1 2 3
9 No Homo Homo
16 No Homo Homo
sum(df$`1` == "No" & (df$`2` != "No" & df$`3` != "No"))
[1] 2
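The question also asks for the same count for Het and the other strings. One way to sketch that in base R is to repeat the test above for every level of column 1. The names lab4, lab5, vals and only_in_first are illustrative, df is rebuilt from the codes in the question, and this assumes "not in the second or third" means the value is absent from both of those columns in the same row:

```r
# Rebuild the question's data from its integer codes and labels.
lab4 <- c("Het", "Het1-Het2", "Homo", "No")
lab5 <- c("Het", "Het1-Het2", "Het2", "Homo", "No")
df <- data.frame(
  `1` = factor(lab4[c(3, 3, 4, 3, 2, 2, 3, 3, 4, 3, 3, 3, 3,
                      4, 4, 4, 4, 1, 1, 1, 3, 3, 4, 4, 4, 2)], levels = lab4),
  `2` = factor(lab5[c(4, 5, 4, 5, 4, 4, 4, 5, 4, 4, 4, 5, 5,
                      5, 5, 4, 5, 3, 3, 1, 4, 5, 5, 5, 4, 2)], levels = lab5),
  `3` = factor(lab4[c(3, 4, 4, 4, 3, 3, 3, 4, 3, 3, 3, 3, 3,
                      3, 3, 3, 3, 1, 1, 1, 3, 4, 3, 3, 4, 2)], levels = lab4),
  check.names = FALSE  # keep the column names as 1, 2, 3
)

# Every value that can appear in column 1, as plain strings.
vals <- levels(df$`1`)

# For each value v: rows where column 1 is v and neither column 2
# nor column 3 is v in that same row.
only_in_first <- sapply(vals, function(v) {
  sum(df$`1` == v & df$`2` != v & df$`3` != v)
})
print(only_in_first)
```

The entry for No should agree with the count of 2 obtained above. Changing the inner & to | would instead count rows where v is missing from at least one of the other two columns.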