How to tell several columns apart based on their strings

Time: 2019-03-29 20:13:09

Tags: r

I have data like this:

df<- structure(list(`1` = structure(c(3L, 3L, 4L, 3L, 2L, 2L, 3L, 
3L, 4L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 3L, 3L, 4L, 
4L, 4L, 2L), .Label = c("Het", "Het1-Het2", "Homo", "No"), class = "factor"), 
    `2` = structure(c(4L, 5L, 4L, 5L, 4L, 4L, 4L, 5L, 4L, 4L, 
    4L, 5L, 5L, 5L, 5L, 4L, 5L, 3L, 3L, 1L, 4L, 5L, 5L, 5L, 4L, 
    2L), .Label = c("Het", "Het1-Het2", "Het2", "Homo", "No"), class = "factor"), 
    `3` = structure(c(3L, 4L, 4L, 4L, 3L, 3L, 3L, 4L, 3L, 3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 3L, 4L, 3L, 3L, 4L, 
    2L), .Label = c("Het", "Het1-Het2", "Homo", "No"), class = "factor")), class = "data.frame", row.names = c(NA, 
-26L))

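I am trying to see the differences between the 3 columns. For example, how many No values are in the first column but not in the second or the third? I want the same for Het and the other strings.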

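2 answers:

Answer 0: (score: 1)

We can use the table() function to count every combination of values across the three columns, then sort by frequency: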

out <- data.frame(table(df))
out[order(out$Freq, decreasing = TRUE), ]  # Partial output given
          X1        X2        X3 Freq
55      Homo      Homo      Homo    5
60        No        No      Homo    5
79      Homo        No        No    4
9        Het      Het2       Het    2
54 Het1-Het2      Homo      Homo    2
56        No      Homo      Homo    2
59      Homo        No      Homo    2
76        No      Homo        No    2
1        Het       Het       Het    1
26 Het1-Het2 Het1-Het2 Het1-Het2    1
2  Het1-Het2       Het       Het    0
3       Homo       Het       Het    0
...

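For example, the Freq of 5 in the first row means there are 5 cases where Homo is observed in X1, X2 and X3.

Likewise, the Freq of 4 in the third row means there are 4 cases where X1 is Homo while X2 and X3 are both No.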
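The same frequency table can also answer the specific question that was asked. The following is a minimal sketch rather than a definitive method: it rebuilds the data from the integer codes in the question's dput() output so that it runs on its own, and the names `observed` and `n_first_only` are introduced here purely for illustration. It assumes "not in the second or third" means that neither column 2 nor column 3 is No.

```r
# Rebuild the question's data from the integer codes and labels in its dput() output
lab1 <- c("Het", "Het1-Het2", "Homo", "No")
lab2 <- c("Het", "Het1-Het2", "Het2", "Homo", "No")
df <- data.frame(
  `1` = factor(lab1[c(3, 3, 4, 3, 2, 2, 3, 3, 4, 3, 3, 3, 3,
                      4, 4, 4, 4, 1, 1, 1, 3, 3, 4, 4, 4, 2)], levels = lab1),
  `2` = factor(lab2[c(4, 5, 4, 5, 4, 4, 4, 5, 4, 4, 4, 5, 5,
                      5, 5, 4, 5, 3, 3, 1, 4, 5, 5, 5, 4, 2)], levels = lab2),
  `3` = factor(lab1[c(3, 4, 4, 4, 3, 3, 3, 4, 3, 3, 3, 3, 3,
                      3, 3, 3, 3, 1, 1, 1, 3, 4, 3, 3, 4, 2)], levels = lab1),
  check.names = FALSE
)

# One row per combination of levels; data.frame() renames the columns to X1, X2, X3
out <- data.frame(table(df))

# Keep only the combinations that actually occur, most frequent first
observed <- out[out$Freq > 0, ]
observed <- observed[order(observed$Freq, decreasing = TRUE), ]

# Add up the frequencies where X1 is "No" and neither X2 nor X3 is "No"
n_first_only <- sum(out$Freq[out$X1 == "No" & out$X2 != "No" & out$X3 != "No"])
```

Dropping the zero-frequency rows first keeps the printed table short, since most of the possible level combinations never occur in the data.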

Answer 1: (score: 1)

Using dplyr, you can filter for the rows you want:

library(dplyr)  # provides %>%, filter() and tally()
df %>%
  filter(`1` == "No",
         `2` != "No" & `3` != "No")
   1    2    3
1 No Homo Homo
2 No Homo Homo

# The same call written without the pipe:
filter(df, `1` == "No", `2` != "No" & `3` != "No")

Use tally() to count the matching rows:

df %>%
  filter(`1` == "No",
         `2` != "No" & `3` != "No") %>%
  tally()
  n
1 2

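Of course, @Luis's solution is simpler (and preferable in my book) once you modify it to match your condition, i.e. use & rather than | between columns 2 and 3. That modification assumes I have read your request correctly: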

df[df$`1` == "No" & (df$`2` != "No" & df$`3` != "No"),]
    1    2    3
9  No Homo Homo
16 No Homo Homo

sum(df$`1` == "No" & (df$`2` != "No" & df$`3` != "No"))
[1] 2
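The question also asks for the same count for Het and the other strings. The sketch below applies the base R condition above to every label of column 1. It assumes the same reading of the request (the value appears in column 1 and in neither column 2 nor column 3). `count_first_only` is a hypothetical helper name, and the data is rebuilt from the integer codes in the question so the block runs on its own.

```r
# Rebuild the question's data from the integer codes and labels in its dput() output
lab1 <- c("Het", "Het1-Het2", "Homo", "No")
lab2 <- c("Het", "Het1-Het2", "Het2", "Homo", "No")
df <- data.frame(
  `1` = factor(lab1[c(3, 3, 4, 3, 2, 2, 3, 3, 4, 3, 3, 3, 3,
                      4, 4, 4, 4, 1, 1, 1, 3, 3, 4, 4, 4, 2)], levels = lab1),
  `2` = factor(lab2[c(4, 5, 4, 5, 4, 4, 4, 5, 4, 4, 4, 5, 5,
                      5, 5, 4, 5, 3, 3, 1, 4, 5, 5, 5, 4, 2)], levels = lab2),
  `3` = factor(lab1[c(3, 4, 4, 4, 3, 3, 3, 4, 3, 3, 3, 3, 3,
                      3, 3, 3, 3, 1, 1, 1, 3, 4, 3, 3, 4, 2)], levels = lab1),
  check.names = FALSE
)

# Hypothetical helper: number of rows where `value` is in column 1
# but in neither column 2 nor column 3
count_first_only <- function(value, data) {
  sum(data[["1"]] == value & data[["2"]] != value & data[["3"]] != value)
}

# Named vector with one count per label found in column 1
counts <- sapply(levels(df[["1"]]), count_first_only, data = df)
counts
```

For "No" this reproduces the count of 2 obtained above. Swapping the second & for | inside the helper would give the looser "missing from at least one of the other columns" reading instead.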