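I want a data frame output that records the counts for 2 of the 4 levels ("Yes" and "No"). I could do this by subsetting and filtering on Yes or No, but I think there must be a better way to do it with dplyr.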
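The code below is what I presume I have to do, but I don't know how to get the spread function to work for this particular variable. I don't mind if all 4 levels are included at once; I can drop a few columns after the fact.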
null.ta <- dbdata %>%
  filter(MutGroup == "Null") %>%
  group_by(ICD_Grouping) %>%
  summarise(n()) %>%
  spread(???????)
The output I would like looks something like:
structure(list(ICD_Grouping = structure(c(50L, 50L, 33L, 33L,
50L, 50L, 50L, 18L, 21L, 33L, 18L, 18L, 50L, 50L, 50L, 17L, 17L,
17L, 17L, 17L, 17L, 50L, 50L, 50L, 50L, 18L, 18L, 16L, 50L, 50L,
50L, 16L, 17L, 50L, 50L, 50L, 16L, 16L, 30L, 50L, 50L, 16L, 18L,
17L, 50L, 50L, 50L, 50L, 50L, 50L, 21L, 30L, 21L, 18L, 21L, 21L,
13L, 30L, 50L, 50L, 50L, 50L, 13L, 34L, 33L, 18L, 16L, 16L, 16L,
16L, 18L, 10L, 34L, 37L, 34L, 34L, 18L, 33L, 33L, 18L, 18L, 37L,
50L, 30L, 30L, 50L, 50L, 50L, 50L, 50L, 50L, 34L, 34L, 33L, 17L,
14L, 19L, 33L, 18L, 18L, 18L, 50L, 30L, 30L, 30L, 34L, 18L, 18L,
18L, 18L, 30L, 30L, 17L, 17L, 33L), .Label = c("", "C01-2", "C03-6",
"C09-10", "C11", "C15", "C16", "C18-20", "C21", "C22", "C25",
"C30-31", "C33-34", "C37-39", "C40-41", "C43", "C44", "C45",
"C47/49", "C48", "C50", "C51", "C53", "C54-55", "C56", "C57-58",
"C60", "C61", "C62", "C64", "C65-66/68", "C67", "C69", "C70",
"C71", "C72", "C73", "C74-75", "C76.0", "C76.2", "C76.3", "C80",
"C81", "C82-86", "C90.0", "C91.0", "C94.3/95", "D04", "D05",
"D22", "D31", "D33", "D35"), class = "factor"), Immunohistochemistry = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 2L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 3L, 3L, 4L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 2L, 2L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L,
2L, 4L, 4L, 2L, 4L, 4L, 4L, 4L, 2L, 4L, 2L, 4L, 4L, 4L, 4L, 3L,
3L, 4L), .Label = c("", "N/A", "No", "Yes"), class = "factor")), row.names = c(NA,
-115L), class = "data.frame")
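That is an example with random data, not this data. In other words, a data frame containing the count of each factor level of Immunohistochemistry by ICD_Grouping.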
Answer 0 (score: 0)
If I understand correctly, we can do this with base R's table:
table(dbdata)
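table reports a result for every factor level, even levels that no longer occur in the data, so to keep the table a reasonable size we first remove the unused levels with droplevels: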
table(droplevels(dbdata))
             Immunohistochemistry
ICD_Grouping N/A No Yes
C22            0  0   1
C33-34         0  0   2
C37-39         1  0   0
C43            0  2   7
C44            1  2   8
C45            2  0  17
C47/49         1  0   0
C50            0  1   4
C64            0  0  10
C69            7  0   2
C70            1  0   6
C73            0  1   1
D22            8  0  30
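To see why droplevels matters, here is a small sketch on a made-up factor (the vector f is hypothetical and not part of the question's data). It only illustrates how table treats levels that are declared but never observed.

```r
# A factor with three declared levels, of which "b" never occurs
f <- factor(c("a", "a", "c"), levels = c("a", "b", "c"))

with_unused    <- table(f)              # keeps "b" with a count of 0
without_unused <- table(droplevels(f))  # "b" is removed before counting

names(with_unused)
names(without_unused)
```

In the question's data ICD_Grouping declares 53 levels but only 13 occur, which is why the table would otherwise be mostly rows of zeros.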
The table can be converted to a data.frame with the same structure using:
table(droplevels(dbdata)) %>%
  as.data.frame.matrix() %>%
  tibble::rownames_to_column('ICD_Grouping')
Or, if you prefer to pipe every step:
dbdata %>%
  droplevels() %>%
  table() %>%
  as.data.frame.matrix() %>%
  tibble::rownames_to_column('ICD_Grouping')
Both give the same data.frame:
   ICD_Grouping N/A No Yes
1           C22   0  0   1
2        C33-34   0  0   2
3        C37-39   1  0   0
4           C43   0  2   7
5           C44   1  2   8
6           C45   2  0  17
7        C47/49   1  0   0
8           C50   0  1   4
9           C64   0  0  10
10          C69   7  0   2
11          C70   1  0   6
12          C73   0  1   1
13          D22   8  0  30
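The use of as.data.frame.matrix() rather than plain as.data.frame() is what keeps the wide layout. A small sketch on a made-up two-way table (tab, g and h are hypothetical names) shows the difference:

```r
# Toy two-way table: 2 row categories (x, y) by 2 column categories (p, q)
tab <- table(g = c("x", "x", "y"), h = c("p", "q", "p"))

long <- as.data.frame(tab)         # one row per cell: columns g, h, Freq
wide <- as.data.frame.matrix(tab)  # same shape as the table; row names x, y

long
wide
```

Because the wide version stores the row categories as row names, rownames_to_column() is needed to turn them into an ordinary column.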
The result is a regular data frame that can be used in any downstream process, or joined to other data on the ICD_Grouping variable.
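Since the question asked specifically for a dplyr route, here is a minimal sketch of how that might look, assuming a reasonably recent tidyr in which pivot_wider() (the successor to spread()) is available. The toy data frame and the name counts are made up for illustration; only the two column names come from the question.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for dbdata (hypothetical values, not the asker's data)
toy <- data.frame(
  ICD_Grouping = factor(c("C43", "C43", "C44", "C44", "C44", "D22")),
  Immunohistochemistry = factor(
    c("Yes", "No", "Yes", "Yes", "N/A", "Yes"),
    levels = c("", "N/A", "No", "Yes")
  )
)

counts <- toy %>%
  # keep only the two levels of interest
  filter(Immunohistochemistry %in% c("Yes", "No")) %>%
  # one row per observed (group, level) combination, with the count in n
  count(ICD_Grouping, Immunohistochemistry) %>%
  # spread the levels into columns; combinations never observed become 0
  pivot_wider(
    names_from = Immunohistochemistry,
    values_from = n,
    values_fill = 0
  )

counts
```

With the older API, the last step would be spread(Immunohistochemistry, n, fill = 0). Note that a group whose rows are all N/A disappears after the filter(); dropping the filter() keeps every observed level as a column, and unwanted columns can then be removed with select().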