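Suppose I am given a data frame in which several columns are factors, and the column of interest is colA.
For example, suppose the data frame looks like this: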
colA | colB | colC | colD
--------------------------
1 | 'a' | 1 | 2
1 | 'b' | 2 | 3
4 | 'b' | 2 | 4
2 | 'a' | 3 | 1
3 | 'a' | 2 | 6
3 | 'b' | 1 | 6
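I want to summarise every column grouped by colA, but laid out so that the values of colB, colC and colD run down the rows and the values of colA run across the columns. That is, I want the count of each colB value (one row per colB value) when colA is 1, when colA is 2, and so on. The same goes for colC and colD. The resulting data frame would look like this: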
       | colA_value1 | colA_value2 | colA_value3 | colA_value4
-----------------------------------------------------
colB_a | 1 | 1 | 1 | 0
colB_b | 1 | 0 | 1 | 1
colC_1 | 1 | 0 | 1 | 0
colC_2 | 1 | 0 | 1 | 1
colC_3 | 0 | 1 | 0 | 0
colD_1 | 0 | 1 | 0 | 0
colD_2 | 1 | 0 | 0 | 0
colD_3 | 1 | 0 | 0 | 0
colD_4 | 0 | 0 | 0 | 1
colD_6 | 0 | 0 | 2 | 0
A solution using tidyverse packages is preferred.
Answer 0 (score: 1)
This can be done with a fair amount of tidyr:
library(tidyverse)
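df <- data.frame(colA = c(1L, 1L, 4L, 2L, 3L, 3L),
                 colB = c("a", "b", "b", "a", "a", "b"),
                 colC = c(1L, 2L, 2L, 3L, 2L, 1L),
                 colD = c(2L, 3L, 4L, 1L, 6L, 6L))
df %>%
  # turn colA into labels such as "colA_1"
  gather(key, value, colA) %>%
  unite(colA, key, value) %>%
  # stack the remaining columns and build labels such as "colB_a"
  gather(key, value, -colA) %>%
  unite(col, key, value) %>%
  # count each (colA, col) pair, then spread colA across the columns
  count(colA, col) %>%
  spread(colA, n, fill = 0)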
#> Warning: attributes are not identical across measure variables;
#> they will be dropped
#> # A tibble: 10 x 5
#>    col    colA_1 colA_2 colA_3 colA_4
#>    <chr>   <dbl>  <dbl>  <dbl>  <dbl>
#>  1 colB_a      1      1      1      0
#>  2 colB_b      1      0      1      1
#>  3 colC_1      1      0      1      0
#>  4 colC_2      1      0      1      1
#>  5 colC_3      0      1      0      0
#>  6 colD_1      0      1      0      0
#>  7 colD_2      1      0      0      0
#>  8 colD_3      1      0      0      0
#>  9 colD_4      0      0      0      1
#> 10 colD_6      0      0      2      0
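gather() and spread() still work but are superseded in current tidyr. As a rough sketch of the same pipeline with pivot_longer() and pivot_wider(), assuming tidyr >= 1.0.0 and dplyr >= 1.0.0 are installed (the name `result` and the explicit conversion to character are choices made here, not part of the original answer):

```r
library(dplyr)
library(tidyr)

df <- data.frame(colA = c(1L, 1L, 4L, 2L, 3L, 3L),
                 colB = c("a", "b", "b", "a", "a", "b"),
                 colC = c(1L, 2L, 2L, 3L, 2L, 1L),
                 colD = c(2L, 3L, 4L, 1L, 6L, 6L))

result <- df %>%
  # pivot_longer() will not combine character and integer columns,
  # so convert everything except colA to character first
  mutate(across(-colA, as.character)) %>%
  # stack colB, colC, colD into key/value pairs
  pivot_longer(-colA, names_to = "key", values_to = "value") %>%
  # build row labels such as "colB_a"
  unite(col, key, value) %>%
  # count each (colA, col) pair
  count(colA, col) %>%
  # one column per colA value; missing combinations become 0
  pivot_wider(names_from = colA, values_from = n,
              names_prefix = "colA_", values_fill = list(n = 0L)) %>%
  arrange(col)

result
```

Converting up front avoids the attribute warning that gather() emits when it has to merge columns of different types.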
Answer 1 (score: 1)
With reshape2, using melt() + dcast():
library(reshape2)
df <- read.table(header=TRUE, text='colA | colB | colC | colD
1 | a | 1 | 2
1 | b | 2 | 3
4 | b | 2 | 4
2 | a | 3 | 1
3 | a | 2 | 6
3 | b | 1 | 6', sep='|')
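# stack colB, colC, colD into variable/value pairs, keeping colA as the id
df2 <- melt(df, id.vars = 'colA')
# the '|'-separated input leaves padding spaces around the values
df2$value <- trimws(df2$value)
# build the column labels (colA_value1, ...) and row labels (colB_a, ...)
df2$colA <- paste('colA_value', df2$colA, sep='')
df2$variable_value <- paste(df2$variable, df2$value, sep='_')
# count the rows in each (variable_value, colA) cell
dcast(df2, variable_value ~ colA, fun.aggregate = length, value.var = 'value')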
#    variable_value colA_value1 colA_value2 colA_value3 colA_value4
# 1          colB_a           1           1           1           0
# 2          colB_b           1           0           1           1
# 3          colC_1           1           0           1           0
# 4          colC_2           1           0           1           1
# 5          colC_3           0           1           0           0
# 6          colD_1           0           1           0           0
# 7          colD_2           1           0           0           0
# 8          colD_3           1           0           0           0
# 9          colD_4           0           0           0           1
# 10         colD_6           0           0           2           0
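For comparison, the same counts can be cross-checked without any packages. This is only a sketch in base R, not part of either answer: it cross-tabulates each column against colA with table() and stacks the results. The names `other_cols`, `tabs` and `result` are made up for illustration.

```r
df <- data.frame(colA = c(1L, 1L, 4L, 2L, 3L, 3L),
                 colB = c("a", "b", "b", "a", "a", "b"),
                 colC = c(1L, 2L, 2L, 3L, 2L, 1L),
                 colD = c(2L, 3L, 4L, 1L, 6L, 6L))

# every column except the one of interest
other_cols <- setdiff(names(df), "colA")

# for each column, cross-tabulate its labelled values against the labelled colA values
tabs <- lapply(other_cols, function(col) {
  tab <- table(paste(col, df[[col]], sep = "_"),
               paste0("colA_value", df$colA))
  as.data.frame.matrix(tab)
})

# all tables share the same colA columns, so they can be stacked row-wise
result <- do.call(rbind, tabs)
result
```

Here the row labels (colB_a, colB_b, ...) end up as row names rather than as a separate column.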