I am working in R and have this data:
data <- structure(list(Col1 = 1:9, Col2 = structure(c(2L, 2L, 2L, 1L,
3L, 3L, 3L, 3L, 3L), .Label = c("Administrative ", "National",
"Regional"), class = "factor"), Col3 = structure(c(NA, 3L, 4L,
NA, 2L, 3L, 1L, 4L, 3L), .Label = c("bike", "boat", "car", "truck"
), class = "factor"), Col4 = c(56L, 65L, 58L, 62L, 24L, 25L,
120L, 89L, 468L), X = c(NA, NA, NA, NA, NA, NA, NA, NA, NA),
X.1 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA)), .Names = c("Col1",
"Col2", "Col3", "Col4", "X", "X.1"), class = "data.frame", row.names = c(NA,
-9L))
I want to rearrange it to see which items are available and which are not. The output should look like this:
result <- structure(list(Col1 = c(1L, 4L, 5L), Col2 = structure(c(2L, 1L,
3L), .Label = c("Administrative ", "National", "Regional"), class = "factor"),
car = c(1L, 0L, 1L), truck = c(1L, 0L, 1L), boat = c(0L,
0L, 1L), bike = c(0L, 0L, 1L)), .Names = c("Col1", "Col2",
"car", "truck", "boat", "bike"), class = "data.frame", row.names = c(NA,
-3L))
I tried using aggregate, but I am still far from the desired result:
t <- aggregate(data$Col2, by=list(data$Col3), c)
Any help would be appreciated!
Answer 0 (score: 4)
We can use dcast from data.table, with length as the fun.aggregate:
library(data.table)
dcast(setDT(data), Col2~ Col3, length)[, 1:5, with = FALSE]
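One caveat: length returns counts rather than 0/1 flags, so a vehicle that occurs more than once in a group (car appears twice under Regional in the question's data) yields a value above 1. A minimal sketch of one way to turn the counts into indicators follows. It rebuilds the question's rows compactly, leaving out the all-NA columns, and the names dt, counts, vehicle_cols and indicators are illustrative only.

```r
library(data.table)

# Same rows as the question's data, rebuilt compactly (X and X.1 omitted)
dt <- data.table(
  Col2 = factor(c(rep("National", 3), "Administrative ", rep("Regional", 5))),
  Col3 = factor(c(NA, "car", "truck", NA, "boat", "car", "bike", "truck", "car")),
  Col4 = c(56L, 65L, 58L, 62L, 24L, 25L, 120L, 89L, 468L)
)

# Count rows per Col2/Col3 combination; empty combinations are filled with 0
counts <- dcast(dt, Col2 ~ Col3, fun.aggregate = length, value.var = "Col4")

# Select the vehicle columns by name instead of by position,
# so the column created for missing Col3 values is left out
vehicle_cols <- intersect(levels(dt$Col3), names(counts))

# Collapse counts to 0/1 indicators
counts[, (vehicle_cols) := lapply(.SD, function(x) as.integer(x > 0)),
       .SDcols = vehicle_cols]

indicators <- counts[, c("Col2", vehicle_cols), with = FALSE]
print(indicators)
```

Selecting columns by name avoids relying on where dcast places the column for missing Col3 values.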
Answer 1 (score: 3)
Here is an idea using base R:
#convert to character
data[2:3] <- lapply(data[2:3], as.character)
#get unique elements to tabulate
i1 <- unique(data$Col3)
i1 <- i1[!is.na(i1)]
setNames(data.frame(do.call(rbind, lapply(split(data$Col3, data$Col2), function(i)
as.integer(match(i1, i, nomatch = 0) > 0)))), i1)
which gives:
               car truck boat bike
Administrative   0     0    0    0
National         1     1    0    0
Regional         1     1    1    1
Answer 2 (score: 2)
If you are interested, here is a dplyr solution, although akrun's solution looks more concise:
library(tidyverse)
result <- data %>%
group_by(Col2, Col3) %>%
summarise(tot = sum(Col4)) %>%
mutate(bool = if_else(tot > 0, 1, 0)) %>%
select(Col2, Col3, bool) %>%
spread(key = Col3, value = bool, fill = 0) %>%
select(-`<NA>`)
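In current tidyr releases spread is superseded by pivot_wider. A rough equivalent might look like the sketch below. It is not the answer's original code: it flags a combination as available when at least one row exists (via count) instead of summing Col4, and it rebuilds the question's rows compactly without the all-NA columns. The name indicators is illustrative.

```r
library(dplyr)
library(tidyr)

# Same rows as the question's data, rebuilt compactly (X and X.1 omitted)
data <- data.frame(
  Col1 = 1:9,
  Col2 = factor(c(rep("National", 3), "Administrative ", rep("Regional", 5))),
  Col3 = factor(c(NA, "car", "truck", NA, "boat", "car", "bike", "truck", "car")),
  Col4 = c(56L, 65L, 58L, 62L, 24L, 25L, 120L, 89L, 468L)
)

indicators <- data %>%
  count(Col2, Col3) %>%                  # one row per Col2/Col3 combination
  mutate(n = as.integer(n > 0)) %>%      # 1 if the combination occurs at all
  pivot_wider(names_from = Col3, values_from = n, values_fill = 0L) %>%
  # keep only the real vehicle columns, which drops the column
  # generated for missing Col3 values
  select(Col2, any_of(levels(data$Col3)))

print(indicators)
```

Keeping the rows with a missing Col3 until after the pivot is what preserves the "Administrative " group, which has no vehicles at all.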
Answer 3 (score: 1)
Here is another base R approach, using table and some coercion.
(table(data$Col2, data$Col3) > 0) + 0L
               bike boat car truck
Administrative    0    0   0     0
National          0    0   1     1
Regional          1    1   1     1
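table counts the occurrences of each Col2/Col3 combination. Missing values are not counted, so a group whose Col3 is only NA gets 0 everywhere. Comparing with > 0 then coerces the counts to logical, which collapses any count greater than 1, and + 0L converts the result back to integer.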
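The steps above can be traced one at a time, and the flags can be combined with Col1 to approach the layout requested in the question. This is only a sketch: it assumes that Col1 in the desired output is the smallest Col1 value within each Col2 group (which matches 1, 4, 5 in the question), and the names counts, flags, first_id and result are illustrative.

```r
# Same rows as the question's data, rebuilt compactly (X and X.1 omitted)
data <- data.frame(
  Col1 = 1:9,
  Col2 = factor(c(rep("National", 3), "Administrative ", rep("Regional", 5))),
  Col3 = factor(c(NA, "car", "truck", NA, "boat", "car", "bike", "truck", "car")),
  Col4 = c(56L, 65L, 58L, 62L, 24L, 25L, 120L, 89L, 468L)
)

counts <- table(data$Col2, data$Col3)  # NA values in Col3 are not counted
flags  <- (counts > 0) + 0L            # logical comparison, then back to integer 0/1

# Assumption: the Col1 shown per group is the smallest Col1 in that group
first_id <- tapply(data$Col1, data$Col2, min)

result <- data.frame(
  Col1 = as.integer(first_id[rownames(flags)]),
  Col2 = rownames(flags),
  as.data.frame.matrix(flags)
)

# Order rows by Col1 and columns as in the question's desired output
result <- result[order(result$Col1),
                 c("Col1", "Col2", "car", "truck", "boat", "bike")]
rownames(result) <- NULL
print(result)
```

as.data.frame.matrix keeps the wide layout of the table; plain as.data.frame on a table object would return it in long format instead.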