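I have a large dataset in R in which each site (ID) spans several rows, with one individual per row listed in a single column, Individual.code: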
ID Elevation Year Individual.code
1 Area1 11.0 2009 AA
2 Area1 11.0 2009 AB
3 Area3 79.5 2009 AA
4 Area3 79.5 2009 AC
5 Area3 79.5 2009 AD
6 Area5 57.5 2010 AE
7 Area5 57.5 2010 AB
8 Area7 975.0 2011 AA
9 Area7 975.0 2011 AB
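I would like to turn Individual.code into a binary (presence/absence) matrix, with one column per code, without losing the remaining variables, i.e. ID, Elevation and Year: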
# ID Elevation Year AA AB AC AD AE
#1 Area1 11.0 2009 1 1 0 0 0
#2 Area3 79.5 2009 1 0 1 1 0
#3 Area5 57.5 2010 0 1 0 0 1
#4 Area7 975.0 2011 1 1 0 0 0
Answer 0 (score: 1)
DF <- read.table(text = " ID Elevation Year Individual.code
1 Area1 11.0 2009 AA
2 Area1 11.0 2009 AB
3 Area3 79.5 2009 AA
4 Area3 79.5 2009 AC
5 Area3 79.5 2009 AD
6 Area5 57.5 2010 AE
7 Area5 57.5 2010 AB
8 Area7 975.0 2011 AA
9 Area7 975.0 2011 AB", header = TRUE)
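library(reshape2)
# One row per ID/Elevation/Year and one column per code: a cell is 1 if the
# code occurs for that site and 0 otherwise
dcast(DF, ID + Elevation + Year ~ Individual.code,
      value.var = "Individual.code",
      fun.aggregate = function(x) as.integer(length(x) > 0))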
# ID Elevation Year AA AB AC AD AE
#1 Area1 11.0 2009 1 1 0 0 0
#2 Area3 79.5 2009 1 0 1 1 0
#3 Area5 57.5 2010 0 1 0 0 1
#4 Area7 975.0 2011 1 1 0 0 0
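Missing site/code combinations come out as 0 rather than NA because, when fill is left at its default, reshape2's dcast() fills empty cells with whatever fun.aggregate returns for a zero-length vector. A quick check of that aggregator on its own (the name present_flag is hypothetical; it is just the anonymous function from the call above):

```r
# Hypothetical name for the aggregator passed to dcast() above
present_flag <- function(x) as.integer(length(x) > 0)

# A site/code pair with no rows is aggregated as a zero-length vector -> 0
present_flag(character(0))   # [1] 0

# A pair with one or more rows -> 1, even if the code was recorded twice
present_flag("AA")           # [1] 1
present_flag(c("AA", "AA"))  # [1] 1
```

Using length(x) > 0 instead of plain length keeps the result binary if the same code is ever recorded more than once for a site.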
Answer 1 (score: 1)
Here is one approach:
dat <- read.table(text = " ID Elevation Year Individual.code
1 Area1 11.0 2009 AA
2 Area1 11.0 2009 AB
3 Area3 79.5 2009 AA
4 Area3 79.5 2009 AC
5 Area3 79.5 2009 AD
6 Area5 57.5 2010 AE
7 Area5 57.5 2010 AB
8 Area7 975.0 2011 AA
9 Area7 975.0 2011 AB", header = TRUE)
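if (!require("pacman")) install.packages("pacman"); library(pacman)
p_load(qdapTools, dplyr)
# Count each code within each ID (0/1 here, since no code repeats within a site),
# move the row names into an ID column, then join back ID/Elevation/Year
mtabulate(split(dat[["Individual.code"]], dat[["ID"]])) %>%
  matrix2df("ID") %>%
  left_join(distinct(select(dat, -Individual.code)), .)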
## ID Elevation Year AA AB AC AD AE
## 1 Area1 11.0 2009 1 1 0 0 0
## 2 Area3 79.5 2009 1 0 1 1 0
## 3 Area5 57.5 2010 0 1 0 0 1
## 4 Area7 975.0 2011 1 1 0 0 0
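If you would rather avoid installing qdapTools, the same tabulate-then-join idea can be sketched in base R with table() and merge(). This swaps in base functions for mtabulate()/left_join() and is not the answer's own code; it also caps counts at 1, so a code recorded twice for a site would still yield 1.

```r
# Same example data as in the question
dat <- read.table(text = " ID Elevation Year Individual.code
1 Area1 11.0 2009 AA
2 Area1 11.0 2009 AB
3 Area3 79.5 2009 AA
4 Area3 79.5 2009 AC
5 Area3 79.5 2009 AD
6 Area5 57.5 2010 AE
7 Area5 57.5 2010 AB
8 Area7 975.0 2011 AA
9 Area7 975.0 2011 AB", header = TRUE)

# Cross-tabulate sites against codes, then reduce counts to presence/absence
counts <- unclass(table(dat$ID, dat$Individual.code))
flags <- (counts > 0) * 1L

# Turn the matrix into a data frame with ID as an ordinary column
wide <- data.frame(ID = rownames(flags), flags, row.names = NULL)

# Re-attach Elevation and Year (one row per site) by joining on ID
res <- merge(unique(dat[c("ID", "Elevation", "Year")]), wide, by = "ID")
res
```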
Answer 2 (score: 1)
You could try dplyr/tidyr, with dat as defined above:
library(dplyr)
library(tidyr)
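# Use Individual.code as both key and value, so each new column holds the code
# where it occurs and NA otherwise; then convert that to 1/0
spread(dat, Individual.code, Individual.code) %>%
  mutate(across(AA:AE, ~ as.integer(!is.na(.x))))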
# ID Elevation Year AA AB AC AD AE
#1 Area1 11.0 2009 1 1 0 0 0
#2 Area3 79.5 2009 1 0 1 1 0
#3 Area5 57.5 2010 0 1 0 0 1
#4 Area7 975.0 2011 1 1 0 0 0
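The spread() step works because Individual.code serves as both the key and the value: each new column holds the code where it occurs and NA where it does not, and !is.na() turns that into 1/0. The same two steps can be sketched in base R with tapply(), which fills empty cells with NA by default. This is a swapped-in technique, not the answer's code, and it builds only the ID-by-code matrix, so Elevation and Year would still need to be joined back on ID.

```r
# Same example data as in the question
dat <- read.table(text = " ID Elevation Year Individual.code
1 Area1 11.0 2009 AA
2 Area1 11.0 2009 AB
3 Area3 79.5 2009 AA
4 Area3 79.5 2009 AC
5 Area3 79.5 2009 AD
6 Area5 57.5 2010 AE
7 Area5 57.5 2010 AB
8 Area7 975.0 2011 AA
9 Area7 975.0 2011 AB", header = TRUE)

# Wide table: one row per ID, one column per code; each cell holds the code
# where it occurs and NA (tapply's default for empty cells) where it does not
wide <- tapply(dat$Individual.code, list(dat$ID, dat$Individual.code),
               function(x) x[1])

# Indicator step: present -> 1L, NA -> 0L
ind <- (!is.na(wide)) + 0L
ind
```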
Or you could use reshape from base R:
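# Add a constant column Col = 1, spread it wide by Individual.code (the new
# columns are named Col.AA, Col.AB, ...), then replace absences (NA) with 0
res <- reshape(cbind(dat, Col = 1), idvar = c('ID', 'Elevation', 'Year'),
               timevar = 'Individual.code', direction = 'wide')
res[is.na(res)] <- 0
res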
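By default reshape() names each new column after the value column and the time value, separated by "." (here Col.AA, Col.AB, ...), and keeps the row names of the first row seen for each site. A minimal sketch of tidying that result to match the layout asked for in the question; the sub() rename and the row-name reset are additions, not part of the original answer:

```r
# Same example data as in the question
dat <- read.table(text = " ID Elevation Year Individual.code
1 Area1 11.0 2009 AA
2 Area1 11.0 2009 AB
3 Area3 79.5 2009 AA
4 Area3 79.5 2009 AC
5 Area3 79.5 2009 AD
6 Area5 57.5 2010 AE
7 Area5 57.5 2010 AB
8 Area7 975.0 2011 AA
9 Area7 975.0 2011 AB", header = TRUE)

res <- reshape(cbind(dat, Col = 1), idvar = c("ID", "Elevation", "Year"),
               timevar = "Individual.code", direction = "wide")
res[is.na(res)] <- 0

# Drop the "Col." prefix that reshape() puts on each wide column
names(res) <- sub("^Col\\.", "", names(res))
# Renumber the rows 1..n instead of keeping the original row names
rownames(res) <- NULL
res
```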